[
  {
    "path": ".github/workflows/build_html_output.yml",
    "content": "name: \"Build HTML output\"\non:\n- push\n\njobs:\n  html:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v6\n      - name: Set up Python environment\n        uses: actions/setup-python@v6\n        with: \n          python-version: \"3.10\"\n      - name: Update pip\n        run: |\n          python -m pip install --upgrade pip\n      - name: Install dependencies\n        run: |\n          if [ -f build-requirements.txt ]; then pip install -r build-requirements.txt; fi\n      - name: Build HTML\n        uses: rickstaa/sphinx-action@master\n        with:\n          docs-folder: \".\"\n      - uses: actions/upload-artifact@v6\n        with:\n          name: ShapingDocumentsHTML\n          path: _build/html/\n"
  },
  {
    "path": ".github/workflows/test_document_sources.yml",
    "content": "name: \"Test document sources\"\non:\n- push\n\njobs:\n  linkcheck:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v6\n      - name: Set up Python environment\n        uses: actions/setup-python@v6\n        with: \n          python-version: \"3.10\"\n      - name: Update pip\n        run: |\n          python -m pip install --upgrade pip\n      - name: Install dependencies\n        run: |\n          if [ -f build-requirements.txt ]; then pip install -r build-requirements.txt; fi\n      - name: Check links\n        uses: rickstaa/sphinx-action@master\n        with:\n          docs-folder: \".\"\n          build-command: \"sphinx-build -M linkcheck . _build\"\n"
  },
  {
    "path": ".gitignore",
    "content": "# Ignore general\n*~\n\n# Ignore Unicode PDFs & misc resources\nreference/\n_build/\n\n# Ignore auto-generated binary spelling dictionary\ndictionary.dic\n"
  },
  {
    "path": "BUILD.md",
    "content": "# Building a local copy of these documents #\n\nA local, static-HTML version of these documents can be built with the\n[Sphinx](https://www.sphinx-doc.org/) generator.\n\nThe Sphinx-related files in the repository are:\n```\nconfig.py\nindex.rst\nmake.bat\nMakefile\n```\n\nplus the directories\n```\n_build/\n_static/\n_templates/\n_ext/\n```\n\n## Sphinx ##\n\nSphinx is a Python-based utility that you will need to install on your\nlocal machine. The official [installation\nguide](https://www.sphinx-doc.org/en/master/usage/quickstart.html)\ncovers what is necessary for a variety of OSes and\nenvironments. Perhaps the easiest approach is to install Sphinx in a\nPython [virtual\nenvironment](https://www.sphinx-doc.org/en/master/usage/installation.html#using-virtual-environments). \n\nAfter installing Sphinx itself, you will also need to install the\n[MyST-Parser](https://myst-parser.readthedocs.io/en/latest/) package,\nwhich enables Sphinx to process Markdown files. Using the\nvirtual-environment installation method, you can keep both of these\npackages contained for this project.\n\nAt the moment there are three other dependencies involved, all of which\nare Sphinx-extension packages:\n\n1. [Sphinx-multitoc-numbering](https://sphinx-multitoc-numbering.readthedocs.io/),\n   which is required to make Sphinx use a continuous numbering scheme\n   across the files\n\n2. [sphinx External TOC](https://sphinx-external-toc.readthedocs.io/),\n   which is required to define the navigation sidebar declaratively\n   (see the [TOCtrees](#toctrees) subsection below for more info)\n\n3. [sphinx inline svg](https://pypi.org/project/sphinx-inline-svg/),\n   which is used to implement the user-togglable cluster colors on the\n   illustrations of feature application.\n\nbut a full `build-requirements.txt` file is included in the repository\nthat lists the packages in the author's virtual environment. 
You\nshouldn't _need_ to utilize it, since just installing Sphinx,\nMyST-Parser, and the extensions ought to suffice, but it is there if\nrequired.\n\nThe build also uses a custom Sphinx extension, named\n`shapingdocs_svg_color_toggles`, to generate the HTML elements used to\ndo the color toggling. This extension is included in the `_ext/`\ndirectory of this repository and is called from that location,\nhowever, so you do not need to install it separately.\n\n\n\n\n## Building HTML documents ##\n\nWith the Sphinx and MyST-Parser packages installed, go to the\ntop-level directory of this repository in a shell or\nterminal. Building the HTML documents should take only two steps:\n\n1. Run `make clean` to clear out all temporary files from the previous\n   run. Do this every time.\n\n2. Run `make html` to regenerate the HTML files. The output files will\n   be written to the `_build/html/` subdirectory.\n\n\n## Test suite ##\n\nThe `test/` directory contains test elements. At the moment, all of\nthe tests are run manually, but as the kinks are worked out they may\nbe rolled into either Git hooks or GitHub Actions, so it is advisable\nto start using them. Currently there are two tests available as\nMakefile targets: \n\n1. Run `make linktest` to run checks on all of the URLs found in the\n   documents. \n   \n2. Run `make spellcheck` to run spellchecking on the documents.\n\nYou can also run `make test` to run both tests in sequence.\n\nThe spell-checking is configured in `test/spellcheck.yml`. 
It uses\nthe [PySpelling](https://facelessuser.github.io/pyspelling/) package\nwith the custom wordlist at `test/wordlist.txt`.\n\nThere are a few lingering peculiarities to PySpelling. Most notably,\nit supports excluding specified HTML elements, but cannot exclude\n`<table>` elements because of a discrepancy between its built-in\nMarkdown converter and Sphinx. As a result, the `character-tables/`\ndirectory currently returns a great many false hits from the Unicode\ncodepoint names in the tables. The plan is to iron those out, then run\nboth the spell-checking and link-checking tests automatically on all\npull requests. When that is implemented, it will be documented here.\n\n\n\n## Editing and bugfixing ##\n\nThe static-HTML version of the docs is a work in progress at the\nmoment, so please do poke around for problems and report any bugs.\n\nBasic Sphinx configuration is done in the `config.py` file.\n\nThe HTML output documents are currently using the \"Alabaster\" theme,\nwhich comes preinstalled. The Alabaster theme accepts several\nconfiguration options which are also kept in `config.py`. Output\ncustomization for the theme is also tweaked in the `custom.css` file\nin the `_static/` subdirectory. Be sure to edit the copy in\n`_static/` itself: whenever the docs are rebuilt with `make`, that\nfile is copied into `_build/html/_static/`, so any edits made to the\ncopy there get overwritten.\n\nTo report a suspected typo or to suggest a general wording change,\nplease first synchronize your local repository with `git pull`. 
Then\ndo a `make clean`/`make html` as described in the section above.\n\n\n### TOCtrees ###\n\nSphinx, by default, is hardcoded in a way that requires all documents\nto be referenced in a separate, Sphinx-specific `toctree` structure,\nafter which the navigation sidebars (and other elements) are generated\non-the-fly at build-time by Sphinx itself.\n\nThe current documents are using a third-party extension that defines\nthe \"TOCtree\" in a declarative YAML file instead, to work around some\nundesirable outputs -- mainly in the GitHub repository views -- that\nSphinx triggers with its on-the-fly `toctree` process.\n\nBut this approach isn't (yet?) perfect. Some files (namely this one,\n`BUILD.md`, and the image-generation-log files) are manually excluded\nfrom the build process so that they do not generate a flurry of\nwarning messages. That's deliberate, because the build instructions\nand log files are metadata and aren't part of the final documentation\nset itself.\n"
  },
  {
    "path": "Makefile",
    "content": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the environment for the first two.\nSPHINXOPTS    ?=\nSPHINXBUILD   ?= sphinx-build\nSOURCEDIR     = .\nBUILDDIR      = _build\nPYSPELLING    = pyspelling\nPYSPELLINGMARKDOWNCONF = test/spellcheck.yml\nPYSPELLINGHTMLCONF = test/spellcheck_html.yml\n\n# Put it first so that \"make\" without argument is like \"make help\".\nhelp:\n\t@$(SPHINXBUILD) -M help \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n\n.PHONY: help Makefile\n\n# Run tests on links and spelling\ntest: linktest spellcheck\n\n# Use Sphinx's built-in link checker\nlinktest:\n\t@$(SPHINXBUILD) -M linkcheck \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n\n# Use PySpelling\nspellcheck:\n\t@$(PYSPELLING) -c \"$(PYSPELLINGMARKDOWNCONF)\"\n\n# Use PySpelling\nhtmlspellcheck:\n\t@$(PYSPELLING) -c \"$(PYSPELLINGHTMLCONF)\"\n\n# Catch-all target: route all unknown targets to Sphinx using the new\n# \"make mode\" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).\n%: Makefile\n\t@$(SPHINXBUILD) -M $@ \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n"
  },
  {
    "path": "README.md",
    "content": "# OpenType shaping documents #\n\nSponsored by [YesLogic](https://yeslogic.com/) \n\n_<aside>Thanks also to the developers of HarfBuzz and AllSorts, plus many other font engineers and text-encoding experts for their generosity of time and insightful contributions.</aside>_\n\n## &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&#127366; &#127344; &#127361; &#127357; &#127352; &#127357; &#127350; ##\n>\n> This repository is an active WORK IN PROGRESS.\n>\n> NONE of the documents you currently see here are complete\n> nor are they suitable for reference. PLEASE do not use\n> them as a guide or as a general information source.\n>\n> As long as this warning text remains visible, the above \n> holds true. \n\nThese documents are meant to provide a functional specification for\ntext shaping. The expectation is that an implementer of this\nspecification will be using fonts in the OpenType font format applied\nto input text that complies with Unicode.\n\nAt present, we are seeking comments and bugfixes. 
Interested readers\nand contributors can begin at the following shaping documents:\n\n  - Indic Model ([general information](opentype-shaping-indic-general.md))\n    - Scripts covered: [Devanagari](opentype-shaping-devanagari.md), [Bengali](opentype-shaping-bengali.md), [Gujarati](opentype-shaping-gujarati.md), [Gurmukhi](opentype-shaping-gurmukhi.md), [Kannada](opentype-shaping-kannada.md), [Malayalam](opentype-shaping-malayalam.md),\n      [Oriya](opentype-shaping-oriya.md), [Tamil](opentype-shaping-tamil.md), [Telugu](opentype-shaping-telugu.md), [Sinhala](opentype-shaping-sinhala.md)\n  - Arabic Model ([general information](opentype-shaping-arabic-general.md))\n    - Scripts covered: [Arabic](opentype-shaping-arabic.md), [N'Ko](opentype-shaping-nko.md), [Syriac](opentype-shaping-syriac.md), [Mongolian](opentype-shaping-mongolian.md)\n  - [Hangul](opentype-shaping-hangul.md)\n  - [Hebrew](opentype-shaping-hebrew.md)\n  - [Khmer](opentype-shaping-khmer.md)\n  - [Myanmar](opentype-shaping-myanmar.md)\n  - [Thai and Lao](opentype-shaping-thai-lao.md)\n  - [Tibetan](opentype-shaping-tibetan.md)\n  - [Universal Shaping Engine (<abbr>USE</abbr>)](opentype-shaping-use.md)\n    - All complex scripts that are not handled by a dedicated\n      script-specific shaping model\n  - [Default](opentype-shaping-default.md)\n    - All non-complex scripts\n  - [Emoji](opentype-shaping-emoji.md)\n    - Emoji sequences do not constitute a separate shaping model,\n      but handling emoji sequences can incorporate many of the same\n      OpenType mechanisms and should not be overlooked\n  \nReaders are encouraged to submit their feedback\non the text or images of any of the linked scripts. 
The documents are\norganized by script; where there are multiple shaping models for a\nparticular script (including deprecated models), the various models are\nall addressed in the same script-specific document.\n\nThe documents also include a description of\n[normalization](opentype-shaping-normalization.md) in the OpenType\nshaping context, which differs from Unicode normalization in several\nrespects.\n\nVarious [notes](notes/README.md) about the document set and the details\nof its scope, limitations, and quirks are also provided.\n\nSome [errata](errata.md) about the \"upstream\" specifications and\nreference documents are noted separately. \n\nIn its final form, this repository will hold documentation describing\nthe shaping behavior used for layout of OpenType text. In particular,\nit will focus on complex scripts.\n\nIn addition to the primary, per-script documents, implementers and\nother interested readers are encouraged to check the\n[character tables](character-tables/README.md) for correctness and to\nexamine the [image-generation logs](/images/README.md) to identify\nissues seen in the inline images.\n\n### References\n\nThese documents cite the following informative references:\n\n1. The Microsoft [Script development\n   specifications](https://docs.microsoft.com/en-us/typography/script-development/standard),\n   which document the behaviors expected for OpenType Layout fonts and\n   provide guidance &amp; examples for type designers. OpenType is a\n   registered trademark of Microsoft Corporation. \n2. Related portions of the Microsoft OpenType specification, such as the\n   [OpenType Layout tag\n   registry](https://docs.microsoft.com/en-us/typography/opentype/spec/ttoreg)\n   and [OpenType Layout common table\n   formats](https://docs.microsoft.com/en-us/typography/opentype/spec/chapter2),\n   which list and define feature tags, script &amp; language tags, and\n   other internals of compliant OpenType font binaries. 
OpenType is a\n   registered trademark of Microsoft Corporation. \n3. The [HarfBuzz](https://github.com/harfbuzz/harfbuzz) project, which\n   includes a free-software/open-source implementation of OpenType\n   Layout shaping with full source code and documentation. \n4. The [AllSorts](https://github.com/yeslogic/allsorts) project, which\n   includes a free-software/open-source implementation of OpenType\n   Layout shaping with full source code and documentation.\n5. The [Unicode\n   Standard](http://www.unicode.org/standard/standard.html) and\n   related Unicode Consortium projects such as the [Unicode Character\n   Database](http://www.unicode.org/reports/tr44/), which defines\n   Unicode code points and formal character properties used in\n   shaping. Unicode and the Unicode Logo are registered trademarks of\n   Unicode, Inc. in the United States and other countries.\n6. The YesLogic [text corpus](https://github.com/yeslogic/corpus),\n   which includes real-world text data for several Indic scripts,\n   scraped from Wikipedia, Reddit, and multiple online news\n   sources. This data is used to test shaping in AllSorts and Prince.\n7. Known but unofficial information about other shaping-engine\n   projects. Primarily this includes tests and reproducible issues\n   found via [HarfBuzz](https://github.com/harfbuzz/harfbuzz), because\n   HarfBuzz intentionally aims to produce results that will 100% match\n   the output of Microsoft Uniscribe (not counting cases where\n   Uniscribe's output is known to be incorrect, of course).\n   > Note: occasionally, tests or issues documenting the behavior of\n   > Apple CoreText are also included, but CoreText compatibility is\n   > not an explicit goal for HarfBuzz.\n   \n"
  },
  {
    "path": "_ext/LICENSE.md",
    "content": "# License for shapingdocs Sphinx extension software\n\nUnless otherwise indicated, all code in this directory is licensed\nunder the two-clause BSD license below. This license does _not_ apply\nto code or other files found in parent, sibling, and other directories\nwithin this repository.\n\nCopyright 2025 Nathan Willis.\n\nRedistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:\n\n1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.\n\n2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n"
  },
  {
    "path": "_ext/README.md",
    "content": "# Sphinx extensions and related build tools\n\nThis directory holds custom extensions used by the Sphinx builder for\nthis set of documents, and a few somewhat-related Python utilities.\n\nThe contents of this directory are licensed under the 2-Clause BSD\nlicense. This is the license used by the Sphinx project, and was\nchosen here in order to maximize compatibility.\n\nSee the accompanying [LICENSE](LICENSE.md) file.\n\nNote that the license does **not** apply to this documentation project\nas a whole, but only to the contents of this directory.\n"
  },
  {
    "path": "_ext/abbreviations.py",
    "content": "# SPDX-FileCopyrightText: Copyright 2025 Nathan Willis\n#\n# SPDX-License-Identifier: BSD-2-Clause\n\"\"\"Dictionary of the acronyms or abbreviations and corresponding full-text expansions used in the documents.\n\"\"\"\n\n# Dictionary mapping the set of acronyms and abbreviations\n# that get wrapped in <abbr> tags in the generated output\n# to the corresponding full-expansion text for each.\n#\n# Perhaps obviously, the keys were used to identify and\n# tag acronyms with <abbr> in the document source. In some,\n# but not all, cases the full expansions are also used.\nABBR_STRING_MAP = {\n    \"AAT\": \"Apple Advanced Typography\",\n    \"AJT\": \"Arabic Joining Type\",\n    \"AMTRA\": \"Arabic Mark Transient Reordering Algorithm\",\n    \"Ccc\": \"Canonical Combining Class\",\n    \"CEK\": \"Combining Enclosing Keycap\",\n    \"CGJ\": \"Combining Grapheme Joiner\",\n    \"CLDR\": \"Common Locale Data Repository\",\n    \"CSS\": \"Cascading Style Sheets\",\n    \"GDEF\": \"Glyph Definition table\",\n    \"GPOS\": \"Glyph Positioning table\",\n    \"GSUB\": \"Glyph Substitution table\",\n    \"LRM\": \"Left-to-Right Mark\",\n    \"LTR\": \"Left-To-Right\",\n    \"MCM\": \"Modifier Combining Mark\",\n    \"NBSP\": \"No-Break Space\",\n    \"NFC\": \"Normalization Form C\",\n    \"NFD\": \"Normalization Form D\",\n    \"NFKC\": \"Normalization Form KC\",\n    \"NFKD\": \"Normalization Form KD\",\n    \"PNG\": \"Portable Network Graphics\",\n    \"PUA\": \"Private Use Area\",\n    \"RGI\": \"Recommended for General Interchange\",\n    \"RLM\": \"Right-to-Left Mark\",\n    \"RTL\": \"Right-To-Left\",\n    \"SHA\": \"Secure Hash Algorithm\",\n    \"SVG\": \"Scalable Vector Graphics\",\n    \"UCD\": \"Unicode Character Database\",\n    \"UCDM\": \"Unicode Character Decomposition Mapping\",\n    \"UGC\": \"Unicode General Category\",\n    \"UIPC\": \"Unicode Indic Positional Category\",\n    \"UISC\": \"Unicode Indic Syllabic Category\",\n    \"UJT\": 
\"Unicode Joining Type\",\n    \"URL\": \"Uniform Resource Locator\",\n    \"USE\": \"Universal Shaping Engine\",\n    \"ZWJ\": \"Zero-Width Joiner\",\n    \"ZWNJ\": \"Zero-Width Non-Joiner\",\n}\n"
  },
  {
    "path": "_ext/colors.py",
    "content": "# SPDX-FileCopyrightText: Copyright 2025 Nathan Willis\n#\n# SPDX-License-Identifier: BSD-2-Clause\n\"\"\"The sequence of #RRGGBB colors used in the colorized SVG illustration images.\n\"\"\"\n\n# Defines the color sequence used to colorize clusters in the SVG illustration\n# images.\n#\n# It is based on the G10 sequence employed by Plotly, as visible at:\n# https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express\n#\n# This sequence is chosen because it is generally consistent in value and it\n# does not include any greys.\nCOLOR_LIST = [ \"#3366cc\", \"#dc3912\", \"#ff9900\", \"#109618\", \"#990099\", \"#0099c6\", \"#dd4477\", \"#66aa00\", \"#b82e2e\", \"#316395\",]\n"
  },
  {
    "path": "_ext/shapingdocs_svg_color_toggles.py",
    "content": "# SPDX-FileCopyrightText: Copyright 2025 Nathan Willis\n#\n# SPDX-License-Identifier: BSD-2-Clause\n\n\"\"\"Sphinx extension to attach a color-toggle button to specified SVG elements in documents.\n\n   This extension only affects the `html` builder.\n\"\"\"\n\nfrom __future__ import annotations\n\nfrom docutils import nodes\n\nfrom docutils.parsers.rst import directives\n\nfrom sphinx.util import logging\n\nfrom sphinx.application import Sphinx\nfrom sphinx.util.docutils import SphinxDirective, SphinxTranslator, SphinxRole\n#from sphinx.util.typing import ExtensionMetadata\n\nfrom sphinx.writers.html import HTMLTranslator\n\n\n# Directive to insert color-toggle button\n#\n#   Example output:\n#     <p>\n#        <button class=\"svg-color-toggle-button\" id=\"button-bengali-akhn-kssa\" onclick=\"toggleColor('bengali-akhn-kssa')\">\n#      Toggle cluster colors</button>\n#     </p>\n#\n\n\nclass svg_color_toggle_node(nodes.General, nodes.Element):\n    \"\"\"SVG color-toggle node.\"\"\"\n\n    pass\n\n\nclass SVGColorToggleButton(SphinxDirective):\n    \"\"\"A directive to insert a color-toggle switch for \n       an SVG element with the specified id.\"\"\"\n\n    has_content = True\n    required_arguments = 1\n    optional_arguments = 0\n    \n    def run(self) -> list[svg_color_toggle_node]:\n\n        # The sole argument is the CSS element id to build\n        # the button for\n        svg_element = self.arguments[0]\n        \n        return [\n            svg_color_toggle_node(\n                target_id = svg_element,\n                button_id = \"button-\" + svg_element,\n                button_klass = \"svg-color-toggle-button\",\n                button_label = \"Toggle cluster colors\",\n                )\n            ]\n\n\ndef visit_svg_color_toggle_node_html(translator: HTMLTranslator, node: svg_color_toggle_node) -> None:\n    \"\"\"Entry point of the SVG color-toggle node.\"\"\"\n    html: str = \"\"\n\n    if node[\"target_id\"]:\n    
    html += '<button class=\"' + node[\"button_klass\"] + '\" id=\"' + node[\"button_id\"] + '\" onclick=\\'toggleColor(\\\"' + node[\"target_id\"] + '\\\")\\'>' + node[\"button_label\"]\n    else:\n        pass\n\n    translator.body.append(html)\n\n\ndef depart_svg_color_toggle_node_html(translator: HTMLTranslator, node: svg_color_toggle_node) -> None:\n    \"\"\"Exit from the SVG color-toggle node.\"\"\"\n\n    html: str = \"\"\n\n    if node[\"target_id\"]:\n        html += '</button>'\n\n    translator.body.append(html)\n\n\n# Module-level logger; the unsupported-format visitor below relies on it.\nlogger = logging.getLogger(__name__)\n\n\ndef visit_svg_color_toggle_node_unsupported(translator: SphinxTranslator, node: svg_color_toggle_node) -> None:\n    \"\"\"Entry point of the ignored SVG color-toggle node.\"\"\"\n    logger.warning(\n        f\"SVG color-toggle {node['target_id']}: unsupported output format (node skipped)\"\n    )\n    raise nodes.SkipNode\n\n\n\n\ndef setup(app: Sphinx): # The ExtensionMetadata stuff is not available in this version of Sphinx. Sphinx's own docs are pretty terrible about clarifying these matters....\n#def setup(app: Sphinx) -> ExtensionMetadata:\n\n    app.add_node(\n        svg_color_toggle_node,\n        html=(visit_svg_color_toggle_node_html, depart_svg_color_toggle_node_html),\n        epub=(visit_svg_color_toggle_node_unsupported, None),\n        latex=(visit_svg_color_toggle_node_unsupported, None),\n        man=(visit_svg_color_toggle_node_unsupported, None),\n        texinfo=(visit_svg_color_toggle_node_unsupported, None),\n        text=(visit_svg_color_toggle_node_unsupported, None),\n    )\n    app.add_directive(\"svg-color-toggle-button\", SVGColorToggleButton)\n    \n    return {\n        'version': '0.1',\n        'parallel_read_safe': True,\n        'parallel_write_safe': True,\n        }\n"
  },
  {
    "path": "_global.md",
    "content": "```{role} togglebutton(raw)\n:format: html\n```\n\n```{raw} html\n\n<link rel=\"preload\" href=\"/images/color-filters.svg\" as=\"image\"/>\n```\n"
  },
  {
    "path": "_static/LICENSES_FOR_INCORPORATED_SOFTWARE.txt",
    "content": "This documentation set includes files originating from the following\nupstream projects, which are each subject to their own individual\nlicenses and are copyrighted by their own respective authors.\n\nThese respective copyright statements and licenses are reproduced\nbelow or, for those cases where the license is bundled as a separate\nfile, referenced by file name.\n\n\nSphinx\n//     https://www.sphinx-doc.org/\n//     \n//     :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.\n//     :license: BSD, see LICENSE for details.\n//\n//     https://github.com/sphinx-doc/sphinx/blob/master/LICENSE\n//\n//      - License for Sphinx\n//        ==================\n//        \n//        Unless otherwise indicated, all code in the Sphinx project is licenced under the\n//        two clause BSD licence below.\n//        \n//        Copyright (c) 2007-2023 by the Sphinx team (see AUTHORS file).\n//        All rights reserved.\n//        \n//        Redistribution and use in source and binary forms, with or without\n//        modification, are permitted provided that the following conditions are\n//        met:\n//        \n//        * Redistributions of source code must retain the above copyright\n//          notice, this list of conditions and the following disclaimer.\n//        \n//        * Redistributions in binary form must reproduce the above copyright\n//          notice, this list of conditions and the following disclaimer in the\n//          documentation and/or other materials provided with the distribution.\n//        \n//        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n//        \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\n//        LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR\n//        A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT\n//        HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\n//        SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\n//        LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,\n//        DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY\n//        THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n//        (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n//        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n//        \n//        \n//        Licenses for incorporated software\n//        ==================================\n//        \n//        The included implementation of NumpyDocstring._parse_numpydoc_see_also_section\n//        was derived from code under the following license:\n//        \n//        -------------------------------------------------------------------------------\n//        \n//        Copyright (C) 2008 Stefan van der Walt <stefan@mentat.za.net>, Pauli Virtanen <pav@iki.fi>\n//        \n//        Redistribution and use in source and binary forms, with or without\n//        modification, are permitted provided that the following conditions are\n//        met:\n//        \n//         1. Redistributions of source code must retain the above copyright\n//            notice, this list of conditions and the following disclaimer.\n//         2. Redistributions in binary form must reproduce the above copyright\n//            notice, this list of conditions and the following disclaimer in\n//            the documentation and/or other materials provided with the\n//            distribution.\n//        \n//        THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR\n//        IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED\n//        WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\n//        DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,\n//        INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES\n//        (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n//        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)\n//        HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,\n//        STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING\n//        IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE\n//        POSSIBILITY OF SUCH DAMAGE.\n//        \n//        -------------------------------------------------------------------------------\n\n./_build/html/_static/searchtools.js\n./_build/html/_static/language_data.js\n./_build/html/_static/sphinx_highlight.js\n./_build/html/_static/basic.css\n./_build/html/_static/doctools.js\n./_build/html/_static/documentation_options.js\n./_build/html/_static/_sphinx_javascript_frameworks_compat.js\n./_build/html/objects.inv\n./_build/html/searchindex.js\n\n\n\nSphinx \"Alabaster\" theme\n//     https://github.com/sphinx-doc/alabaster\n//     Copyright (c) 2020 Jeff Forcier.\n//     \n//     https://github.com/sphinx-doc/alabaster/blob/0.x/LICENSE\n//     \n//      - Based on original work copyright (c) 2011 Kenneth Reitz and\n//        copyright (c) 2010 Armin Ronacher.\n//        \n//        Some rights reserved.\n//        \n//        Redistribution and use in source and binary forms of the theme, with or\n//        without modification, are permitted provided that the following conditions\n//        are met:\n//        \n//        * Redistributions of source code must retain the above copyright\n//          notice, this list of conditions and the following disclaimer.\n//        \n//        * Redistributions in binary form must reproduce the above\n//          copyright notice, this list of conditions and the following\n//          disclaimer in the documentation and/or other materials provided\n//      
    with the distribution.\n//        \n//        * The names of the contributors may not be used to endorse or\n//          promote products derived from this software without specific\n//          prior written permission.\n//        \n//        THIS THEME IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n//        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n//        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n//        ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE\n//        LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR\n//        CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF\n//        SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS\n//        INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN\n//        CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)\n//        ARISING IN ANY WAY OUT OF THE USE OF THIS THEME, EVEN IF ADVISED OF THE\n//        POSSIBILITY OF SUCH DAMAGE.\n\n./_build/html/_static/alabaster.css\n\n\n\njQuery JavaScript Library v3.6.0\n//     https://jquery.com/\n//     \n//     Copyright OpenJS Foundation and other contributors\n//     Released under the MIT license\n//     https://jquery.org/license\n//     \n//     Date: 2021-03-02T17:08Z\n//\n//     https://github.com/jquery/jquery/blob/3.6-stable/LICENSE.txt\n//\n//      - Copyright OpenJS Foundation and other contributors, https://openjsf.org/\n//        \n//        Permission is hereby granted, free of charge, to any person obtaining\n//        a copy of this software and associated documentation files (the\n//        \"Software\"), to deal in the Software without restriction, including\n//        without limitation the rights to use, copy, modify, merge, publish,\n//        distribute, sublicense, and/or sell copies of the Software, and to\n//        permit persons to whom 
the Software is furnished to do so, subject to\n//        the following conditions:\n//        \n//        The above copyright notice and this permission notice shall be\n//        included in all copies or substantial portions of the Software.\n//        \n//        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n//        EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n//        MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n//        NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE\n//        LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\n//        OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION\n//        WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n//\n//\n//     Includes Sizzle.js\n//     https://sizzlejs.com/\n//     Sizzle CSS Selector Engine v2.3.6\n//     https://sizzlejs.com/\n//     \n//     Copyright JS Foundation and other contributors\n//     Released under the MIT license\n//     https://js.foundation/\n//     \n//     Date: 2021-02-16\n//\n//     https://github.com/jquery/sizzle/blob/main/LICENSE.txt\n//\n//      - Copyright JS Foundation and other contributors, https://js.foundation/\n//        \n//        This software consists of voluntary contributions made by many\n//        individuals. 
For exact contribution history, see the revision history\n//        available at https://github.com/jquery/sizzle\n//        \n//        The following license applies to all parts of this software except as\n//        documented below:\n//        \n//        ====\n//        \n//        Permission is hereby granted, free of charge, to any person obtaining\n//        a copy of this software and associated documentation files (the\n//        \"Software\"), to deal in the Software without restriction, including\n//        without limitation the rights to use, copy, modify, merge, publish,\n//        distribute, sublicense, and/or sell copies of the Software, and to\n//        permit persons to whom the Software is furnished to do so, subject to\n//        the following conditions:\n//        \n//        The above copyright notice and this permission notice shall be\n//        included in all copies or substantial portions of the Software.\n//        \n//        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n//        EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n//        MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n//        NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE\n//        LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\n//        OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION\n//        WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n//        \n//        ====\n//        \n//        All files located in the node_modules and external directories are\n//        externally maintained libraries used by this software which have their\n//        own licenses; we recommend you read them, as their terms may differ from\n//        the terms above.\n\n./_build/html/_static/jquery-3.6.0.js: MIT License\n./_build/html/_static/jquery.js\n\n\n\nPygments\n//     https://pygments.org/\n//\n//     Copyright (c) 2006-2022 by the respective authors (see AUTHORS file).\n//      - https://github.com/pygments/pygments/blob/master/AUTHORS\n//     \n//     https://github.com/pygments/pygments/blob/master/LICENSE\n//\n//      - Copyright (c) 2006-2022 by the respective authors (see AUTHORS file).\n//        All rights reserved.\n//        \n//        Redistribution and use in source and binary forms, with or without\n//        modification, are permitted provided that the following conditions are\n//        met:\n//        \n//        * Redistributions of source code must retain the above copyright\n//          notice, this list of conditions and the following disclaimer.\n//        \n//        * Redistributions in binary form must reproduce the above copyright\n//          notice, this list of conditions and the following disclaimer in the\n//          documentation and/or other materials provided with the distribution.\n//        \n//        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n//        \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\n//        LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR\n//        A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT\n//        OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\n//        SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\n//        LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,\n//        DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY\n//        THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n//        (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n//        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n\n./_build/html/_static/pygments.css\n\n\n\nUnderscore.js 1.13.1\n//     https://underscorejs.org\n//\n//     (c) 2009-2021 Jeremy Ashkenas, Julian Gonggrijp, and\n//     DocumentCloud and Investigative Reporters & Editors \n//     Underscore may be freely distributed under the MIT license.\n//\n//     https://github.com/jashkenas/underscore/blob/master/LICENSE\n//\n//      - Copyright (c) 2009-2022 Jeremy Ashkenas, Julian Gonggrijp,\n//        and DocumentCloud and Investigative Reporters & Editors \n//        \n//        Permission is hereby granted, free of charge, to any person\n//        obtaining a copy of this software and associated documentation\n//        files (the \"Software\"), to deal in the Software without\n//        restriction, including without limitation the rights to use,\n//        copy, modify, merge, publish, distribute, sublicense, and/or sell\n//        copies of the Software, and to permit persons to whom the\n//        Software is furnished to do so, subject to the following\n//        conditions:\n//        \n//        The above copyright notice and this permission notice shall be\n//        included in all copies or substantial portions of the Software.\n//        \n//        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n//        EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n//        OF MERCHANTABILITY, FITNESS 
FOR A PARTICULAR PURPOSE AND\n//        NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n//        HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n//        WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n//        FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n//        OTHER DEALINGS IN THE SOFTWARE.\n\n./_build/html/_static/underscore-1.13.1.js MIT License\n./_build/html/_static/underscore.js MIT License\n\n\n\nSource Code Pro\n//     https://github.com/adobe-fonts/source-code-pro\n//\n//     Copyright 2010, 2012 Adobe Systems Incorporated\n//     (http://www.adobe.com/), with Reserved Font Name 'Source'. All Rights\n//     Reserved. Source is a trademark of Adobe Systems Incorporated in the\n//     United States and/or other countries. \n//     \n//     This Font Software is licensed under the SIL Open Font License, Version 1.1.\n\n./_build/html/_static/fonts/Source_Code_Pro/OFL.txt: SIL Open Font License 1.1\n./_build/html/_static/fonts/Source_Code_Pro/README.txt\n./_build/html/_static/fonts/Source_Code_Pro/SourceCodePro-Italic-VariableFont_wght.ttf\n./_build/html/_static/fonts/Source_Code_Pro/SourceCodePro-VariableFont_wght.ttf\n\n\nSource Sans 3\n//     https://github.com/adobe-fonts/source-sans\n//\n//     Copyright 2010-2020 Adobe (http://www.adobe.com/), with\n//     Reserved Font Name 'Source'. All Rights Reserved. Source is a\n//     trademark of Adobe in the United States and/or other countries. 
\n//     \n//     This Font Software is licensed under the SIL Open Font License, Version 1.1.\n\n./_build/html/_static/fonts/Source_Sans_3/OFL.txt: SIL Open Font License 1.1\n./_build/html/_static/fonts/Source_Sans_3/README.txt\n./_build/html/_static/fonts/Source_Sans_3/SourceSans3-Italic-VariableFont_wght.ttf\n./_build/html/_static/fonts/Source_Sans_3/SourceSans3-VariableFont_wght.ttf\n\n\n\nSource Serif 4\n//     https://github.com/adobe-fonts/source-serif\n//\n//     https://github.com/adobe-fonts/source-serif/blob/release/LICENSE.md\n//\n//      - Copyright 2014 - 2023 Adobe (http://www.adobe.com/), with\n//        Reserved Font Name ‘Source’. All Rights Reserved. Source is a\n//        trademark of Adobe in the United States and/or other countries. \n//        \n//        This Font Software is licensed under the SIL Open Font License, Version 1.1.\n\n./_build/html/_static/fonts/Source_Serif_4/OFL.txt:  SIL Open Font License 1.1\n./_build/html/_static/fonts/Source_Serif_4/README.txt\n./_build/html/_static/fonts/Source_Serif_4/SourceSerif4-Italic-VariableFont_opsz,wght.ttf\n./_build/html/_static/fonts/Source_Serif_4/SourceSerif4-VariableFont_opsz,wght.ttf\n"
  },
  {
    "path": "_static/custom.css",
    "content": "\n/* basic.css | file:///home/nate/code/opentype-shaping-documents/_build/html/_static/basic.css */\n\ndiv.body {\n  /* min-width: 360px; */\n  /* max-width: 800px; */\n  min-width: 460px;\n  max-width: 800px;\n}\n\n/* alabaster.css | file:///home/nate/code/opentype-shaping-documents/_build/html/_static/alabaster.css */\n\n/*div.document {\n  /* width: 940px; */\n/*  width: 1040px;\n} */\n\n/* div.sphinxsidebar {\n  /* width: 220px; */\n/*  width: 240px;\n} */\n\ndiv.bodywrapper {\n  /* margin: 0 0 0 220px; */\n  margin: 0 0 0 300px;\n}\n\n\n/* Hanging section numbers, for slightly easier in-page navigation. */\n\n/* Indent the body text by a fixed amount on the left, \n   then move section-numbers leftward by the same amount. */\ndiv.body>section {\n    margin-left: 4rem;\n}\n\n\nspan.section-number {\n    display: inline-block;\n    width: 3.5rem;\n    text-align: right;\n    margin-left: -4.5rem;\n    margin-right: .5rem;\n}\n/*\nspan.section-number::after {\n    content: \" \";\n    white-space: pre;\n}*/\n\n/* Locally served fonts, to support more smartfont features */\n@font-face {\n    font-family: 'Source Serif 4';\n    src: url('./fonts/Source_Serif_4/SourceSerif4-Italic-VariableFont_opsz,wght.ttf') format('truetype-variations');\n    font-style: italic;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Serif 4';\n    src: url('./fonts/Source_Serif_4/SourceSerif4-VariableFont_opsz,wght.ttf') format('truetype-variations');\n    font-style: normal;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Sans 3';\n    src: url('./fonts/Source_Sans_3/SourceSans3-Italic-VariableFont_wght.ttf') format('truetype-variations');\n    size-adjust: 94%; /* 47% of x-height  originally.... 
*/\n    font-style: italic;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Sans 3';\n    src: url('./fonts/Source_Sans_3/SourceSans3-VariableFont_wght.ttf') format('truetype-variations');\n    size-adjust: 94%; /* 47% of x-height  originally.... */\n    font-style: normal;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Code Pro';\n    src: url('./fonts/Source_Code_Pro/SourceCodePro-Italic-VariableFont_wght.ttf') format('truetype-variations');\n    size-adjust: 92%; /* 46% of x-height  originally.... */\n    font-style: italic;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Code Pro';\n    src: url('./fonts/Source_Code_Pro/SourceCodePro-VariableFont_wght.ttf') format('truetype-variations');\n    size-adjust: 92%; /* 46% of x-height  originally.... */\n    font-style: normal;\n    font-weight: 1 999;\n}\n\n/* Use oldstyle numerals in body text */\nbody {\n    font-feature-settings: \"onum\";\n}\n\n\n/* Tables                 */\n/* alternating background colors like GitHub inline styling uses */\ntr:nth-child(even) {background: #F0F0F4}\ntr:nth-child(odd) {background: #FFF}\n/* Less obtrusive headers */\nth {\n    font-family: 'Source Sans 3';\n    font-size-adjust: 0.48;\n    font-weight: 600;\n    background: #F0F0F4;\n    font-feature-settings: \"tnum\", \"lnum\";\n}\n\ntr {\n    font-feature-settings: \"tnum\", \"lnum\";\n}\n\n/* Try to target the captions in the toctree sidebar */\np.caption[role=\"heading\"] span.caption-text {\n    font-size: 120%\n}\n\n/* Try to target the list beneath the captions in the toctree sidebar */ \np.caption[role=\"heading\"] + ul {\n  text-indent: 0.6em;\n}\n\n/* Make <abbr> acronyms small-caps, but not interactive tooltips */\n/* The font-size adjustment here may be temporary; it depends on */\n/* the smallcap height of the font eventually specified.         
*/\nabbr {\n  font-variant: all-small-caps;\n  font-size: larger;\n  font-weight: 375;\n  cursor: default;\n  border-bottom: none;\n}\n\n/* Style 'samp' elements used to indicate explicit sequences and */\n/* input/output character references that must be exact.         */\nsamp {\n    font-family: 'Source Sans 3';\n    font-size-adjust: 0.48;\n    font-weight: 500; /* slightly heavier only */\n    color: #558;\n}\n\n/* De-emphasize Sphinx's section numbering, so as to be less */\n/* distracting.                                              */\n/*                                                           */\n/* TODO: shift numbers into margin.                          */\n.section-number {\n    font-size: 70%;\n    color: #888;\n}\n\n/* Style rules for using Source family fonts */\ndiv.body h1,\ndiv.body h2,\ndiv.body h3,\ndiv.body h4,\ndiv.body h5,\ndiv.body h6 {\n    font-weight: 500;\n}\n\n/* Make the site-title in the sidebar bolder and different since */\n/* there is no project logo.                                     */\nh1.logo {\n    font-family: 'Source Sans 3';\n    font-weight: 800;\n    font-size: 32px;\n    border-bottom: none;\n    text-decoration: none;\n}\n\n/* Fix the Alabaster theme's default font-scaling since we are */\n/* using a complete superfamily with consistent sizing.        
*/\npre, tt, code {\n    font-size: 1.0em; /* Undo Alabaster setting, best balance for the 3 element types considering Alabaster's other style rules */\n    font-weight: 450;\n    font-size-adjust: 0.46;\n    font-feature-settings: \"tnum\", \"lnum\";\n}\n\n/* Define table captions */\ncaption {\n    caption-side: bottom;\n    padding-top: 6px;\n    padding-bottom: 6px;\n    color: #656565;\n}\n\n/* Set figcaptions to look like table captions */\nfigcaption {\n    padding-top: 6px;\n    /* padding-bottom: 6px;  altering this to account for toggle-button placement*/\n    padding-bottom: 0px;\n    margin-bottom: -12px;\n    color: #656565;\n}\n\nfigcaption p {\n    margin-top: 8.5px;\n    margin-bottom: 8.5px;\n}\n\n/* Slightly lighten bgcolor on pre/tt/code in the captions, since the fg text color is lighter */\n\ncaption pre,\ncaption span.pre,\ncaption span.tt,\ncaption span.code,\nfigcaption pre,\nfigcaption span.pre,\nfigcaption span.tt,\nfigcaption span.code {\n    background-color: #eff4f7; /* testing; needs to be lighter because text is grey */\n}\n\n/* Set maximum size of SVG illustrations   */\n/*                                         */\n/* Width is currently limited to 100% of   */\n/* the parent element; height is currently */\n/* specified relative to the text size,    */\n/* for presumptive convenience. 18em is a  */\n/* trial-and-error value that seems to be  */\n/* reasonable, but is by no means science. 
*/\nsvg.shaping-demo {\n    max-width: 100%;\n    max-height: 18em;\n}\n\n\n/* Toggleable SVG clusters */\n/*                         */\n/* Greyscale               */\n/* dc: dotted-circle       */\n.shaping-demo.greyscale-svg .dc {\n    fill: #999999;\n    stroke-width: 0%;\n}\n/* arrow: right-arrow      */\n.shaping-demo.greyscale-svg .arrow {\n    fill: #666666;\n    stroke-width: 0%;\n}\n/* z: ZWJ and ZWNJ         */\n.shaping-demo.greyscale-svg .z {\n    fill: #999999;\n    stroke: #999999;\n    stroke-width: 1px;\n}\n/* c0: cluster 0           */\n.shaping-demo.greyscale-svg .c0 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c1: cluster 1           */\n.shaping-demo.greyscale-svg .c1 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c2: cluster 2           */\n.shaping-demo.greyscale-svg .c2 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c3: cluster 3           */\n.shaping-demo.greyscale-svg .c3 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c4: cluster 4           */\n.shaping-demo.greyscale-svg .c4 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c5: cluster 5           */\n.shaping-demo.greyscale-svg .c5 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c6: cluster 6           */\n.shaping-demo.greyscale-svg .c6 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c7: cluster 7           */\n.shaping-demo.greyscale-svg .c7 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c8: cluster 8           */\n.shaping-demo.greyscale-svg .c8 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n/* c9: cluster 9           */\n.shaping-demo.greyscale-svg .c9 {\n    fill: #000000;\n    stroke-width: 0%;\n}\n\n/* Colorized               */\n/* dc: dotted-circle       */\n.shaping-demo.color-svg .dc {\n    fill: #999999;\n    stroke-width: 0%;\n}\n/* arrow: right-arrow      */\n.shaping-demo.color-svg .arrow {\n    fill: #666666;\n    stroke-width: 0%;\n}\n/* z: ZWJ and ZWNJ         */\n.shaping-demo.color-svg .z {\n    fill: #999999;\n    stroke: #999999;\n  
  stroke-width: 1px;\n}\n/* c0: cluster 0           */\n.shaping-demo.color-svg .c0 {\n    fill: #3366cc;\n    stroke-width: 0%;\n}\n/* c1: cluster 1           */\n.shaping-demo.color-svg .c1 {\n    fill: #dc3912;\n    stroke-width: 0%;\n}\n/* c2: cluster 2           */\n.shaping-demo.color-svg .c2 {\n    fill: #ff9900;\n    stroke-width: 0%;\n}\n/* c3: cluster 3           */\n.shaping-demo.color-svg .c3 {\n    fill: #109618;\n    stroke-width: 0%;\n}\n/* c4: cluster 4           */\n.shaping-demo.color-svg .c4 {\n    fill: #990099;\n    stroke-width: 0%;\n}\n/* c5: cluster 5           */\n.shaping-demo.color-svg .c5 {\n    fill: #0099c6;\n    stroke-width: 0%;\n}\n/* c6: cluster 6           */\n.shaping-demo.color-svg .c6 {\n    fill: #dd4477;\n    stroke-width: 0%;\n}\n/* c7: cluster 7           */\n.shaping-demo.color-svg .c7 {\n    fill: #66aa00;\n    stroke-width: 0%;\n}\n/* c8: cluster 8           */\n.shaping-demo.color-svg .c8 {\n    fill: #b82e2e;\n    stroke-width: 0%;\n}\n/* c9: cluster 9           */\n.shaping-demo.color-svg .c9 {\n    fill: #316395;\n    stroke-width: 0%;\n}\n\nbutton.svg-color-toggle-button {\n    display: block;\n    margin-left: auto;\n    margin-right: auto;\n    /* margin-top: -8.5px; This makes alignment overly complicated.... */\n    padding: 4px 16px 5px 16px;\n    font-size: small;\n    color: #999;\n    background-color: #fff0;\n    border: 1px solid;\n    border-color: #bbb;\n    border-radius: 3px;\n}\n\nblockquote {\n    margin-top: 17px;\n}\n\n/* Static navigation sidebar */\n/* Turn off bullet point on heading items */\nli.static-nav-heading {\n    list-style: \"\";\n}\n\n/* L1 */\nli.toctree-l1.static-nav {\n    font-size: 120%;\n}\n/* L2 headings */\nli.toctree-l2.static-nav {\n    font-size: 100%;\n    font-style: italic;\n}\n/* L2 page links */\nli.toctree-l2.static-nav a {\n    font-size: 100%;\n    font-style: normal;\n}\n"
  },
  {
    "path": "_static/fonts/Source_Code_Pro/OFL.txt",
    "content": "Copyright 2010, 2012 Adobe Systems Incorporated (http://www.adobe.com/), with Reserved Font Name 'Source'. All Rights Reserved. Source is a trademark of Adobe Systems Incorporated in the United States and/or other countries.\r\n\r\nThis Font Software is licensed under the SIL Open Font License, Version 1.1.\r\nThis license is copied below, and is also available with a FAQ at:\r\nhttp://scripts.sil.org/OFL\r\n\r\n\r\n-----------------------------------------------------------\r\nSIL OPEN FONT LICENSE Version 1.1 - 26 February 2007\r\n-----------------------------------------------------------\r\n\r\nPREAMBLE\r\nThe goals of the Open Font License (OFL) are to stimulate worldwide\r\ndevelopment of collaborative font projects, to support the font creation\r\nefforts of academic and linguistic communities, and to provide a free and\r\nopen framework in which fonts may be shared and improved in partnership\r\nwith others.\r\n\r\nThe OFL allows the licensed fonts to be used, studied, modified and\r\nredistributed freely as long as they are not sold by themselves. The\r\nfonts, including any derivative works, can be bundled, embedded, \r\nredistributed and/or sold with any software provided that any reserved\r\nnames are not used by derivative works. The fonts and derivatives,\r\nhowever, cannot be released under any other type of license. The\r\nrequirement for fonts to remain under this license does not apply\r\nto any document created using the fonts or their derivatives.\r\n\r\nDEFINITIONS\r\n\"Font Software\" refers to the set of files released by the Copyright\r\nHolder(s) under this license and clearly marked as such. 
This may\r\ninclude source files, build scripts and documentation.\r\n\r\n\"Reserved Font Name\" refers to any names specified as such after the\r\ncopyright statement(s).\r\n\r\n\"Original Version\" refers to the collection of Font Software components as\r\ndistributed by the Copyright Holder(s).\r\n\r\n\"Modified Version\" refers to any derivative made by adding to, deleting,\r\nor substituting -- in part or in whole -- any of the components of the\r\nOriginal Version, by changing formats or by porting the Font Software to a\r\nnew environment.\r\n\r\n\"Author\" refers to any designer, engineer, programmer, technical\r\nwriter or other person who contributed to the Font Software.\r\n\r\nPERMISSION & CONDITIONS\r\nPermission is hereby granted, free of charge, to any person obtaining\r\na copy of the Font Software, to use, study, copy, merge, embed, modify,\r\nredistribute, and sell modified and unmodified copies of the Font\r\nSoftware, subject to the following conditions:\r\n\r\n1) Neither the Font Software nor any of its individual components,\r\nin Original or Modified Versions, may be sold by itself.\r\n\r\n2) Original or Modified Versions of the Font Software may be bundled,\r\nredistributed and/or sold with any software, provided that each copy\r\ncontains the above copyright notice and this license. These can be\r\nincluded either as stand-alone text files, human-readable headers or\r\nin the appropriate machine-readable metadata fields within text or\r\nbinary files as long as those fields can be easily viewed by the user.\r\n\r\n3) No Modified Version of the Font Software may use the Reserved Font\r\nName(s) unless explicit written permission is granted by the corresponding\r\nCopyright Holder. 
This restriction only applies to the primary font name as\r\npresented to the users.\r\n\r\n4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font\r\nSoftware shall not be used to promote, endorse or advertise any\r\nModified Version, except to acknowledge the contribution(s) of the\r\nCopyright Holder(s) and the Author(s) or with their explicit written\r\npermission.\r\n\r\n5) The Font Software, modified or unmodified, in part or in whole,\r\nmust be distributed entirely under this license, and must not be\r\ndistributed under any other license. The requirement for fonts to\r\nremain under this license does not apply to any document created\r\nusing the Font Software.\r\n\r\nTERMINATION\r\nThis license becomes null and void if any of the above conditions are\r\nnot met.\r\n\r\nDISCLAIMER\r\nTHE FONT SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\r\nEXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF\r\nMERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT\r\nOF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE\r\nCOPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\r\nINCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL\r\nDAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\r\nFROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM\r\nOTHER DEALINGS IN THE FONT SOFTWARE.\r\n"
  },
  {
    "path": "_static/fonts/Source_Code_Pro/README.txt",
    "content": "Source Code Pro Variable Font\n=============================\n\nThis download contains Source Code Pro as both variable fonts and static fonts.\n\nSource Code Pro is a variable font with this axis:\n  wght\n\nThis means all the styles are contained in these files:\n  SourceCodePro-VariableFont_wght.ttf\n  SourceCodePro-Italic-VariableFont_wght.ttf\n\nIf your app fully supports variable fonts, you can now pick intermediate styles\nthat aren’t available as static fonts. Not all apps support variable fonts, and\nin those cases you can use the static font files for Source Code Pro:\n  static/SourceCodePro-ExtraLight.ttf\n  static/SourceCodePro-Light.ttf\n  static/SourceCodePro-Regular.ttf\n  static/SourceCodePro-Medium.ttf\n  static/SourceCodePro-SemiBold.ttf\n  static/SourceCodePro-Bold.ttf\n  static/SourceCodePro-ExtraBold.ttf\n  static/SourceCodePro-Black.ttf\n  static/SourceCodePro-ExtraLightItalic.ttf\n  static/SourceCodePro-LightItalic.ttf\n  static/SourceCodePro-Italic.ttf\n  static/SourceCodePro-MediumItalic.ttf\n  static/SourceCodePro-SemiBoldItalic.ttf\n  static/SourceCodePro-BoldItalic.ttf\n  static/SourceCodePro-ExtraBoldItalic.ttf\n  static/SourceCodePro-BlackItalic.ttf\n\nGet started\n-----------\n\n1. Install the font files you want to use\n\n2. 
Use your app's font picker to view the font family and all the\navailable styles\n\nLearn more about variable fonts\n-------------------------------\n\n  https://developers.google.com/web/fundamentals/design-and-ux/typography/variable-fonts\n  https://variablefonts.typenetwork.com\n  https://medium.com/variable-fonts\n\nIn desktop apps\n\n  https://theblog.adobe.com/can-variable-fonts-illustrator-cc\n  https://helpx.adobe.com/nz/photoshop/using/fonts.html#variable_fonts\n\nOnline\n\n  https://developers.google.com/fonts/docs/getting_started\n  https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Fonts/Variable_Fonts_Guide\n  https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/variable-fonts\n\nInstalling fonts\n\n  MacOS: https://support.apple.com/en-us/HT201749\n  Linux: https://www.google.com/search?q=how+to+install+a+font+on+gnu%2Blinux\n  Windows: https://support.microsoft.com/en-us/help/314960/how-to-install-or-remove-a-font-in-windows\n\nAndroid Apps\n\n  https://developers.google.com/fonts/docs/android\n  https://developer.android.com/guide/topics/ui/look-and-feel/downloadable-fonts\n\nLicense\n-------\nPlease read the full license text (OFL.txt) to understand the permissions,\nrestrictions and requirements for usage, redistribution, and modification.\n\nYou can use them in your products & projects – print or digital,\ncommercial or otherwise.\n\nThis isn't legal advice, please consider consulting a lawyer and see the full\nlicense for all details.\n"
  },
  {
    "path": "_static/fonts/Source_Sans_3/OFL.txt",
    "content": "Copyright 2010-2020 Adobe (http://www.adobe.com/), with Reserved Font Name 'Source'. All Rights Reserved. Source is a trademark of Adobe in the United States and/or other countries.\r\n\r\nThis Font Software is licensed under the SIL Open Font License, Version 1.1.\r\nThis license is copied below, and is also available with a FAQ at:\r\nhttp://scripts.sil.org/OFL\r\n\r\n\r\n-----------------------------------------------------------\r\nSIL OPEN FONT LICENSE Version 1.1 - 26 February 2007\r\n-----------------------------------------------------------\r\n\r\nPREAMBLE\r\nThe goals of the Open Font License (OFL) are to stimulate worldwide\r\ndevelopment of collaborative font projects, to support the font creation\r\nefforts of academic and linguistic communities, and to provide a free and\r\nopen framework in which fonts may be shared and improved in partnership\r\nwith others.\r\n\r\nThe OFL allows the licensed fonts to be used, studied, modified and\r\nredistributed freely as long as they are not sold by themselves. The\r\nfonts, including any derivative works, can be bundled, embedded, \r\nredistributed and/or sold with any software provided that any reserved\r\nnames are not used by derivative works. The fonts and derivatives,\r\nhowever, cannot be released under any other type of license. The\r\nrequirement for fonts to remain under this license does not apply\r\nto any document created using the fonts or their derivatives.\r\n\r\nDEFINITIONS\r\n\"Font Software\" refers to the set of files released by the Copyright\r\nHolder(s) under this license and clearly marked as such. 
This may\r\ninclude source files, build scripts and documentation.\r\n\r\n\"Reserved Font Name\" refers to any names specified as such after the\r\ncopyright statement(s).\r\n\r\n\"Original Version\" refers to the collection of Font Software components as\r\ndistributed by the Copyright Holder(s).\r\n\r\n\"Modified Version\" refers to any derivative made by adding to, deleting,\r\nor substituting -- in part or in whole -- any of the components of the\r\nOriginal Version, by changing formats or by porting the Font Software to a\r\nnew environment.\r\n\r\n\"Author\" refers to any designer, engineer, programmer, technical\r\nwriter or other person who contributed to the Font Software.\r\n\r\nPERMISSION & CONDITIONS\r\nPermission is hereby granted, free of charge, to any person obtaining\r\na copy of the Font Software, to use, study, copy, merge, embed, modify,\r\nredistribute, and sell modified and unmodified copies of the Font\r\nSoftware, subject to the following conditions:\r\n\r\n1) Neither the Font Software nor any of its individual components,\r\nin Original or Modified Versions, may be sold by itself.\r\n\r\n2) Original or Modified Versions of the Font Software may be bundled,\r\nredistributed and/or sold with any software, provided that each copy\r\ncontains the above copyright notice and this license. These can be\r\nincluded either as stand-alone text files, human-readable headers or\r\nin the appropriate machine-readable metadata fields within text or\r\nbinary files as long as those fields can be easily viewed by the user.\r\n\r\n3) No Modified Version of the Font Software may use the Reserved Font\r\nName(s) unless explicit written permission is granted by the corresponding\r\nCopyright Holder. 
This restriction only applies to the primary font name as\r\npresented to the users.\r\n\r\n4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font\r\nSoftware shall not be used to promote, endorse or advertise any\r\nModified Version, except to acknowledge the contribution(s) of the\r\nCopyright Holder(s) and the Author(s) or with their explicit written\r\npermission.\r\n\r\n5) The Font Software, modified or unmodified, in part or in whole,\r\nmust be distributed entirely under this license, and must not be\r\ndistributed under any other license. The requirement for fonts to\r\nremain under this license does not apply to any document created\r\nusing the Font Software.\r\n\r\nTERMINATION\r\nThis license becomes null and void if any of the above conditions are\r\nnot met.\r\n\r\nDISCLAIMER\r\nTHE FONT SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\r\nEXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF\r\nMERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT\r\nOF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE\r\nCOPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\r\nINCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL\r\nDAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\r\nFROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM\r\nOTHER DEALINGS IN THE FONT SOFTWARE.\r\n"
  },
  {
    "path": "_static/fonts/Source_Sans_3/README.txt",
    "content": "Source Sans 3 Variable Font\n===========================\n\nThis download contains Source Sans 3 as both variable fonts and static fonts.\n\nSource Sans 3 is a variable font with this axis:\n  wght\n\nThis means all the styles are contained in these files:\n  SourceSans3-VariableFont_wght.ttf\n  SourceSans3-Italic-VariableFont_wght.ttf\n\nIf your app fully supports variable fonts, you can now pick intermediate styles\nthat aren’t available as static fonts. Not all apps support variable fonts, and\nin those cases you can use the static font files for Source Sans 3:\n  static/SourceSans3-ExtraLight.ttf\n  static/SourceSans3-Light.ttf\n  static/SourceSans3-Regular.ttf\n  static/SourceSans3-Medium.ttf\n  static/SourceSans3-SemiBold.ttf\n  static/SourceSans3-Bold.ttf\n  static/SourceSans3-ExtraBold.ttf\n  static/SourceSans3-Black.ttf\n  static/SourceSans3-ExtraLightItalic.ttf\n  static/SourceSans3-LightItalic.ttf\n  static/SourceSans3-Italic.ttf\n  static/SourceSans3-MediumItalic.ttf\n  static/SourceSans3-SemiBoldItalic.ttf\n  static/SourceSans3-BoldItalic.ttf\n  static/SourceSans3-ExtraBoldItalic.ttf\n  static/SourceSans3-BlackItalic.ttf\n\nGet started\n-----------\n\n1. Install the font files you want to use\n\n2. 
Use your app's font picker to view the font family and all the\navailable styles\n\nLearn more about variable fonts\n-------------------------------\n\n  https://developers.google.com/web/fundamentals/design-and-ux/typography/variable-fonts\n  https://variablefonts.typenetwork.com\n  https://medium.com/variable-fonts\n\nIn desktop apps\n\n  https://theblog.adobe.com/can-variable-fonts-illustrator-cc\n  https://helpx.adobe.com/nz/photoshop/using/fonts.html#variable_fonts\n\nOnline\n\n  https://developers.google.com/fonts/docs/getting_started\n  https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Fonts/Variable_Fonts_Guide\n  https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/variable-fonts\n\nInstalling fonts\n\n  MacOS: https://support.apple.com/en-us/HT201749\n  Linux: https://www.google.com/search?q=how+to+install+a+font+on+gnu%2Blinux\n  Windows: https://support.microsoft.com/en-us/help/314960/how-to-install-or-remove-a-font-in-windows\n\nAndroid Apps\n\n  https://developers.google.com/fonts/docs/android\n  https://developer.android.com/guide/topics/ui/look-and-feel/downloadable-fonts\n\nLicense\n-------\nPlease read the full license text (OFL.txt) to understand the permissions,\nrestrictions and requirements for usage, redistribution, and modification.\n\nYou can use them in your products & projects – print or digital,\ncommercial or otherwise.\n\nThis isn't legal advice, please consider consulting a lawyer and see the full\nlicense for all details.\n"
  },
  {
    "path": "_static/fonts/Source_Serif_4/OFL.txt",
    "content": "This Font Software is licensed under the SIL Open Font License, Version 1.1.\r\nThis license is copied below, and is also available with a FAQ at:\r\nhttp://scripts.sil.org/OFL\r\n\r\n\r\n-----------------------------------------------------------\r\nSIL OPEN FONT LICENSE Version 1.1 - 26 February 2007\r\n-----------------------------------------------------------\r\n\r\nPREAMBLE\r\nThe goals of the Open Font License (OFL) are to stimulate worldwide\r\ndevelopment of collaborative font projects, to support the font creation\r\nefforts of academic and linguistic communities, and to provide a free and\r\nopen framework in which fonts may be shared and improved in partnership\r\nwith others.\r\n\r\nThe OFL allows the licensed fonts to be used, studied, modified and\r\nredistributed freely as long as they are not sold by themselves. The\r\nfonts, including any derivative works, can be bundled, embedded, \r\nredistributed and/or sold with any software provided that any reserved\r\nnames are not used by derivative works. The fonts and derivatives,\r\nhowever, cannot be released under any other type of license. The\r\nrequirement for fonts to remain under this license does not apply\r\nto any document created using the fonts or their derivatives.\r\n\r\nDEFINITIONS\r\n\"Font Software\" refers to the set of files released by the Copyright\r\nHolder(s) under this license and clearly marked as such. 
This may\r\ninclude source files, build scripts and documentation.\r\n\r\n\"Reserved Font Name\" refers to any names specified as such after the\r\ncopyright statement(s).\r\n\r\n\"Original Version\" refers to the collection of Font Software components as\r\ndistributed by the Copyright Holder(s).\r\n\r\n\"Modified Version\" refers to any derivative made by adding to, deleting,\r\nor substituting -- in part or in whole -- any of the components of the\r\nOriginal Version, by changing formats or by porting the Font Software to a\r\nnew environment.\r\n\r\n\"Author\" refers to any designer, engineer, programmer, technical\r\nwriter or other person who contributed to the Font Software.\r\n\r\nPERMISSION & CONDITIONS\r\nPermission is hereby granted, free of charge, to any person obtaining\r\na copy of the Font Software, to use, study, copy, merge, embed, modify,\r\nredistribute, and sell modified and unmodified copies of the Font\r\nSoftware, subject to the following conditions:\r\n\r\n1) Neither the Font Software nor any of its individual components,\r\nin Original or Modified Versions, may be sold by itself.\r\n\r\n2) Original or Modified Versions of the Font Software may be bundled,\r\nredistributed and/or sold with any software, provided that each copy\r\ncontains the above copyright notice and this license. These can be\r\nincluded either as stand-alone text files, human-readable headers or\r\nin the appropriate machine-readable metadata fields within text or\r\nbinary files as long as those fields can be easily viewed by the user.\r\n\r\n3) No Modified Version of the Font Software may use the Reserved Font\r\nName(s) unless explicit written permission is granted by the corresponding\r\nCopyright Holder. 
This restriction only applies to the primary font name as\r\npresented to the users.\r\n\r\n4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font\r\nSoftware shall not be used to promote, endorse or advertise any\r\nModified Version, except to acknowledge the contribution(s) of the\r\nCopyright Holder(s) and the Author(s) or with their explicit written\r\npermission.\r\n\r\n5) The Font Software, modified or unmodified, in part or in whole,\r\nmust be distributed entirely under this license, and must not be\r\ndistributed under any other license. The requirement for fonts to\r\nremain under this license does not apply to any document created\r\nusing the Font Software.\r\n\r\nTERMINATION\r\nThis license becomes null and void if any of the above conditions are\r\nnot met.\r\n\r\nDISCLAIMER\r\nTHE FONT SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\r\nEXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF\r\nMERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT\r\nOF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE\r\nCOPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\r\nINCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL\r\nDAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\r\nFROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM\r\nOTHER DEALINGS IN THE FONT SOFTWARE.\r\n"
  },
  {
    "path": "_static/fonts/Source_Serif_4/README.txt",
    "content": "Source Serif 4 Variable Font\n============================\n\nThis download contains Source Serif 4 as both variable fonts and static fonts.\n\nSource Serif 4 is a variable font with these axes:\n  opsz\n  wght\n\nThis means all the styles are contained in these files:\n  SourceSerif4-VariableFont_opsz,wght.ttf\n  SourceSerif4-Italic-VariableFont_opsz,wght.ttf\n\nIf your app fully supports variable fonts, you can now pick intermediate styles\nthat aren’t available as static fonts. Not all apps support variable fonts, and\nin those cases you can use the static font files for Source Serif 4:\n  static/SourceSerif4/SourceSerif4-ExtraLight.ttf\n  static/SourceSerif4/SourceSerif4-Light.ttf\n  static/SourceSerif4/SourceSerif4-Regular.ttf\n  static/SourceSerif4/SourceSerif4-Medium.ttf\n  static/SourceSerif4/SourceSerif4-SemiBold.ttf\n  static/SourceSerif4/SourceSerif4-Bold.ttf\n  static/SourceSerif4/SourceSerif4-ExtraBold.ttf\n  static/SourceSerif4/SourceSerif4-Black.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-ExtraLight.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-Light.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-Regular.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-Medium.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-SemiBold.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-Bold.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-ExtraBold.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-Black.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-ExtraLight.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-Light.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-Regular.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-Medium.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-SemiBold.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-Bold.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-ExtraBold.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-Black.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-ExtraLight.ttf\n  
static/SourceSerif4_48pt/SourceSerif4_48pt-Light.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-Regular.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-Medium.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-SemiBold.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-Bold.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-ExtraBold.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-Black.ttf\n  static/SourceSerif4/SourceSerif4-ExtraLightItalic.ttf\n  static/SourceSerif4/SourceSerif4-LightItalic.ttf\n  static/SourceSerif4/SourceSerif4-Italic.ttf\n  static/SourceSerif4/SourceSerif4-MediumItalic.ttf\n  static/SourceSerif4/SourceSerif4-SemiBoldItalic.ttf\n  static/SourceSerif4/SourceSerif4-BoldItalic.ttf\n  static/SourceSerif4/SourceSerif4-ExtraBoldItalic.ttf\n  static/SourceSerif4/SourceSerif4-BlackItalic.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-ExtraLightItalic.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-LightItalic.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-Italic.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-MediumItalic.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-SemiBoldItalic.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-BoldItalic.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-ExtraBoldItalic.ttf\n  static/SourceSerif4_18pt/SourceSerif4_18pt-BlackItalic.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-ExtraLightItalic.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-LightItalic.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-Italic.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-MediumItalic.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-SemiBoldItalic.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-BoldItalic.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-ExtraBoldItalic.ttf\n  static/SourceSerif4_36pt/SourceSerif4_36pt-BlackItalic.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-ExtraLightItalic.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-LightItalic.ttf\n  
static/SourceSerif4_48pt/SourceSerif4_48pt-Italic.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-MediumItalic.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-SemiBoldItalic.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-BoldItalic.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-ExtraBoldItalic.ttf\n  static/SourceSerif4_48pt/SourceSerif4_48pt-BlackItalic.ttf\n\nGet started\n-----------\n\n1. Install the font files you want to use\n\n2. Use your app's font picker to view the font family and all the\navailable styles\n\nLearn more about variable fonts\n-------------------------------\n\n  https://developers.google.com/web/fundamentals/design-and-ux/typography/variable-fonts\n  https://variablefonts.typenetwork.com\n  https://medium.com/variable-fonts\n\nIn desktop apps\n\n  https://theblog.adobe.com/can-variable-fonts-illustrator-cc\n  https://helpx.adobe.com/nz/photoshop/using/fonts.html#variable_fonts\n\nOnline\n\n  https://developers.google.com/fonts/docs/getting_started\n  https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Fonts/Variable_Fonts_Guide\n  https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/variable-fonts\n\nInstalling fonts\n\n  MacOS: https://support.apple.com/en-us/HT201749\n  Linux: https://www.google.com/search?q=how+to+install+a+font+on+gnu%2Blinux\n  Windows: https://support.microsoft.com/en-us/help/314960/how-to-install-or-remove-a-font-in-windows\n\nAndroid Apps\n\n  https://developers.google.com/fonts/docs/android\n  https://developer.android.com/guide/topics/ui/look-and-feel/downloadable-fonts\n\nLicense\n-------\nPlease read the full license text (OFL.txt) to understand the permissions,\nrestrictions and requirements for usage, redistribution, and modification.\n\nYou can use them in your products & projects – print or digital,\ncommercial or otherwise.\n\nThis isn't legal advice, please consider consulting a lawyer and see the full\nlicense for all details.\n"
  },
  {
    "path": "_static/fontsizes.html",
    "content": "<html>\n  <head>\n    <title>Testing relative font sizes</title>\n\n<style>\n  @font-face {\n    font-family: 'Source Serif 4';\n    src: url('./fonts/Source_Serif_4/SourceSerif4-Italic-VariableFont_opsz,wght.ttf') format('truetype-variations');\n    font-style: italic;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Serif 4';\n    src: url('./fonts/Source_Serif_4/SourceSerif4-VariableFont_opsz,wght.ttf') format('truetype-variations');\n    font-style: normal;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Sans 3';\n    src: url('./fonts/Source_Sans_3/SourceSans3-Italic-VariableFont_wght.ttf') format('truetype-variations');\n    font-style: italic;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Sans 3';\n    src: url('./fonts/Source_Sans_3/SourceSans3-VariableFont_wght.ttf') format('truetype-variations');\n    font-style: normal;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Code Pro';\n    src: url('./fonts/Source_Code_Pro/SourceCodePro-Italic-VariableFont_wght.ttf') format('truetype-variations');\n    font-style: italic;\n    font-weight: 1 999;\n}\n\n@font-face {\n    font-family: 'Source Code Pro';\n    src: url('./fonts/Source_Code_Pro/SourceCodePro-VariableFont_wght.ttf') format('truetype-variations');\n    font-style: normal;\n    font-weight: 1 999;\n}\n</style>\n\n<style>\n  span.adj-serif {\n      font-size-adjust: none; \n      font-weight: 400; \n  }\n  span.adj-sans {\n      font-size-adjust: 0.47; \n      font-weight: 400; \n  }\n  span.adj-mono {\n      font-size-adjust: 0.46; \n      font-weight: 400; \n  }\n  span.adj-serif-i {\n      font-size-adjust: none; \n      font-weight: 400; \n  }\n  span.adj-sans-i {\n      font-size-adjust: 0.47; \n      font-weight: 400;\n  }\n  span.adj-mono-i {\n      font-size-adjust: 0.46; \n      font-weight: 400; \n  }\n</style>\n\n\n  </head>\n  <body>\n\n    <h1>Samples for comparing font-size-adjust and 
variable-axis\n    tweaks on the Source Superfamily</h1>\n    \n    <h2>Unadjusted</h2>\n    <h3>72 pt</h3>\n    <p>\n      Source Serif 4 | Source Sans 3 | Source Serif 4 | Source Code Pro | Source Serif 4 | ...italics\n      <p>\n\t<span style=\"font-family:'Source Serif 4'; font-size:72pt\">Hn</span><span style=\"font-family:'Source Sans 3'; font-size:72pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:72pt\">Hn</span><span style=\"font-family:'Source Code Pro'; font-size:72pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:72pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:72pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Sans 3'; font-size:72pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:72pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Code Pro'; font-size:72pt; font-style: italic\">Hn</span>\n      </p>\n\n    <h3>28 pt</h3>\n    <p>\n      Source Serif 4 | Source Sans 3 | Source Serif 4 | Source Code Pro | Source Serif 4 | ...italics\n      <p>\n\t<span style=\"font-family:'Source Serif 4'; font-size:28pt\">Hn</span><span style=\"font-family:'Source Sans 3'; font-size:28pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:28pt\">Hn</span><span style=\"font-family:'Source Code Pro'; font-size:28pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:28pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:28pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Sans 3'; font-size:28pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:28pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Code Pro'; font-size:28pt; font-style: italic\">Hn</span>\n      </p>\n\n    <h3>16 pt</h3>\n    <p>\n      Source Serif 4 | Source Sans 3 | Source Serif 4 | Source Code Pro | Source Serif 4 | ...italics\n      <p>\n\t<span 
style=\"font-family:'Source Serif 4'; font-size:16pt\">Hn</span><span style=\"font-family:'Source Sans 3'; font-size:16pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:16pt\">Hn</span><span style=\"font-family:'Source Code Pro'; font-size:16pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:16pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:16pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Sans 3'; font-size:16pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:16pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Code Pro'; font-size:16pt; font-style: italic\">Hn</span>\n      </p>\n      \n    <h3>10 pt</h3>\n    <p>\n      Source Serif 4 | Source Sans 3 | Source Serif 4 | Source Code Pro | Source Serif 4 | ...italics\n      <p>\n\t<span style=\"font-family:'Source Serif 4'; font-size:10pt\">Hn</span><span style=\"font-family:'Source Sans 3'; font-size:10pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:10pt\">Hn</span><span style=\"font-family:'Source Code Pro'; font-size:10pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:10pt\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:10pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Sans 3'; font-size:10pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Serif 4'; font-size:10pt; font-style: italic\">Hn</span><span style=\"font-family:'Source Code Pro'; font-size:10pt; font-style: italic\">Hn</span>\n      </p>\n\n\n      <hr>\n      \n    <h2>Adjusted</h2>\n    <h3>72 pt</h3>\n    <p>\n      control + Source Serif 4 | Source Sans 3 | Source Serif 4 | Source Code Pro | Source Serif 4 | ...italics\n      <p>\n\t<span style=\"font-family:'Source Serif 4'; font-size:72pt\">Hn</span> <span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:72pt\">Hn</span><span class=\"adj-sans\" 
style=\"font-family:'Source Sans 3'; font-size:72pt\">Hn</span><span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:72pt\">Hn</span><span class=\"adj-mono\" style=\"font-family:'Source Code Pro'; font-size:72pt\">Hn</span><span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:72pt\">Hn</span><span class=\"adj-serif-i\" style=\"font-family:'Source Serif 4'; font-size:72pt; font-style: italic\">Hn</span><span class=\"adj-sans-i\" style=\"font-family:'Source Sans 3'; font-size:72pt; font-style: italic\">Hn</span><span class=\"adj-serif-i\" style=\"font-family:'Source Serif 4'; font-size:72pt; font-style: italic\">Hn</span><span class=\"adj-mono-i\" style=\"font-family:'Source Code Pro'; font-size:72pt; font-style: italic\">Hn</span>\n      </p>\n\n    <h3>28 pt</h3>\n    <p>\n      control + Source Serif 4 | Source Sans 3 | Source Serif 4 | Source Code Pro | Source Serif 4 | ...italics\n      <p>\n\t<span style=\"font-family:'Source Serif 4'; font-size:28pt\">Hn</span> <span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:28pt\">Hn</span><span class=\"adj-sans\" style=\"font-family:'Source Sans 3'; font-size:28pt\">Hn</span><span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:28pt\">Hn</span><span class=\"adj-mono\" style=\"font-family:'Source Code Pro'; font-size:28pt\">Hn</span><span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:28pt\">Hn</span><span class=\"adj-serif-i\" style=\"font-family:'Source Serif 4'; font-size:28pt; font-style: italic\">Hn</span><span class=\"adj-sans-i\" style=\"font-family:'Source Sans 3'; font-size:28pt; font-style: italic\">Hn</span><span class=\"adj-serif-i\" style=\"font-family:'Source Serif 4'; font-size:28pt; font-style: italic\">Hn</span><span class=\"adj-mono-i\" style=\"font-family:'Source Code Pro'; font-size:28pt; font-style: italic\">Hn</span>\n      </p>\n\n    <h3>16 pt</h3>\n    <p>\n      control + Source Serif 4 | Source 
Sans 3 | Source Serif 4 | Source Code Pro | Source Serif 4 | ...italics\n      <p>\n\t<span style=\"font-family:'Source Serif 4'; font-size:16pt\">Hn</span> <span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:16pt\">Hn</span><span class=\"adj-sans\" style=\"font-family:'Source Sans 3'; font-size:16pt\">Hn</span><span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:16pt\">Hn</span><span class=\"adj-mono\" style=\"font-family:'Source Code Pro'; font-size:16pt\">Hn</span><span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:16pt\">Hn</span><span class=\"adj-serif-i\" style=\"font-family:'Source Serif 4'; font-size:16pt; font-style: italic\">Hn</span><span class=\"adj-sans-i\" style=\"font-family:'Source Sans 3'; font-size:16pt; font-style: italic\">Hn</span><span class=\"adj-serif-i\" style=\"font-family:'Source Serif 4'; font-size:16pt; font-style: italic\">Hn</span><span class=\"adj-mono-i\" style=\"font-family:'Source Code Pro'; font-size:16pt; font-style: italic\">Hn</span>\n      </p>\n      \n    <h3>10 pt</h3>\n    <p>\n      control + Source Serif 4 | Source Sans 3 | Source Serif 4 | Source Code Pro | Source Serif 4 | ...italics\n      <p>\n\t<span style=\"font-family:'Source Serif 4'; font-size:10pt\">Hn</span> <span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:10pt\">Hn</span><span class=\"adj-sans\" style=\"font-family:'Source Sans 3'; font-size:10pt\">Hn</span><span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:10pt\">Hn</span><span class=\"adj-mono\" style=\"font-family:'Source Code Pro'; font-size:10pt\">Hn</span><span class=\"adj-serif\" style=\"font-family:'Source Serif 4'; font-size:10pt\">Hn</span><span class=\"adj-serif-i\" style=\"font-family:'Source Serif 4'; font-size:10pt; font-style: italic\">Hn</span><span class=\"adj-sans-i\" style=\"font-family:'Source Sans 3'; font-size:10pt; font-style: italic\">Hn</span><span class=\"adj-serif-i\" 
style=\"font-family:'Source Serif 4'; font-size:10pt; font-style: italic\">Hn</span><span class=\"adj-mono-i\" style=\"font-family:'Source Code Pro'; font-size:10pt; font-style: italic\">Hn</span>\n      </p>\n\n      \n  </body>\n</html>\n"
  },
  {
    "path": "_static/toggleSvgColors.js",
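    "content": "// Toggle a shaping-demo SVG between its greyscale and color renderings.\nfunction toggleColor(elementId) {\n    // Keep the element reference local to this function.\n    const demoImage = document.getElementById(elementId);\n    console.log(elementId);\n\n    // getElementById returns null when no element has this id.\n    if (demoImage === null) {\n\tconsole.log(\"toggleColor called with an id that matches no element\");\n\treturn;\n    }\n\n    if (demoImage.classList.contains(\"shaping-demo\")) {\n\tif (demoImage.classList.contains(\"greyscale-svg\")) {\n\t    // Currently greyscale: switch to color.\n\t    demoImage.classList.add(\"color-svg\");\n\t    demoImage.classList.remove(\"greyscale-svg\");\n\t} else {\n\t    if (demoImage.classList.contains(\"color-svg\")) {\n\t\t// Currently color: switch back to greyscale.\n\t\tdemoImage.classList.add(\"greyscale-svg\");\n\t\tdemoImage.classList.remove(\"color-svg\");\n\t    }\n\t}\n    }\n    else {\n\tconsole.log(\"toggleColor called on element that is not .shaping-demo class\");\n    }\n}\n\n"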
  },
  {
    "path": "_templates/layout.html",
    "content": "{%- extends \"!layout.html\" %}\n{% block extrahead %}\n  {{ super() }}\n{% endblock %}\n"
  },
  {
    "path": "_templates/static_nav.html",
    "content": "<h3>Contents</h3>\n<ul class=\"current\">\n  <li class=\"toctree-l1 static-nav\">\n    <a class=\"reference internal\" href=\"/index.html\">Overview</a>\n  </li>\n  <li class=\"toctree-l1 static-nav static-nav-heading\">\n    Script shaping\n    <ul class=\"current\">\n      <li class=\"toctree-l2 static-nav static-nav-heading\">\n\tIndic Model\n\t<ul class=\"current\">\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-indic-general.html\">Indic general</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-devanagari.html\">Devanagari</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-bengali.html\">Bengali</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-gujarati.html\">Gujarati</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-gurmukhi.html\">Gurmukhi</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-kannada.html\">Kannada</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-malayalam.html\">Malayalam</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-oriya.html\">Oriya</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-sinhala.html\">Sinhala</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-tamil.html\">Tamil</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-telugu.html\">Telugu</a>\n\t  </li>\n\t  <li 
class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-vedic-extensions.html\">Vedic Extensions</a>\n\t  </li>\n\t</ul>\n      </li>\n      <li class=\"toctree-l2 static-nav static-nav-heading\">\n\tArabic Model\n\t<ul class=\"current\">\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-arabic-general.html\">Arabic general</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-arabic.html\">Arabic</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-nko.html\">N'Ko</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-syriac.html\">Syriac</a>\n\t  </li>\n\t  <li class=\"toctree-l3 static-nav\">\n\t    <a class=\"reference internal\" href=\"/opentype-shaping-mongolian.html\">Mongolian</a>\n\t  </li>\n\t</ul>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/opentype-shaping-hangul.html\">Hangul</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/opentype-shaping-hebrew.html\">Hebrew</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/opentype-shaping-khmer.html\">Khmer</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/opentype-shaping-myanmar.html\">Myanmar</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/opentype-shaping-thai-lao.html\">Thai and Lao</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/opentype-shaping-tibetan.html\">Tibetan</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" 
href=\"/opentype-shaping-use.html\">Universal Shaping Engine</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/opentype-shaping-default.html\">Default scripts</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/opentype-shaping-emoji.html\">Emoji</a>\n      </li>\n    </ul>\n  </li>\n  <li class=\"toctree-l1 static-nav\">\n    <a class=\"reference internal\" href=\"/opentype-shaping-normalization.html\">Normalization</a>\n  </li>\n  <li class=\"toctree-l1 static-nav\">\n    <a class=\"reference internal\" href=\"/notes/README.html\">Notes</a>\n    <ul>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/notes/uniscribe-bug-compatibility.html\">Uniscribe compatibility</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/notes/ragel-machine-notation.html\">Ragel state-machine operators</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/notes/emoji-implementation.html\">Emoji implementation</a>\n      </li>\n    </ul>\n  </li>\n  <li class=\"toctree-l1 static-nav\">\n    <a class=\"reference internal\" href=\"/character-tables/character-tables-index.html\">Character Tables</a>\n    <ul>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-arabic.html\">Arabic</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-bengali.html\">Bengali</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-devanagari.html\">Devanagari</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-gujarati.html\">Gujarati</a>\n      </li>\n   
   <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-gurmukhi.html\">Gurmukhi</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-hangul.html\">Hangul</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-hebrew.html\">Hebrew</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-kannada.html\">Kannada</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-khmer.html\">Khmer</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-lao.html\">Lao</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-malayalam.html\">Malayalam</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-mongolian.html\">Mongolian</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-myanmar.html\">Myanmar</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-nko.html\">N'Ko</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-oriya.html\">Oriya</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-sinhala.html\">Sinhala</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" 
href=\"/character-tables/character-tables-syriac.html\">Syriac</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-tamil.html\">Tamil</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-telugu.html\">Telugu</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-thai.html\">Thai</a>\n      </li>\n      <li class=\"toctree-l2 static-nav\">\n\t<a class=\"reference internal\" href=\"/character-tables/character-tables-tibetan.html\">Tibetan</a>\n      </li>\n    </ul>    \n  </li>\n  <li class=\"toctree-l1 static-nav\">\n    <a class=\"reference internal\" href=\"/errata.html\">Errata</a>\n  </li>\n</ul>\n\n<hr>\n\n<ul class=\"extra-links\">\n  <li class=\"toctree-l1 static-nav\">\n    <a class=\"reference external\" href=\"https://github.com/n8willis/opentype-shaping-documents/issues\">GitHub issues</a>\n  </li>\n  <li class=\"toctree-l1 static-nav\">\n    <a class=\"reference external\" href=\"https://github.com/n8willis/opentype-shaping-documents/blob/master/BUILD.md\">Build process</a>\n  </li>\n</ul>\n"
  },
  {
    "path": "_toc.yml",
    "content": "root: index\noptions:\n  maxdepth: 0\n  numbered: False\n  hidden: True\n  titlesonly: True\nentries:\n  - file: overview\n    title: Overview\n    subtrees:\n    - maxdepth: 0\n      numbered: 3\n      entries:\n      - file: opentype-shaping-indic-general\n        title: Indic general\n      - file: opentype-shaping-devanagari\n        title: Devanagari\n      - file: opentype-shaping-bengali\n        title: Bengali\n      - file: opentype-shaping-gujarati\n        title: Gujarati\n      - file: opentype-shaping-gurmukhi\n        title: Gurmukhi\n      - file: opentype-shaping-kannada\n        title: Kannada\n      - file: opentype-shaping-malayalam\n        title: Malayalam\n      - file: opentype-shaping-oriya\n        title: Oriya\n      - file: opentype-shaping-sinhala\n        title: Sinhala\n      - file: opentype-shaping-tamil\n        title: Tamil\n      - file: opentype-shaping-telugu\n        title: Telugu\n      - file: opentype-shaping-vedic-extensions\n        title: Vedic extensions\n      - file: opentype-shaping-arabic-general\n        title: Arabic general\n      - file: opentype-shaping-arabic\n        title: Arabic\n      - file: opentype-shaping-nko\n        title: N'Ko\n      - file: opentype-shaping-syriac\n        title: Syriac\n      - file: opentype-shaping-mongolian\n        title: Mongolian\n      - file: opentype-shaping-hangul\n        title: Hangul\n      - file: opentype-shaping-hebrew\n        title: Hebrew\n      - file: opentype-shaping-khmer\n        title: Khmer\n      - file: opentype-shaping-myanmar\n        title: Myanmar\n      - file: opentype-shaping-thai-lao\n        title: Thai and Lao\n      - file: opentype-shaping-tibetan\n        title: Tibetan\n      - file: opentype-shaping-use\n        title: Universal Shaping Engine\n      - file: opentype-shaping-default\n        title: Default scripts\n      - file: opentype-shaping-emoji\n        title: Emoji\n      - file: opentype-shaping-normalization\n     
   title: Normalization\n  - file: notes/index\n    title: Notes\n    subtrees:\n    - maxdepth: 0\n      numbered: False\n      entries:\n      - file: notes/uniscribe-bug-compatibility\n        title: Uniscribe compatibility\n      - file: notes/ragel-machine-notation\n        title: Ragel state-machine operators\n      - file: notes/emoji-implementation\n        title: Emoji implementation\n      - file: character-tables/index\n        title: Character tables\n      - file: character-tables/character-tables-arabic\n        title: Arabic\n      - file: character-tables/character-tables-bengali\n        title: Bengali\n      - file: character-tables/character-tables-devanagari\n        title: Devanagari\n      - file: character-tables/character-tables-gujarati\n        title: Gujarati\n      - file: character-tables/character-tables-gurmukhi\n        title: Gurmukhi\n      - file: character-tables/character-tables-hangul\n        title: Hangul\n      - file: character-tables/character-tables-hebrew\n        title: Hebrew\n      - file: character-tables/character-tables-kannada\n        title: Kannada\n      - file: character-tables/character-tables-khmer\n        title: Khmer\n      - file: character-tables/character-tables-lao\n        title: Lao\n      - file: character-tables/character-tables-malayalam\n        title: Malayalam\n      - file: character-tables/character-tables-mongolian\n        title: Mongolian\n      - file: character-tables/character-tables-myanmar\n        title: Myanmar\n      - file: character-tables/character-tables-nko\n        title: N'Ko\n      - file: character-tables/character-tables-oriya\n        title: Oriya\n      - file: character-tables/character-tables-sinhala\n        title: Sinhala\n      - file: character-tables/character-tables-syriac\n        title: Syriac\n      - file: character-tables/character-tables-tamil\n        title: Tamil\n      - file: character-tables/character-tables-telugu\n        title: Telugu\n      - 
file: character-tables/character-tables-thai\n        title: Thai\n      - file: character-tables/character-tables-tibetan\n        title: Tibetan\n      - file: errata\n        title: Errata\n  \n"
  },
  {
    "path": "build-requirements.txt",
    "content": "alabaster==1.0.0\nimportlib-metadata>=5.0.0\nmyst-parser>=0.19.1\ndocutils==0.21.2\nmarkdown-it-py==3.0.0\npip>=22.1.2\npyparsing>=3.0.9\npyspelling>=2.12.1\npytz>=2022.4\nsetuptools>=62.6.0\nSphinx==8.1.3\nsphinx_external_toc>=1.1.0\nsphinx-inline-svg>=0.2.0\nsphinx-multitoc-numbering==0.1.3\nsvg-stack>=0.1.0\ncloud-sptheme>=1.10.0\n"
  },
  {
    "path": "character-tables/README.md",
    "content": "# Character tables #\n\nThe files in this directory include per-script reference tables\nshowing the shaping-related properties of the codepoints used for each\nscript.\n\n\n  - Indic\n      - [Devanagari](character-tables-devanagari.md)\n      - [Bengali](character-tables-bengali.md)\n      - [Gujarati](character-tables-gujarati.md)\n      - [Gurmukhi](character-tables-gurmukhi.md)\n      - [Kannada](character-tables-kannada.md)\n      - [Malayalam](character-tables-malayalam.md)\n      - [Oriya](character-tables-oriya.md)\n      - [Tamil](character-tables-tamil.md)\n      - [Telugu](character-tables-telugu.md)\n      - [Sinhala](character-tables-sinhala.md)\n\t  - _Vedic Extensions tables are included in each Indic script_\n  - Arabic\n      - [Arabic](character-tables-arabic.md)\n      - [Syriac](character-tables-syriac.md)\n      - [N'Ko](character-tables-nko.md)\n      - [Mongolian](character-tables-mongolian.md)\n  - Hangul\n      - [Hangul Jamo](character-tables-hangul.md)\n  - Hebrew\n      - [Hebrew](character-tables-hebrew.md)\n  - Khmer\n      - [Khmer](character-tables-khmer.md)\n  - Lao\n      - [Lao](character-tables-lao.md)\n  - Myanmar\n      - [Myanmar](character-tables-myanmar.md)\n  - Thai\n      - [Thai](character-tables-thai.md)\n  - Tibetan\n      - [Tibetan](character-tables-tibetan.md)\n\n\nTables are not provided for the default or Universal Shaping Engine\n(<abbr>USE</abbr>) shaping documents, each of which covers a\nmultitude of individual scripts, nor for the emoji shaping document,\nbecause emoji usage is not specific to any individual script.\n"
  },
  {
    "path": "character-tables/character-tables-arabic.md",
    "content": "# Arabic character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Arabic text](../opentype-shaping-arabic.md).\n\n**Contents**\n\n  - [Arabic character table](#arabic-character-table)\n  - [Arabic Supplement character table](#arabic-supplement-character-table)\n  - [Arabic Extended-A character table](#arabic-extended-a-character-table)\n  - [Arabic Extended-B character table](#arabic-extended-b-character-table)\n  - [Arabic Extended-C character table](#arabic-extended-c-character-table)\n  - [Rumi Numeral Symbols character table](#rumi-numeral-symbols-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\n\n## Arabic character table ##\n\nArabic glyphs should be classified as in the following\ntable. Codepoints in the Arabic block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column.\n\nThe _Joining type_ column indicates whether each codepoint is defined\nas joining with adjacent characters on the left side, right side, left\nand right sides (\"DUAL\"), or neither side (\"NON_JOINING\"). Codepoints\ndesignated TRANSPARENT in the _Joining type_ column do not join with\nadjacent characters and, in addition, do not affect the joining\nbehavior of surrounding characters. Non-spacing marks are of type\nTRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent\ncharacters to join.\n\nThe _Joining group_ column lists the fundamental letter that the\nlisted codepoint behaves like for joining purposes.\n\nAssigned codepoints with a _null_ in the _Joining group_\ncolumn evoke no special behavior from the shaping engine during the\njoin-computation stage.\n\nThe _Mark class_ column indicates the Canonical Combining Class\nfor the codepoint.  Marks are assigned non-zero combining classes so\nthat sequences of adjacent marks can be reordered as required by the\northography. 
\n\nFor Arabic, a subset of marks in the 220 and 230 classes are also\ndesignated _Modifier Combining Marks_ (<abbr>MCM</abbr>). These are denoted with\n_220_MCM_ and _230_MCM_ in the _Mark class_ column. The <abbr title=\"Modifier Combining Mark\">MCM</abbr> marks are\ntreated differently during the mark-reordering stage.\n\n\n\n:::{table} Arabic block table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                         |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|\n|`U+0600`   | Other            | NON_JOINING  | _null_               | _0_        | &#x0600; Number Sign                          |\n|`U+0601`   | Other            | NON_JOINING  | _null_               | _0_        | &#x0601; Sign Sanah                           |\n|`U+0602`   | Other            | NON_JOINING  | _null_               | _0_        | &#x0602; Footnote Marker                      |\n|`U+0603`   | Other            | NON_JOINING  | _null_               | _0_        | &#x0603; Sign Safha                           |\n|`U+0604`   | Other            | NON_JOINING  | _null_               | _0_        | &#x0604; Sign Samvat                          |\n|`U+0605`   | Other            | NON_JOINING  | _null_               | _0_        | &#x0605; Number Mark Above                    |\n|`U+0606`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x0606; Cube Root                            |\n|`U+0607`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x0607; Fourth Root                          |\n|`U+0608`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x0608; Ray                                  |\n|`U+0609`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0609; Per Mille                            |\n|`U+060A`   | Punctuation      | 
NON_JOINING  | _null_               | _0_        | &#x060A; Per Ten Thousand                     |\n|`U+060B`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x060B; Afghani Sign                         |\n|`U+060C`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x060C; Comma                                |\n|`U+060D`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x060D; Date Separator                       |\n|`U+060E`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x060E; Poetic Verse Sign                    |\n|`U+060F`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x060F; Sign Misra                           |\n| | | | | |                                                                                                                      \n|`U+0610`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0610; Sign Sallallahou Alayhe Wassallam    |\n|`U+0611`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0611; Sign Alayhe Assallam                 |\n|`U+0612`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0612; Sign Rahmatullah Alayhe              |\n|`U+0613`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0613; Sign Radi Allahou Anhu               |\n|`U+0614`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0614; Sign Takhallus                       |\n|`U+0615`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0615; Small High Tah                       |\n|`U+0616`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0616; Small High Alef Lam Yeh              |\n|`U+0617`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0617; Small High Zain                      |\n|`U+0618`   | Mark [Mn]        | 
TRANSPARENT  | _null_               | 30         | &#x0618; Small Fatha                          |\n|`U+0619`   | Mark [Mn]        | TRANSPARENT  | _null_               | 31         | &#x0619; Small Damma                          |\n|`U+061A`   | Mark [Mn]        | TRANSPARENT  | _null_               | 32         | &#x061A; Small Kasra                          |\n|`U+061B`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x061B; Semicolon                            |\n|`U+061C`   | Other            | TRANSPARENT  | _null_               | _0_        | &#x061C; Arabic Letter Mark                   |\n|`U+061D`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x061D; End Of Text Mark                     |\n|`U+061E`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x061E; Triple Dot Punctuation Mark          |\n|`U+061F`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x061F; Question Mark                        |\n| | | | | |                                                                                                                       \n|`U+0620`   | Letter           | DUAL         | YEH                  | _0_        | &#x0620; Kashmiri Yeh                         |\n|`U+0621`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x0621; Hamza                                |\n|`U+0622`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0622; Alef With Madda Above                |\n|`U+0623`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0623; Alef With Hamza Above                |\n|`U+0624`   | Letter           | RIGHT        | WAW                  | _0_        | &#x0624; Waw With Hamza Above                 |\n|`U+0625`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0625; Alef With Hamza Below                |\n|`U+0626`   | Letter           | DUAL  
       | YEH                  | _0_        | &#x0626; Dotless Yeh With Hamza Above         |\n|`U+0627`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0627; Alef                                 |\n|`U+0628`   | Letter           | DUAL         | BEH                  | _0_        | &#x0628; Beh                                  |\n|`U+0629`   | Letter           | RIGHT        | TEH_MARBUTA          | _0_        | &#x0629; Teh Marbuta                          |\n|`U+062A`   | Letter           | DUAL         | BEH                  | _0_        | &#x062A; Dotless Beh With 2 Dots Above        |\n|`U+062B`   | Letter           | DUAL         | BEH                  | _0_        | &#x062B; Dotless Beh With 3 Dots Above        |\n|`U+062C`   | Letter           | DUAL         | HAH                  | _0_        | &#x062C; Hah With Dot Below                   |\n|`U+062D`   | Letter           | DUAL         | HAH                  | _0_        | &#x062D; Hah                                  |\n|`U+062E`   | Letter           | DUAL         | HAH                  | _0_        | &#x062E; Hah With Dot Above                   |\n|`U+062F`   | Letter           | RIGHT        | DAL                  | _0_        | &#x062F; Dal                                  |\n| | | | | |                                                                                                                       \n|`U+0630`   | Letter           | RIGHT        | DAL                  | _0_        | &#x0630; Dal With Dot Above                   |\n|`U+0631`   | Letter           | RIGHT        | REH                  | _0_        | &#x0631; Reh                                  |\n|`U+0632`   | Letter           | RIGHT        | REH                  | _0_        | &#x0632; Reh With Dot Above                   |\n|`U+0633`   | Letter           | DUAL         | SEEN                 | _0_        | &#x0633; Seen                                 |\n|`U+0634`   | Letter           | DUAL        
 | SEEN                 | _0_        | &#x0634; Seen With 3 Dots Above               |\n|`U+0635`   | Letter           | DUAL         | SAD                  | _0_        | &#x0635; Sad                                  |\n|`U+0636`   | Letter           | DUAL         | SAD                  | _0_        | &#x0636; Sad With Dot Above                   |\n|`U+0637`   | Letter           | DUAL         | TAH                  | _0_        | &#x0637; Tah                                  |\n|`U+0638`   | Letter           | DUAL         | TAH                  | _0_        | &#x0638; Tah With Dot Above                   |\n|`U+0639`   | Letter           | DUAL         | AIN                  | _0_        | &#x0639; Ain                                  |\n|`U+063A`   | Letter           | DUAL         | AIN                  | _0_        | &#x063A; Ain With Dot Above                   |\n|`U+063B`   | Letter           | DUAL         | GAF                  | _0_        | &#x063B; Keheh With 2 Dots Above              |\n|`U+063C`   | Letter           | DUAL         | GAF                  | _0_        | &#x063C; Keheh With 3 Dots Below              |\n|`U+063D`   | Letter           | DUAL         | FARSI_YEH            | _0_        | &#x063D; Farsi Yeh With Inverted V Above      |\n|`U+063E`   | Letter           | DUAL         | FARSI_YEH            | _0_        | &#x063E; Farsi Yeh With 2 Dots Above          |\n|`U+063F`   | Letter           | DUAL         | FARSI_YEH            | _0_        | &#x063F; Farsi Yeh With 3 Dots Above          |\n| | | | | |                                                                                                                       \n|`U+0640`   | Letter modifier  | JOIN_CAUSING | _null_               | _0_        | &#x0640; Tatweel                              |\n|`U+0641`   | Letter           | DUAL         | FEH                  | _0_        | &#x0641; Feh                                  |\n|`U+0642`   | Letter           | DUAL         | 
QAF                  | _0_        | &#x0642; Qaf                                  |\n|`U+0643`   | Letter           | DUAL         | KAF                  | _0_        | &#x0643; Kaf                                  |\n|`U+0644`   | Letter           | DUAL         | LAM                  | _0_        | &#x0644; Lam                                  |\n|`U+0645`   | Letter           | DUAL         | MEEM                 | _0_        | &#x0645; Meem                                 |\n|`U+0646`   | Letter           | DUAL         | NOON                 | _0_        | &#x0646; Noon                                 |\n|`U+0647`   | Letter           | DUAL         | HEH                  | _0_        | &#x0647; Heh                                  |\n|`U+0648`   | Letter           | RIGHT        | WAW                  | _0_        | &#x0648; Waw                                  |\n|`U+0649`   | Letter           | DUAL         | YEH                  | _0_        | &#x0649; Dotless Yeh                          |\n|`U+064A`   | Letter           | DUAL         | YEH                  | _0_        | &#x064A; Yeh                                  |\n|`U+064B`   | Mark [Mn]        | TRANSPARENT  | _null_               | 27         | &#x064B; Fathatan                             |\n|`U+064C`   | Mark [Mn]        | TRANSPARENT  | _null_               | 28         | &#x064C; Dammatan                             |\n|`U+064D`   | Mark [Mn]        | TRANSPARENT  | _null_               | 29         | &#x064D; Kasratan                             |\n|`U+064E`   | Mark [Mn]        | TRANSPARENT  | _null_               | 30         | &#x064E; Fatha                                |\n|`U+064F`   | Mark [Mn]        | TRANSPARENT  | _null_               | 31         | &#x064F; Damma                                |\n| | | | | |                                                                                                                      \n|`U+0650`   | Mark [Mn]        | TRANSPARENT  | _null_ 
              | 32         | &#x0650; Kasra                                |\n|`U+0651`   | Mark [Mn]        | TRANSPARENT  | _null_               | 33         | &#x0651; Shadda                               |\n|`U+0652`   | Mark [Mn]        | TRANSPARENT  | _null_               | 34         | &#x0652; Sukun                                |\n|`U+0653`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0653; Maddah Above                         |\n|`U+0654`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x0654; Hamza Above                          |\n|`U+0655`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220_MCM    | &#x0655; Hamza Below                          |\n|`U+0656`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0656; Subscript Alef                       |\n|`U+0657`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0657; Inverted Damma                       |\n|`U+0658`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x0658; Noon Ghunna                          |\n|`U+0659`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0659; Zwarakay                             |\n|`U+065A`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x065A; Vowel Sign Small V Above             |\n|`U+065B`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x065B; Vowel Sign Inverted Small V Above    |\n|`U+065C`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x065C; Vowel Sign Dot Below                 |\n|`U+065D`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x065D; Reversed Damma                       |\n|`U+065E`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x065E; Fatha with Two Dots                  |\n|`U+065F`   | Mark [Mn]        | TRANSPARENT  | _null_      
         | 220        | &#x065F; Wavy Hamza Below                     |\n| | | | | |                                                                                                                      \n|`U+0660`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0660; Digit Zero                           |\n|`U+0661`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0661; Digit One                            |\n|`U+0662`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0662; Digit Two                            |\n|`U+0663`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0663; Digit Three                          |\n|`U+0664`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0664; Digit Four                           |\n|`U+0665`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0665; Digit Five                           |\n|`U+0666`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0666; Digit Six                            |\n|`U+0667`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0667; Digit Seven                          |\n|`U+0668`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0668; Digit Eight                          |\n|`U+0669`   | Number           | NON_JOINING  | _null_               | _0_        | &#x0669; Digit Nine                           |\n|`U+066A`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x066A; Percent Sign                         |\n|`U+066B`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x066B; Decimal Separator                    |\n|`U+066C`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x066C; Thousands Separator                  |\n|`U+066D`   | Punctuation      | NON_JOINING  | _null_             
  | _0_        | &#x066D; Five Pointed Star                    |\n|`U+066E`   | Letter           | DUAL         | BEH                  | _0_        | &#x066E; Dotless Beh                          |\n|`U+066F`   | Letter           | DUAL         | QAF                  | _0_        | &#x066F; Dotless Qaf                          |\n| | | | | |                                                                                                                      \n|`U+0670`   | Mark [Mn]        | TRANSPARENT  | _null_               | 35         | &#x0670; Superscript Alef                     |\n|`U+0671`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0671; Alef With Wasla Above                |\n|`U+0672`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0672; Alef With Wavy Hamza Above           |\n|`U+0673`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0673; Alef With Wavy Hamza Below           |\n|`U+0674`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x0674; High Hamza                           |\n|`U+0675`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0675; High Hamza Alef                      |\n|`U+0676`   | Letter           | RIGHT        | WAW                  | _0_        | &#x0676; High Hamza Waw                       |\n|`U+0677`   | Letter           | RIGHT        | WAW                  | _0_        | &#x0677; High Hamza Waw With Damma Above      |\n|`U+0678`   | Letter           | DUAL         | YEH                  | _0_        | &#x0678; High Hamza Dotless Yeh               |\n|`U+0679`   | Letter           | DUAL         | BEH                  | _0_        | &#x0679; Dotless Beh With Tah Above           |\n|`U+067A`   | Letter           | DUAL         | BEH                  | _0_        | &#x067A; Dotless Beh With Vertical 2 Dots Above|\n|`U+067B`   | Letter           | DUAL         | BEH                  | 
_0_        | &#x067B; Dotless Beh With Vertical 2 Dots Below|\n|`U+067C`   | Letter           | DUAL         | BEH                  | _0_        | &#x067C; Dotless Beh With Attached Ring Below And 2 Dots Above|\n|`U+067D`   | Letter           | DUAL         | BEH                  | _0_        | &#x067D; Dotless Beh With Inverted 3 Dots Above|\n|`U+067E`   | Letter           | DUAL         | BEH                  | _0_        | &#x067E; Dotless Beh With 3 Dots Below        |\n|`U+067F`   | Letter           | DUAL         | BEH                  | _0_        | &#x067F; Dotless Beh With 4 Dots Above        |\n| | | | | |                                                                                                                      \n|`U+0680`   | Letter           | DUAL         | BEH                  | _0_        | &#x0680; Dotless Beh With 4 Dots Below        |\n|`U+0681`   | Letter           | DUAL         | HAH                  | _0_        | &#x0681; Hah With Hamza Above                 |\n|`U+0682`   | Letter           | DUAL         | HAH                  | _0_        | &#x0682; Hah With Vertical 2 Dots Above       |\n|`U+0683`   | Letter           | DUAL         | HAH                  | _0_        | &#x0683; Hah With 2 Dots Below                |\n|`U+0684`   | Letter           | DUAL         | HAH                  | _0_        | &#x0684; Hah With Vertical 2 Dots Below       |\n|`U+0685`   | Letter           | DUAL         | HAH                  | _0_        | &#x0685; Hah With 3 Dots Above                |\n|`U+0686`   | Letter           | DUAL         | HAH                  | _0_        | &#x0686; Hah With 3 Dots Below                |\n|`U+0687`   | Letter           | DUAL         | HAH                  | _0_        | &#x0687; Hah With 4 Dots Below                |\n|`U+0688`   | Letter           | RIGHT        | DAL                  | _0_        | &#x0688; Dal With Tah Above                   |\n|`U+0689`   | Letter           | RIGHT        | DAL         
         | _0_        | &#x0689; Dal With Attached Ring Below         |\n|`U+068A`   | Letter           | RIGHT        | DAL                  | _0_        | &#x068A; Dal With Dot Below                   |\n|`U+068B`   | Letter           | RIGHT        | DAL                  | _0_        | &#x068B; Dal With Dot Below And Tah Above     |\n|`U+068C`   | Letter           | RIGHT        | DAL                  | _0_        | &#x068C; Dal With 2 Dots Above                |\n|`U+068D`   | Letter           | RIGHT        | DAL                  | _0_        | &#x068D; Dal With 2 Dots Below                |\n|`U+068E`   | Letter           | RIGHT        | DAL                  | _0_        | &#x068E; Dal With 3 Dots Above                |\n|`U+068F`   | Letter           | RIGHT        | DAL                  | _0_        | &#x068F; Dal With Inverted 3 Dots Above       |\n| | | | | |                                                                                                                      \n|`U+0690`   | Letter           | RIGHT        | DAL                  | _0_        | &#x0690; Dal With 4 Dots Above                |\n|`U+0691`   | Letter           | RIGHT        | REH                  | _0_        | &#x0691; Reh With Tah Above                   |\n|`U+0692`   | Letter           | RIGHT        | REH                  | _0_        | &#x0692; Reh With V Above                     |\n|`U+0693`   | Letter           | RIGHT        | REH                  | _0_        | &#x0693; Reh With Attached Ring Below         |\n|`U+0694`   | Letter           | RIGHT        | REH                  | _0_        | &#x0694; Reh With Dot Below                   |\n|`U+0695`   | Letter           | RIGHT        | REH                  | _0_        | &#x0695; Reh With V Below                     |\n|`U+0696`   | Letter           | RIGHT        | REH                  | _0_        | &#x0696; Reh With Dot Below And Dot Within    |\n|`U+0697`   | Letter           | RIGHT        | REH                
  | _0_        | &#x0697; Reh With 2 Dots Above                |\n|`U+0698`   | Letter           | RIGHT        | REH                  | _0_        | &#x0698; Reh With 3 Dots Above                |\n|`U+0699`   | Letter           | RIGHT        | REH                  | _0_        | &#x0699; Reh With 4 Dots Above                |\n|`U+069A`   | Letter           | DUAL         | SEEN                 | _0_        | &#x069A; Seen With Dot Below And Dot Above    |\n|`U+069B`   | Letter           | DUAL         | SEEN                 | _0_        | &#x069B; Seen With 3 Dots Below               |\n|`U+069C`   | Letter           | DUAL         | SEEN                 | _0_        | &#x069C; Seen With 3 Dots Below And 3 Dots Above|\n|`U+069D`   | Letter           | DUAL         | SAD                  | _0_        | &#x069D; Sad With 2 Dots Below                |\n|`U+069E`   | Letter           | DUAL         | SAD                  | _0_        | &#x069E; Sad With 3 Dots Above                |\n|`U+069F`   | Letter           | DUAL         | TAH                  | _0_        | &#x069F; Tah With 3 Dots Above                |\n| | | | | |                                                                                                                      \n|`U+06A0`   | Letter           | DUAL         | AIN                  | _0_        | &#x06A0; Ain With 3 Dots Above                |\n|`U+06A1`   | Letter           | DUAL         | FEH                  | _0_        | &#x06A1; Dotless Feh                          |\n|`U+06A2`   | Letter           | DUAL         | FEH                  | _0_        | &#x06A2; Dotless Feh With Dot Below           |\n|`U+06A3`   | Letter           | DUAL         | FEH                  | _0_        | &#x06A3; Feh With Dot Below                   |\n|`U+06A4`   | Letter           | DUAL         | FEH                  | _0_        | &#x06A4; Dotless Feh With 3 Dots Above        |\n|`U+06A5`   | Letter           | DUAL         | FEH                  | 
_0_        | &#x06A5; Dotless Feh With 3 Dots Below        |\n|`U+06A6`   | Letter           | DUAL         | FEH                  | _0_        | &#x06A6; Dotless Feh With 4 Dots Above        |\n|`U+06A7`   | Letter           | DUAL         | QAF                  | _0_        | &#x06A7; Dotless Qaf With Dot Above           |\n|`U+06A8`   | Letter           | DUAL         | QAF                  | _0_        | &#x06A8; Dotless Qaf With 3 Dots Above        |\n|`U+06A9`   | Letter           | DUAL         | GAF                  | _0_        | &#x06A9; Keheh                                |\n|`U+06AA`   | Letter           | DUAL         | SWASH_KAF            | _0_        | &#x06AA; Swash Kaf                            |\n|`U+06AB`   | Letter           | DUAL         | GAF                  | _0_        | &#x06AB; Keheh With Attached Ring Below       |\n|`U+06AC`   | Letter           | DUAL         | KAF                  | _0_        | &#x06AC; Kaf With Dot Above                   |\n|`U+06AD`   | Letter           | DUAL         | KAF                  | _0_        | &#x06AD; Kaf With 3 Dots Above                |\n|`U+06AE`   | Letter           | DUAL         | KAF                  | _0_        | &#x06AE; Kaf With 3 Dots Below                |\n|`U+06AF`   | Letter           | DUAL         | GAF                  | _0_        | &#x06AF; Gaf                                  |\n| | | | | |                                                                                                                     \n|`U+06B0`   | Letter           | DUAL         | GAF                  | _0_        | &#x06B0; Gaf With Attached Ring Below         |\n|`U+06B1`   | Letter           | DUAL         | GAF                  | _0_        | &#x06B1; Gaf With 2 Dots Above                |\n|`U+06B2`   | Letter           | DUAL         | GAF                  | _0_        | &#x06B2; Gaf With 2 Dots Below                |\n|`U+06B3`   | Letter           | DUAL         | GAF                  | _0_     
   | &#x06B3; Gaf With Vertical 2 Dots Below       |\n|`U+06B4`   | Letter           | DUAL         | GAF                  | _0_        | &#x06B4; Gaf With 3 Dots Above                |\n|`U+06B5`   | Letter           | DUAL         | LAM                  | _0_        | &#x06B5; Lam With V Above                     |\n|`U+06B6`   | Letter           | DUAL         | LAM                  | _0_        | &#x06B6; Lam With Dot Above                   |\n|`U+06B7`   | Letter           | DUAL         | LAM                  | _0_        | &#x06B7; Lam With 3 Dots Above                |\n|`U+06B8`   | Letter           | DUAL         | LAM                  | _0_        | &#x06B8; Lam With 3 Dots Below                |\n|`U+06B9`   | Letter           | DUAL         | NOON                 | _0_        | &#x06B9; Noon With Dot Below                  |\n|`U+06BA`   | Letter           | DUAL         | NOON                 | _0_        | &#x06BA; Dotless Noon                         |\n|`U+06BB`   | Letter           | DUAL         | NOON                 | _0_        | &#x06BB; Dotless Noon With Tah Above          |\n|`U+06BC`   | Letter           | DUAL         | NOON                 | _0_        | &#x06BC; Noon With Attached Ring Below        |\n|`U+06BD`   | Letter           | DUAL         | NYA                  | _0_        | &#x06BD; Nya                                  |\n|`U+06BE`   | Letter           | DUAL         | KNOTTED_HEH          | _0_        | &#x06BE; Knotted Heh                          |\n|`U+06BF`   | Letter           | DUAL         | HAH                  | _0_        | &#x06BF; Hah With 3 Dots Below And Dot Above  |\n| | | | | |                                                                                                                      \n|`U+06C0`   | Letter           | RIGHT        | TEH_MARBUTA          | _0_        | &#x06C0; Dotless Teh Marbuta With Hamza Above |\n|`U+06C1`   | Letter           | DUAL         | HEH_GOAL             | _0_        | 
&#x06C1; Heh Goal                             |\n|`U+06C2`   | Letter           | DUAL         | HEH_GOAL             | _0_        | &#x06C2; Heh Goal With Hamza Above            |\n|`U+06C3`   | Letter           | RIGHT        | TEH_MARBUTA_GOAL     | _0_        | &#x06C3; Teh Marbuta Goal                     |\n|`U+06C4`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06C4; Waw With Attached Ring Within        |\n|`U+06C5`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06C5; Waw With Bar                         |\n|`U+06C6`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06C6; Waw With V Above                     |\n|`U+06C7`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06C7; Waw With Damma Above                 |\n|`U+06C8`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06C8; Waw With Alef Above                  |\n|`U+06C9`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06C9; Waw With Inverted V Above            |\n|`U+06CA`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06CA; Waw With 2 Dots Above                |\n|`U+06CB`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06CB; Waw With 3 Dots Above                |\n|`U+06CC`   | Letter           | DUAL         | FARSI_YEH            | _0_        | &#x06CC; Farsi Yeh                            |\n|`U+06CD`   | Letter           | RIGHT        | YEH_WITH_TAIL        | _0_        | &#x06CD; Yeh With Tail                        |\n|`U+06CE`   | Letter           | DUAL         | FARSI_YEH            | _0_        | &#x06CE; Farsi Yeh With V Above               |\n|`U+06CF`   | Letter           | RIGHT        | WAW                  | _0_        | &#x06CF; Waw With Dot Above                   |\n| | | | | |                                                                              
                                        \n|`U+06D0`   | Letter           | DUAL         | YEH                  | _0_        | &#x06D0; Dotless Yeh With Vertical 2 Dots Below|\n|`U+06D1`   | Letter           | DUAL         | YEH                  | _0_        | &#x06D1; Dotless Yeh With 3 Dots Below        |\n|`U+06D2`   | Letter           | RIGHT        | YEH_BARREE           | _0_        | &#x06D2; Yeh Barree                           |\n|`U+06D3`   | Letter           | RIGHT        | YEH_BARREE           | _0_        | &#x06D3; Yeh Barree With Hamza Above          |\n|`U+06D4`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x06D4; Full Stop                            |\n|`U+06D5`   | Letter           | NON_JOINING  | TEH_MARBUTA          | _0_        | &#x06D5; Dotless Teh Marbuta                  |\n|`U+06D6`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06D6; Small High Sad Lam Alef Maksura      |\n|`U+06D7`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06D7; Small High Qaf Lam Alef Maksura      |\n|`U+06D8`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06D8; Small High Meem Initial Form         |\n|`U+06D9`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06D9; Small High Lam Alef                  |\n|`U+06DA`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06DA; Small High Jeem                      |\n|`U+06DB`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06DB; Small High Three Dots                |\n|`U+06DC`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x06DC; Small High Seen                      |\n|`U+06DD`   | Other            | NON_JOINING  | _null_               | _0_        | &#x06DD; End Of Ayah                          |\n|`U+06DE`   | Other            | NON_JOINING  | _null_               | _0_        | &#x06DE; 
Start Of Rub El Hizb                 |\n|`U+06DF`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06DF; Small High Rounded Zero              |\n| | | | | |                                                                                                                      \n|`U+06E0`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06E0; Small High Upright Rectangular Zero  |\n|`U+06E1`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06E1; Small High Dotless Head Of Khah      |\n|`U+06E2`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06E2; Small High Meem Isolated Form        |\n|`U+06E3`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220_MCM    | &#x06E3; Small Low Seen                       |\n|`U+06E4`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06E4; Small High Madda                     |\n|`U+06E5`   | Letter modifier  | NON_JOINING  | _null_               | _0_        | &#x06E5; Small Waw                            |\n|`U+06E6`   | Letter modifier  | NON_JOINING  | _null_               | _0_        | &#x06E6; Small Yeh                            |\n|`U+06E7`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x06E7; Small High Yeh                       |\n|`U+06E8`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x06E8; Small High Noon                      |\n|`U+06E9`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x06E9; Place Of Sajdah                      |\n|`U+06EA`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x06EA; Empty Centre Low Stop                |\n|`U+06EB`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06EB; Empty Centre High Stop               |\n|`U+06EC`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x06EC; 
Rounded High Stop With Filled Centre |\n|`U+06ED`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x06ED; Small Low Meem                       |\n|`U+06EE`   | Letter           | RIGHT        | DAL                  | _0_        | &#x06EE; Dal With Inverted V Above            |\n|`U+06EF`   | Letter           | RIGHT        | REH                  | _0_        | &#x06EF; Reh With Inverted V Above            |\n| | | | | |                                                                                                                      \n|`U+06F0`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F0; Extended Digit Zero                  |\n|`U+06F1`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F1; Extended Digit One                   |\n|`U+06F2`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F2; Extended Digit Two                   |\n|`U+06F3`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F3; Extended Digit Three                 |\n|`U+06F4`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F4; Extended Digit Four                  |\n|`U+06F5`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F5; Extended Digit Five                  |\n|`U+06F6`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F6; Extended Digit Six                   |\n|`U+06F7`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F7; Extended Digit Seven                 |\n|`U+06F8`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F8; Extended Digit Eight                 |\n|`U+06F9`   | Number           | NON_JOINING  | _null_               | _0_        | &#x06F9; Extended Digit Nine                  |\n|`U+06FA`   | Letter           | DUAL         | SEEN                 | _0_        | &#x06FA; Sheen 
With Dot Below                 |\n|`U+06FB`   | Letter           | DUAL         | SAD                  | _0_        | &#x06FB; Dad With Dot Below                   |\n|`U+06FC`   | Letter           | DUAL         | AIN                  | _0_        | &#x06FC; Ghain With Dot Below                 |\n|`U+06FD`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x06FD; Sign Sindhi Ampersand                |\n|`U+06FE`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x06FE; Sign Sindhi Postposition Men         |\n|`U+06FF`   | Letter           | DUAL         | KNOTTED_HEH          | _0_        | &#x06FF; Knotted Heh With Inverted V Above    |          \n:::\n\n\n\n## Arabic Supplement character table ##\n\n\n:::{table} Arabic Supplement block table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                                           |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------------------------|\n|`U+0750`   | Letter           | DUAL         | BEH                  | _0_        | &#x0750; Dotless Beh With Horizontal 3 Dots Below               |\n|`U+0751`   | Letter           | DUAL         | BEH                  | _0_        | &#x0751; Beh With 3 Dots Above                                  |\n|`U+0752`   | Letter           | DUAL         | BEH                  | _0_        | &#x0752; Dotless Beh With Inverted 3 Dots Below                 |\n|`U+0753`   | Letter           | DUAL         | BEH                  | _0_        | &#x0753; Dotless Beh With Inverted 3 Dots Below And 2 Dots Above|\n|`U+0754`   | Letter           | DUAL         | BEH                  | _0_        | &#x0754; Dotless Beh With 2 Dots Below And Dot Above            |\n|`U+0755`   | Letter           | DUAL         | BEH                  | _0_        | &#x0755; Dotless Beh With Inverted V Below        
              |\n|`U+0756`   | Letter           | DUAL         | BEH                  | _0_        | &#x0756; Dotless Beh With V Above                               |\n|`U+0757`   | Letter           | DUAL         | HAH                  | _0_        | &#x0757; Hah With 2 Dots Above                                  |\n|`U+0758`   | Letter           | DUAL         | HAH                  | _0_        | &#x0758; Hah With Inverted 3 Dots Below                         |\n|`U+0759`   | Letter           | RIGHT        | DAL                  | _0_        | &#x0759; Dal With Vertical 2 Dots Below And Tah Above           |\n|`U+075A`   | Letter           | RIGHT        | DAL                  | _0_        | &#x075A; Dal With Inverted V Below                              |\n|`U+075B`   | Letter           | RIGHT        | REH                  | _0_        | &#x075B; Reh With Bar                                           |\n|`U+075C`   | Letter           | DUAL         | SEEN                 | _0_        | &#x075C; Seen With 4 Dots Above                                 |\n|`U+075D`   | Letter           | DUAL         | AIN                  | _0_        | &#x075D; Ain With 2 Dots Above                                  |\n|`U+075E`   | Letter           | DUAL         | AIN                  | _0_        | &#x075E; Ain With Inverted 3 Dots Above                         |\n|`U+075F`   | Letter           | DUAL         | AIN                  | _0_        | &#x075F; Ain With Vertical 2 Dots Above                         |\n| | | | | |                                                                                                              \n|`U+0760`   | Letter           | DUAL         | FEH                  | _0_        | &#x0760; Dotless Feh With 2 Dots Below                          |\n|`U+0761`   | Letter           | DUAL         | FEH                  | _0_        | &#x0761; Dotless Feh With Inverted 3 Dots Below                 |\n|`U+0762`   | Letter           | DUAL         | 
GAF                  | _0_        | &#x0762; Keheh With Dot Above                                   |\n|`U+0763`   | Letter           | DUAL         | GAF                  | _0_        | &#x0763; Keheh With 3 Dots Above                                |\n|`U+0764`   | Letter           | DUAL         | GAF                  | _0_        | &#x0764; Keheh With Inverted 3 Dots Below                       |\n|`U+0765`   | Letter           | DUAL         | MEEM                 | _0_        | &#x0765; Meem With Dot Above                                    |\n|`U+0766`   | Letter           | DUAL         | MEEM                 | _0_        | &#x0766; Meem With Dot Below                                    |\n|`U+0767`   | Letter           | DUAL         | NOON                 | _0_        | &#x0767; Noon With 2 Dots Below                                 |\n|`U+0768`   | Letter           | DUAL         | NOON                 | _0_        | &#x0768; Noon With Tah Above                                    |\n|`U+0769`   | Letter           | DUAL         | NOON                 | _0_        | &#x0769; Noon With V Above                                      |\n|`U+076A`   | Letter           | DUAL         | LAM                  | _0_        | &#x076A; Lam With Bar                                           |\n|`U+076B`   | Letter           | RIGHT        | REH                  | _0_        | &#x076B; Reh With Vertical 2 Dots Above                         |\n|`U+076C`   | Letter           | RIGHT        | REH                  | _0_        | &#x076C; Reh With Hamza Above                                   |\n|`U+076D`   | Letter           | DUAL         | SEEN                 | _0_        | &#x076D; Seen With Vertical 2 Dots Above                        |\n|`U+076E`   | Letter           | DUAL         | HAH                  | _0_        | &#x076E; Hah With Tah Below                                     |\n|`U+076F`   | Letter           | DUAL         | HAH                  | _0_        | 
&#x076F; Hah With Tah And 2 Dots Below                          |\n| | | | | |                                                                                                                      \n|`U+0770`   | Letter           | DUAL         | SEEN                 | _0_        | &#x0770; Seen With 2 Dots And Tah Above                         |\n|`U+0771`   | Letter           | RIGHT        | REH                  | _0_        | &#x0771; Reh With 2 Dots And Tah Above                          |\n|`U+0772`   | Letter           | DUAL         | HAH                  | _0_        | &#x0772; Hah With Tah Above                                     |\n|`U+0773`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0773; Alef With Digit Two Above                              |\n|`U+0774`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0774; Alef With Digit Three Above                            |\n|`U+0775`   | Letter           | DUAL         | FARSI_YEH            | _0_        | &#x0775; Farsi Yeh With Digit Two Above                         |\n|`U+0776`   | Letter           | DUAL         | FARSI_YEH            | _0_        | &#x0776; Farsi Yeh With Digit Three Above                       |\n|`U+0777`   | Letter           | DUAL         | YEH                  | _0_        | &#x0777; Dotless Yeh With Digit Four Below                      |\n|`U+0778`   | Letter           | RIGHT        | WAW                  | _0_        | &#x0778; Waw With Digit Two Above                               |\n|`U+0779`   | Letter           | RIGHT        | WAW                  | _0_        | &#x0779; Waw With Digit Three Above                             |\n|`U+077A`   | Letter           | DUAL         | BURUSHASKI_YEH_BARREE| _0_        | &#x077A; Burushaski Yeh Barree With Digit Two Above             |\n|`U+077B`   | Letter           | DUAL         | BURUSHASKI_YEH_BARREE| _0_        | &#x077B; Burushaski Yeh Barree With Digit Three Above    
       |\n|`U+077C`   | Letter           | DUAL         | HAH                  | _0_        | &#x077C; Hah With Digit Four Below                              |\n|`U+077D`   | Letter           | DUAL         | SEEN                 | _0_        | &#x077D; Seen With Digit Four Above                             |\n|`U+077E`   | Letter           | DUAL         | SEEN                 | _0_        | &#x077E; Seen With Inverted V Above                             |\n|`U+077F`   | Letter           | DUAL         | KAF                  | _0_        | &#x077F; Kaf With 2 Dots Above                                  |                        \n:::\n\n\n## Arabic Extended-A character table ##\n\n\n:::{table} Arabic Extended-A block table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                                 |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-------------------------------------------------------|\n|`U+08A0`   | Letter           | DUAL         | BEH                  | _0_        | &#x08A0; Dotless Beh With V Below                     |\n|`U+08A1`   | Letter           | DUAL         | BEH                  | _0_        | &#x08A1; Beh With Hamza Above                         |\n|`U+08A2`   | Letter           | DUAL         | HAH                  | _0_        | &#x08A2; Hah With Dot Below And 2 Dots Above          |\n|`U+08A3`   | Letter           | DUAL         | TAH                  | _0_        | &#x08A3; Tah With 2 Dots Above                        |\n|`U+08A4`   | Letter           | DUAL         | FEH                  | _0_        | &#x08A4; Dotless Feh With Dot Below And 3 Dots Above  |\n|`U+08A5`   | Letter           | DUAL         | QAF                  | _0_        | &#x08A5; Qaf With Dot Below                           |\n|`U+08A6`   | Letter           | DUAL         | LAM                  | _0_        | &#x08A6; Lam With Double Bar                        
  |\n|`U+08A7`   | Letter           | DUAL         | MEEM                 | _0_        | &#x08A7; Meem With 3 Dots Above                       |\n|`U+08A8`   | Letter           | DUAL         | YEH                  | _0_        | &#x08A8; Yeh With Hamza Above                         |\n|`U+08A9`   | Letter           | DUAL         | YEH                  | _0_        | &#x08A9; Yeh With Dot Above                           |\n|`U+08AA`   | Letter           | RIGHT        | REH                  | _0_        | &#x08AA; Reh With Loop                                |\n|`U+08AB`   | Letter           | RIGHT        | WAW                  | _0_        | &#x08AB; Waw With Dot Within                          |\n|`U+08AC`   | Letter           | RIGHT        | ROHINGYA_YEH         | _0_        | &#x08AC; Rohingya Yeh                                 |\n|`U+08AD`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x08AD; Low Alef                                     |\n|`U+08AE`   | Letter           | RIGHT        | DAL                  | _0_        | &#x08AE; Dal With 3 Dots Below                        |\n|`U+08AF`   | Letter           | DUAL         | SAD                  | _0_        | &#x08AF; Sad With 3 Dots Below                        |\n| | | | | |                                                                                                              \n|`U+08B0`   | Letter           | DUAL         | GAF                  | _0_        | &#x08B0; Keheh With Stroke Below                      |\n|`U+08B1`   | Letter           | RIGHT        | STRAIGHT_WAW         | _0_        | &#x08B1; Straight Waw                                 |\n|`U+08B2`   | Letter           | RIGHT        | REH                  | _0_        | &#x08B2; Reh With Dot And Inverted V Above            |\n|`U+08B3`   | Letter           | DUAL         | AIN                  | _0_        | &#x08B3; Ain With 3 Dots Below                        |\n|`U+08B4`   | Letter           | DUAL  
       | KAF                  | _0_        | &#x08B4; Kaf With Dot Below                           |\n|`U+08B5`   | Letter           | DUAL         | QAF                  | _0_        | &#x08B5; Qaf With Dot Below                           |\n|`U+08B6`   | Letter           | DUAL         | BEH                  | _0_        | &#x08B6; Beh With Meem Above                          |\n|`U+08B7`   | Letter           | DUAL         | BEH                  | _0_        | &#x08B7; Dotless Beh With 3 Dots Below And Meem Above |\n|`U+08B8`   | Letter           | DUAL         | BEH                  | _0_        | &#x08B8; Dotless Beh With Teh Above                   |\n|`U+08B9`   | Letter           | RIGHT        | REH                  | _0_        | &#x08B9; Reh With Noon Above                          |\n|`U+08BA`   | Letter           | DUAL         | YEH                  | _0_        | &#x08BA; Yeh With Noon Above                          |\n|`U+08BB`   | Letter           | DUAL         | AFRICAN_FEH          | _0_        | &#x08BB; African Feh                                  |\n|`U+08BC`   | Letter           | DUAL         | AFRICAN_QAF          | _0_        | &#x08BC; African Qaf                                  |\n|`U+08BD`   | Letter           | DUAL         | AFRICAN_NOON         | _0_        | &#x08BD; African Noon                                 |\n|`U+08BE`   | Letter           | DUAL         | BEH                  | _0_        | &#x08BE; Peh With Small V                             |\n|`U+08BF`   | Letter           | DUAL         | BEH                  | _0_        | &#x08BF; Teh With Small V                             |\n| | | | | |\n|`U+08C0`   | Letter           | DUAL         | BEH                  | _0_        | &#x08C0; Tteh With Small V                            |\n|`U+08C1`   | Letter           | DUAL         | HAH                  | _0_        | &#x08C1; Tcheh With Small V                           |\n|`U+08C2`   | Letter           | DUAL         | GAF 
                 | _0_        | &#x08C2; Keheh With Small V                           |\n|`U+08C3`   | Letter           | DUAL         | AIN                  | _0_        | &#x08C3; Ghain With 3 Dots Above                      |\n|`U+08C4`   | Letter           | DUAL         | AFRICAN_QAF          | _0_        | &#x08C4; African Qaf With 3 Dots Above                |\n|`U+08C5`   | Letter           | DUAL         | HAH                  | _0_        | &#x08C5; Jeem With 3 Dots Above                       |\n|`U+08C6`   | Letter           | DUAL         | HAH                  | _0_        | &#x08C6; Jeem With 3 Dots Below                       |\n|`U+08C7`   | Letter           | DUAL         | LAM                  | _0_        | &#x08C7; Lam With Small Arabic Tah Above              |\n|`U+08C8`   | Letter           | DUAL         | GAF                  | _0_        | &#x08C8; Graf                                         |\n|`U+08C9`   | Letter modifier  | TRANSPARENT  | _null_               | _0_        | &#x08C9; Small Farsi Yeh                              |\n|`U+08CA`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x08CA; Small High Farsi Yeh                         |\n|`U+08CB`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x08CB; Small High Yeh Barree With Two Dots Below    |\n|`U+08CC`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08CC; Small High Word Sah                          |\n|`U+08CD`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x08CD; Small High Zah                               |\n|`U+08CE`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x08CE; Large Round Dot Above                        |\n|`U+08CF`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220_MCM    | &#x08CF; Large Round Dot Below                        |\n| | | | | |\n|`U+08D0`   | Mark [Mn]        | TRANSPARENT  | _null_           
    | 220        | &#x08D0; Sukun Below                                  |\n|`U+08D1`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08D1; Large Circle Below                           |\n|`U+08D2`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08D2; Large Round Dot Inside Circle Below          |\n|`U+08D3`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220_MCM    | &#x08D3; Small Low Waw                                |\n|`U+08D4`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08D4; Small High Word Ar-Rub                       |\n|`U+08D5`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08D5; Small High Sad                               |\n|`U+08D6`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08D6; Small High Ain                               |\n|`U+08D7`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08D7; Small High Qaf                               |\n|`U+08D8`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08D8; Small High Noon With Kasra                   |\n|`U+08D9`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08D9; Small Low Noon With Kasra                    |\n|`U+08DA`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08DA; Small High Word Ath-Thalatha                 |\n|`U+08DB`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08DB; Small High Word As-Sajda                     |\n|`U+08DC`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08DC; Small High Word An-Nisf                      |\n|`U+08DD`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08DD; Small High Word Sakta                        |\n|`U+08DE`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | 
&#x08DE; Small High Word Qif                          |\n|`U+08DF`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08DF; Small High Word Waqfa                        |\n| | | | | | \n|`U+08E0`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08E0; Small High Footnote Marker                   |\n|`U+08E1`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08E1; Small High Sign Safha                        |\n|`U+08E2`   | Other            | NON_JOINING  | _null_               | _0_        | &#x08E2; Disputed End Of Ayah                         |\n|`U+08E3`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08E3; Turned Damma Below                           |\n|`U+08E4`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08E4; Curly Fatha                                  |\n|`U+08E5`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08E5; Curly Damma                                  |\n|`U+08E6`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08E6; Curly Kasra                                  |\n|`U+08E7`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08E7; Curly Fathatan                               |\n|`U+08E8`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08E8; Curly Dammatan                               |\n|`U+08E9`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08E9; Curly Kasratan                               |\n|`U+08EA`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08EA; Tone One Dot Above                           |\n|`U+08EB`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08EB; Tone Two Dots Above                          |\n|`U+08EC`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08EC; 
Tone Loop Above                              |\n|`U+08ED`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08ED; Tone One Dot Below                           |\n|`U+08EE`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08EE; Tone Two Dots Below                          |\n|`U+08EF`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08EF; Tone Loop Below                              |\n| | | | | |                                                                                                                      \n|`U+08F0`   | Mark [Mn]        | TRANSPARENT  | _null_               | 27         | &#x08F0; Open Fathatan                                |\n|`U+08F1`   | Mark [Mn]        | TRANSPARENT  | _null_               | 28         | &#x08F1; Open Dammatan                                |\n|`U+08F2`   | Mark [Mn]        | TRANSPARENT  | _null_               | 29         | &#x08F2; Open Kasratan                                |\n|`U+08F3`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230_MCM    | &#x08F3; Small High Waw                               |\n|`U+08F4`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08F4; Fatha With Ring                              |\n|`U+08F5`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08F5; Fatha With Dot Above                         |\n|`U+08F6`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08F6; Kasra With Dot Below                         |\n|`U+08F7`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08F7; Left Arrowhead Above                         |\n|`U+08F8`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08F8; Right Arrowhead Above                        |\n|`U+08F9`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08F9; Left Arrowhead Below                
         |\n|`U+08FA`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x08FA; Right Arrowhead Below                        |\n|`U+08FB`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08FB; Double Right Arrowhead Above                 |\n|`U+08FC`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08FC; Double Right Arrowhead Above With Dot        |\n|`U+08FD`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08FD; Right Arrowhead Above With Dot               |\n|`U+08FE`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08FE; Damma With Dot                               |\n|`U+08FF`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x08FF; Mark Sideways Noon Ghunna                    |          \n:::\n\n\n## Arabic Extended-B character table ##\n\n\n:::{table} Arabic Extended-B block table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                                 |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-------------------------------------------------------|\n|`U+0870`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0870; Alef With Attached Fatha                     |\n|`U+0871`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0871; Alef With Attached Top Right Fatha           |\n|`U+0872`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0872; Alef With Right Middle Stroke                |\n|`U+0873`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0873; Alef With Left Middle Stroke                 |\n|`U+0874`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0874; Alef With Attached Kasra                     |\n|`U+0875`   | Letter           | RIGHT        
| ALEF                 | _0_        | &#x0875; Alef With Attached Bottom Right Kasra        |\n|`U+0876`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0876; Alef With Attached Round Dot Above           |\n|`U+0877`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0877; Alef With Attached Right Round Dot           |\n|`U+0878`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0878; Alef With Attached Left Round Dot            |\n|`U+0879`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0879; Alef With Attached Round Dot Below           |\n|`U+087A`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x087A; Alef With Dot Above                          |\n|`U+087B`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x087B; Alef With Attached Top Right Fatha And Dot Above|\n|`U+087C`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x087C; Alef With Right Middle Stroke And Dot Above  |\n|`U+087D`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x087D; Alef With Attached Bottom Right Kasra And Dot Above|\n|`U+087E`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x087E; Alef With Attached Top Right Fatha And Left Ring|\n|`U+087F`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x087F; Alef With Right Middle Stroke And Left Ring  |\n| | | | | |\n|`U+0880`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0880; Alef With Attached Bottom Right Kasra And Left Ring|\n|`U+0881`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0881; Alef With Attached Right Hamza               |\n|`U+0882`   | Letter           | RIGHT        | ALEF                 | _0_        | &#x0882; Alef With Attached Left Hamza                |\n|`U+0883`   | Letter modifier  | 
JOIN_CAUSING | _null_               | _0_        | &#x0883; Tatweel With Overstruck Hamza                |\n|`U+0884`   | Letter modifier  | JOIN_CAUSING | _null_               | _0_        | &#x0884; Tatweel With Overstruck Waw                  |\n|`U+0885`   | Letter modifier  | JOIN_CAUSING | _null_               | _0_        | &#x0885; Tatweel With Two Dots Below                  |\n|`U+0886`   | Letter           | DUAL         | THIN_YEH             | _0_        | &#x0886; Thin Yeh                                     |\n|`U+0887`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x0887; Baseline Round Dot                           |\n|`U+0888`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x0888; Raised Round Dot                             |\n|`U+0889`   | Letter           | DUAL         | NOON                 | _0_        | &#x0889; Noon With Inverted Small V                   |\n|`U+088A`   | Letter           | DUAL         | HAH                  | _0_        | &#x088A; Hah With Inverted Small V Below              |\n|`U+088B`   | Letter           | DUAL         | TAH                  | _0_        | &#x088B; Tah With Dot Below                           |\n|`U+088C`   | Letter           | DUAL         | TAH                  | _0_        | &#x088C; Tah With Three Dots Below                    |\n|`U+088D`   | Letter           | DUAL         | GAF                  | _0_        | &#x088D; Keheh With Two Dots Vertically Below         |\n|`U+088E`   | Letter           | RIGHT        | VERTICAL_TAIL        | _0_        | &#x088E; Vertical Tail                                |\n|`U+088F`   | _unassigned_     |              |                      |            |                                                       |\n| | | | | |\n|`U+0890`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x0890; Pound Mark Above                             |\n|`U+0891`   | Symbol           | NON_JOINING  
| _null_               | _0_        | &#x0891; Piastre Mark Above                           |\n|`U+0892`   | _unassigned_     |              |                      |            |                                                       |\n|`U+0893`   | _unassigned_     |              |                      |            |                                                       |\n|`U+0894`   | _unassigned_     |              |                      |            |                                                       |\n|`U+0895`   | _unassigned_     |              |                      |            |                                                       |\n|`U+0896`   | _unassigned_     |              |                      |            |                                                       |\n|`U+0897`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0897; Pepet                                        |\n|`U+0898`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0898; Small High Word Al-Juz                       |\n|`U+0899`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0899; Small Low Word Ishmaam                       |\n|`U+089A`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x089A; Small Low Word Imaala                        |\n|`U+089B`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x089B; Small Low Word Tasheel                       |\n|`U+089C`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x089C; Madda Waajib                                 |\n|`U+089D`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x089D; Superscript Alef Mokhassas                   |\n|`U+089E`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x089E; Doubled Madda                                |\n|`U+089F`   | Mark [Mn]        | TRANSPARENT  | _null_               | 
 230        | &#x089F; Half Madda Over Madda                        |\n| | | | | |\n:::\n\n\n## Arabic Extended-C character table ##\n\n\n:::{table} Arabic Extended-C block table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                                 |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-------------------------------------------------------|\n|`U+10EC0`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EC1`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EC2`  | Letter           | RIGHT        | DAL                  | _0_        | &#x10EC2; Dal With Two Dots Vertically Below          |\n|`U+10EC3`  | Letter           | DUAL         | TAH                  | _0_        | &#x10EC3; Tah With Two Dots Vertically Below          |\n|`U+10EC4`  | Letter           | DUAL         | KAF                  | _0_        | &#x10EC4; Kaf With Two Dots Vertically Below          |\n|`U+10EC5`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EC6`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EC7`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EC8`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EC9`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ECA`  | _unassigned_     |              |                      |            |                                            
           |\n|`U+10ECB`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ECC`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ECD`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ECE`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ECF`  | _unassigned_     |              |                      |            |                                                       |\n| | | | | |\n|`U+10ED0`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ED1`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ED2`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ED3`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ED4`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ED5`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ED6`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ED7`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10ED8`  | _unassigned_     |              |                      |            |                                                       
|\n|`U+10ED9`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EDA`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EDB`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EDC`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EDD`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EDE`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EDF`  | _unassigned_     |              |                      |            |                                                       |\n| | | | | |\n|`U+10EE0`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EE1`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EE2`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EE3`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EE4`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EE5`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EE6`  | _unassigned_     |              |                      |            |                                                       
|\n|`U+10EE7`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EE8`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EE9`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EEA`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EEB`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EEC`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EED`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EEE`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EEF`  | _unassigned_     |              |                      |            |                                                       |\n| | | | | |\n|`U+10EF0`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EF1`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EF2`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EF3`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EF4`  | _unassigned_     |              |                      |            |                                                       
|\n|`U+10EF5`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EF6`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EF7`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EF8`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EF9`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EFA`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EFB`  | _unassigned_     |              |                      |            |                                                       |\n|`U+10EFC`  | Mark [Mn]        | TRANSPARENT  | _null_               | _0_        | &#x10EFC; Combining Alef Overlay                      |\n|`U+10EFD`  | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x10EFD; Small Low Word Sakta                        |\n|`U+10EFE`  | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x10EFE; Small Low Word Qasr                         |\n|`U+10EFF`  | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x10EFF; Small Low Word Madda                        |\n| | | | | |\n:::\n\n\n## Rumi Numeral Symbols character table ##\n\n:::{table} Rumi Numeral Symbols block table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                          |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|\n|`U+10E60`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E60; 
Digit One            |\n|`U+10E61`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E61; Digit Two            |\n|`U+10E62`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E62; Digit Three          |\n|`U+10E63`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E63; Digit Four           |\n|`U+10E64`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E64; Digit Five           |\n|`U+10E65`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E65; Digit Six            |\n|`U+10E66`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E66; Digit Seven          |\n|`U+10E67`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E67; Digit Eight          |\n|`U+10E68`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E68; Digit Nine           |\n|`U+10E69`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E69; Number Ten           |\n|`U+10E6A`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6A; Number Twenty        |\n|`U+10E6B`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6B; Number Thirty        |\n|`U+10E6C`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6C; Number Forty         |\n|`U+10E6D`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6D; Number Fifty         |\n|`U+10E6E`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6E; Number Sixty         |\n|`U+10E6F`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6F; Number Seventy       |\n| | | | | |                                                                                          \n|`U+10E70`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E70; Number 
Eighty        |\n|`U+10E71`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E71; Number Ninety        |\n|`U+10E72`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E72; Number One Hundred   |\n|`U+10E73`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E73; Number Two Hundred   |\n|`U+10E74`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E74; Number Three Hundred |\n|`U+10E75`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E75; Number Four Hundred  |\n|`U+10E76`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E76; Number Five Hundred  |\n|`U+10E77`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E77; Number Six Hundred   |\n|`U+10E78`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E78; Number Seven Hundred |\n|`U+10E79`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E79; Number Eight Hundred |\n|`U+10E7A`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E7A; Number Nine Hundred  |\n|`U+10E7B`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E7B; Fraction One Half    |\n|`U+10E7C`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E7C; Fraction One Quarter |\n|`U+10E7D`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E7D; Fraction One Third   |\n|`U+10E7E`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E7E; Fraction Two Thirds  |\n|`U+10E7F`  | _unassigned_     |              |                      |            |                                |\n:::\n\n\n<!--- \n## Arabic Mathematical Alphabetic Symbols character table ##\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                          
|\n|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|\n|`U+10E60`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E60; Digit One            |\n|`U+10E61`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E61; Digit Two            |\n|`U+10E62`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E62; Digit Three          |\n|`U+10E63`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E63; Digit Four           |\n|`U+10E64`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E64; Digit Five           |\n|`U+10E65`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E65; Digit Six            |\n|`U+10E66`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E66; Digit Seven          |\n|`U+10E67`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E67; Digit Eight          |\n|`U+10E68`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E68; Digit Nine           |\n|`U+10E69`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E69; Number Ten           |\n|`U+10E6A`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6A; Number Twenty        |\n|`U+10E6B`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6B; Number Thirty        |\n|`U+10E6C`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6C; Number Forty         |\n|`U+10E6D`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6D; Number Fifty         |\n|`U+10E6E`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6E; Number Sixty         |\n|`U+10E6F`  | Number           | NON_JOINING  | _null_               | _0_        | &#x10E6F; Number Seventy 
      |\n| | | | | |                                                                                          \n\n--->\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Arabic text include the dotted-circle placeholder (`U+25CC`), the\ncombining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)\nand zero-width non-joiner (`U+200C`), the left-to-right mark\n(`U+200E`) and right-to-left mark (`U+200F`), and the no-break\nspace (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ncombining mark in isolation. Real-world text may also use\nother characters, such as hyphens or dashes, in a similar placeholder\nfashion; shaping engines should cope with this situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                          |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|\n|`U+00A0`   | Separator        | NON_JOINING  | _null_               | _0_        | &#x00A0; No-break space        |\n|`U+034F`   | Other            | NON_JOINING  | _null_               | _0_        | &#x034F; Combining grapheme joiner |\n|`U+200C`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOIN_CAUSING | _null_               | _0_        | &#x200D; Zero-width joiner     |\n|`U+200E`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200E; Left-to-Right mark    |\n|`U+200F`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200F; Right-to-Left mark    |\n|`U+2010`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | NON_JOINING  | _null_               | 
_0_        | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x25CC; Dotted circle         |\n:::\n\n\nThe combining grapheme joiner (<abbr>CGJ</abbr>) is primarily used to alter the\norder in which adjacent marks are positioned during the\nmark-reordering stage, in order to adhere to the needs of a\nnon-default language orthography.\n<!--- combining grapheme joiner explanation --->\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to force the usage of the\ncursive connecting form of a letter even when the context of the\nadjoining letters would not trigger the connecting form. \n\nFor example, to show the initial form of a letter in isolation (such\nas for displaying it in a table of forms), the sequence \"_Letter_,ZWJ\"\nwould be used. To show the medial form of a letter in isolation, the\nsequence \"ZWJ,_Letter_,ZWJ\" would be used.\n\n\n<!--- Zero-Width Non Joiner explanation --->\n\nThe right-to-left mark (<abbr>RLM</abbr>) and left-to-right mark (<abbr>LRM</abbr>) are used by\nthe Unicode bidirectionality algorithm (BiDi) to indicate the points\nin a text run at which the writing direction changes.\n\n\n<!--- How shaping is affected by the <abbr title=\"Left-To-Right\">LTR</abbr> and <abbr title=\"Right-To-Left\">RTL</abbr> markers explanation --->\n\n\nThe no-break space is primarily used to display those codepoints that\nare defined as non-spacing (such as vowel or diacritical marks and \"Hamza\") in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder.\n"
  },
  {
    "path": "character-tables/character-tables-bengali.md",
    "content": "# Bengali character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Bengali text](../opentype-shaping-bengali.md).\n\n**Contents**\n\n  - [Bengali character table](#bengali-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Bengali character table ##\n\nBengali glyphs should be classified as in the following\ntable. Codepoints in the Bengali block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Bengali character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0980`   | Letter           | CONSONANT_PLACEHOLDER | _null_                 | &#x0980; Anji                |\n|`U+0981`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0981; Candrabindu         |\n|`U+0982`   | Mark [Mc]        | BINDU             | RIGHT_POSITION             | &#x0982; Anusvara            |\n|`U+0983`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0983; Visarga             |\n|`U+0984`   | _unassigned_     |                   |                            |                              |\n|`U+0985`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0985; A                   |\n|`U+0986`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0986; Aa                  |\n|`U+0987`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0987; I                   |\n|`U+0988`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0988; Ii                  |\n|`U+0989`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0989; U                   |\n|`U+098A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x098A; Uu                  |\n|`U+098B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x098B; Vocalic R           |\n|`U+098C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x098C; Vocalic L           |\n|`U+098D`   | _unassigned_     |                   |                            |                              |\n|`U+098E`   | 
_unassigned_     |                   |                            |                              |\n|`U+098F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x098F; E                   |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+0990`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0990; Ai                  |\n|`U+0991`   | _unassigned_     |                   |                            |                              |\n|`U+0992`   | _unassigned_     |                   |                            |                              |\n|`U+0993`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0993; O                   |\n|`U+0994`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0994; Au                  |\n|`U+0995`   | Letter           | CONSONANT         | _null_                     | &#x0995; Ka                  |\n|`U+0996`   | Letter           | CONSONANT         | _null_                     | &#x0996; Kha                 |\n|`U+0997`   | Letter           | CONSONANT         | _null_                     | &#x0997; Ga                  |\n|`U+0998`   | Letter           | CONSONANT         | _null_                     | &#x0998; Gha                 |\n|`U+0999`   | Letter           | CONSONANT         | _null_                     | &#x0999; Nga                 |\n|`U+099A`   | Letter           | CONSONANT         | _null_                     | &#x099A; Ca                  |\n|`U+099B`   | Letter           | CONSONANT         | _null_                     | &#x099B; Cha                 |\n|`U+099C`   | Letter           | CONSONANT         | _null_                     | &#x099C; Ja                  |\n|`U+099D`   | Letter           | CONSONANT         | _null_                     | &#x099D; Jha                 |\n|`U+099E`   | Letter           | CONSONANT         | _null_                     | &#x099E; Nya                 |\n|`U+099F`   | Letter        
   | CONSONANT         | _null_                     | &#x099F; Tta                 |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+09A0`   | Letter           | CONSONANT         | _null_                     | &#x09A0; Ttha                |\n|`U+09A1`   | Letter           | CONSONANT         | _null_                     | &#x09A1; Dda                 |\n|`U+09A2`   | Letter           | CONSONANT         | _null_                     | &#x09A2; Ddha                |\n|`U+09A3`   | Letter           | CONSONANT         | _null_                     | &#x09A3; Nna                 |\n|`U+09A4`   | Letter           | CONSONANT         | _null_                     | &#x09A4; Ta                  |\n|`U+09A5`   | Letter           | CONSONANT         | _null_                     | &#x09A5; Tha                 |\n|`U+09A6`   | Letter           | CONSONANT         | _null_                     | &#x09A6; Da                  |\n|`U+09A7`   | Letter           | CONSONANT         | _null_                     | &#x09A7; Dha                 |\n|`U+09A8`   | Letter           | CONSONANT         | _null_                     | &#x09A8; Na                  |\n|`U+09A9`   | _unassigned_     |                   |                            |                              |\n|`U+09AA`   | Letter           | CONSONANT         | _null_                     | &#x09AA; Pa                  |\n|`U+09AB`   | Letter           | CONSONANT         | _null_                     | &#x09AB; Pha                 |\n|`U+09AC`   | Letter           | CONSONANT         | _null_                     | &#x09AC; Ba                  |\n|`U+09AD`   | Letter           | CONSONANT         | _null_                     | &#x09AD; Bha                 |\n|`U+09AE`   | Letter           | CONSONANT         | _null_                     | &#x09AE; Ma                  |\n|`U+09AF`   | Letter           | CONSONANT         | _null_                     | &#x09AF; Ya                  |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t    \n|`U+09B0`   | Letter           | CONSONANT         | _null_                     | &#x09B0; Ra                  |\n|`U+09B1`   | _unassigned_     |                   |                            |                              |\n|`U+09B2`   | Letter           | CONSONANT         | _null_                     | &#x09B2; La                  |\n|`U+09B3`   | _unassigned_     |                   |                            |                              |\n|`U+09B4`   | _unassigned_     |                   |                            |                              |\n|`U+09B5`   | _unassigned_     |                   |                            |                              |\n|`U+09B6`   | Letter           | CONSONANT         | _null_                     | &#x09B6; Sha                 |\n|`U+09B7`   | Letter           | CONSONANT         | _null_                     | &#x09B7; Ssa                 |\n|`U+09B8`   | Letter           | CONSONANT         | _null_                     | &#x09B8; Sa                  |\n|`U+09B9`   | Letter           | CONSONANT         | _null_                     | &#x09B9; Ha                  |\n|`U+09BA`   | _unassigned_     |                   |                            |                              |\n|`U+09BB`   | _unassigned_     |                   |                            |                              |\n|`U+09BC`   | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x09BC; Nukta               |\n|`U+09BD`   | Letter           | AVAGRAHA          | _null_                     | &#x09BD; Avagraha            |\n|`U+09BE`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x09BE; Sign Aa             |\n|`U+09BF`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x09BF; Sign I              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+09C0`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | 
&#x09C0; Sign Ii             |\n|`U+09C1`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x09C1; Sign U              |\n|`U+09C2`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x09C2; Sign Uu             |\n|`U+09C3`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x09C3; Sign Vocalic R      |\n|`U+09C4`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x09C4; Sign Vocalic Rr     |\n|`U+09C5`   | _unassigned_     |                   |                            |                              |\n|`U+09C6`   | _unassigned_     |                   |                            |                              |\n|`U+09C7`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x09C7; Sign E              |\n|`U+09C8`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x09C8; Sign Ai             |\n|`U+09C9`   | _unassigned_     |                   |                            |                              |\n|`U+09CA`   | _unassigned_     |                   |                            |                              |\n|`U+09CB`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x09CB; Sign O              |\n|`U+09CC`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x09CC; Sign Au             |\n|`U+09CD`   | Mark [Mn]        | VIRAMA            | BOTTOM_POSITION            | &#x09CD; Virama              |\n|`U+09CE`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x09CE; Khanda Ta           |\n|`U+09CF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+09D0`   | _unassigned_     |                   |                            |                              |\n|`U+09D1`   | _unassigned_     |                   |                            |               
               |\n|`U+09D2`   | _unassigned_     |                   |                            |                              |\n|`U+09D3`   | _unassigned_     |                   |                            |                              |\n|`U+09D4`   | _unassigned_     |                   |                            |                              |\n|`U+09D5`   | _unassigned_     |                   |                            |                              |\n|`U+09D6`   | _unassigned_     |                   |                            |                              |\n|`U+09D7`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x09D7; Au Length Mark      |\n|`U+09D8`   | _unassigned_     |                   |                            |                              |\n|`U+09D9`   | _unassigned_     |                   |                            |                              |\n|`U+09DA`   | _unassigned_     |                   |                            |                              |\n|`U+09DB`   | _unassigned_     |                   |                            |                              |\n|`U+09DC`   | Letter           | CONSONANT         | _null_                     | &#x09DC; Rra                 |\n|`U+09DD`   | Letter           | CONSONANT         | _null_                     | &#x09DD; Rha                 |\n|`U+09DE`   | _unassigned_     |                   |                            |                              |\n|`U+09DF`   | Letter           | CONSONANT         | _null_                     | &#x09DF; Yya                 |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+09E0`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x09E0; Vocalic Rr          |\n|`U+09E1`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x09E1; Vocalic Ll          |\n|`U+09E2`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x09E2; Sign Vocalic L     
 |\n|`U+09E3`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x09E3; Sign Vocalic Ll     |\n|`U+09E4`   | _unassigned_     |                   |                            |                              |\n|`U+09E5`   | _unassigned_     |                   |                            |                              |\n|`U+09E6`   | Number           | NUMBER            | _null_                     | &#x09E6; Digit Zero          |\n|`U+09E7`   | Number           | NUMBER            | _null_                     | &#x09E7; Digit One           |\n|`U+09E8`   | Number           | NUMBER            | _null_                     | &#x09E8; Digit Two           |\n|`U+09E9`   | Number           | NUMBER            | _null_                     | &#x09E9; Digit Three         |\n|`U+09EA`   | Number           | NUMBER            | _null_                     | &#x09EA; Digit Four          |\n|`U+09EB`   | Number           | NUMBER            | _null_                     | &#x09EB; Digit Five          |\n|`U+09EC`   | Number           | NUMBER            | _null_                     | &#x09EC; Digit Six           |\n|`U+09ED`   | Number           | NUMBER            | _null_                     | &#x09ED; Digit Seven         |\n|`U+09EE`   | Number           | NUMBER            | _null_                     | &#x09EE; Digit Eight         |\n|`U+09EF`   | Number           | NUMBER            | _null_                     | &#x09EF; Digit Nine          |\n| | | | |\n|`U+09F0`   | Letter           | CONSONANT         | _null_                     | &#x09F0; Assamese Ra         |\n|`U+09F1`   | Letter           | CONSONANT         | _null_                     | &#x09F1; Assamese Wa         |\n|`U+09F2`   | Symbol           | SYMBOL            | _null_                     | &#x09F2; Rupee Mark          |\n|`U+09F3`   | Symbol           | SYMBOL            | _null_                     | &#x09F3; Rupee Sign          |\n|`U+09F4`   | Number           | NUMBER        
    | _null_                     | &#x09F4; Numerator One       |\n|`U+09F5`   | Number           | NUMBER            | _null_                     | &#x09F5; Numerator Two       |\n|`U+09F6`   | Number           | NUMBER            | _null_                     | &#x09F6; Numerator Three     |\n|`U+09F7`   | Number           | NUMBER            | _null_                     | &#x09F7; Numerator Four      |\n|`U+09F8`   | Number           | NUMBER            | _null_                     | &#x09F8; Numerator One Less Than Denominator |\n|`U+09F9`   | Number           | NUMBER            | _null_                     | &#x09F9; Denominator Sixteen |\n|`U+09FA`   | Symbol           | SYMBOL            | _null_                     | &#x09FA; Isshar              |\n|`U+09FB`   | Symbol           | SYMBOL            | _null_                     | &#x09FB; Ganda Mark          |\n|`U+09FC`   | Letter           | _null_            | _null_                     | &#x09FC; Vedic Anusvara      |\n|`U+09FD`   | Punctuation      | _null_            | _null_                     | &#x09FD; Abbreviation Sign   |\n|`U+09FE`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | &#x09FE; Sandhi Mark         |\n|`U+09FF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Bengali script may also include\ncharacters from the Vedic Extensions block. 
These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta 
|\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail 
|\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |  
                            |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Bengali text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block. Bengali text can also incorporate the udatta\n(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Bengali text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-devanagari.md",
    "content": "# Devanagari character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Devanagari text](../opentype-shaping-devanagari.md).\n\n**Contents**\n\n  - [Devanagari character table](#devanagari-character-table)\n  - [Devanagari Extended character table](#devanagari-extended-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Devanagari character table ##\n\nDevanagari glyphs should be classified as in the following\ntable. Codepoints in the Devanagari block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Devanagari character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0900`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0900; Inverted Candrabindu|\n|`U+0901`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0901; Candrabindu         |\n|`U+0902`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0902; Anusvara            |\n|`U+0903`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0903; Visarga             |\n|`U+0904`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0904; Short A             |\n|`U+0905`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0905; A                   |\n|`U+0906`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0906; Aa                  |\n|`U+0907`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0907; I                   |\n|`U+0908`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0908; Ii                  |\n|`U+0909`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0909; U                   |\n|`U+090A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x090A; Uu                  |\n|`U+090B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x090B; Vocalic R           |\n|`U+090C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x090C; Vocalic L           |\n|`U+090D`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x090D; Candra E            |\n|`U+090E`   
| Letter           | VOWEL_INDEPENDENT | _null_                     | &#x090E; Short E             |\n|`U+090F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x090F; E                   |\n| | | | |\n|`U+0910`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0910; Ai                  |\n|`U+0911`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0911; Candra O            |\n|`U+0912`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0912; Short O             |\n|`U+0913`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0913; O                   |\n|`U+0914`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0914; Au                  |\n|`U+0915`   | Letter           | CONSONANT         | _null_                     | &#x0915; Ka                  |\n|`U+0916`   | Letter           | CONSONANT         | _null_                     | &#x0916; Kha                 |\n|`U+0917`   | Letter           | CONSONANT         | _null_                     | &#x0917; Ga                  |\n|`U+0918`   | Letter           | CONSONANT         | _null_                     | &#x0918; Gha                 |\n|`U+0919`   | Letter           | CONSONANT         | _null_                     | &#x0919; Nga                 |\n|`U+091A`   | Letter           | CONSONANT         | _null_                     | &#x091A; Ca                  |\n|`U+091B`   | Letter           | CONSONANT         | _null_                     | &#x091B; Cha                 |\n|`U+091C`   | Letter           | CONSONANT         | _null_                     | &#x091C; Ja                  |\n|`U+091D`   | Letter           | CONSONANT         | _null_                     | &#x091D; Jha                 |\n|`U+091E`   | Letter           | CONSONANT         | _null_                     | &#x091E; Nya                 |\n|`U+091F`   | Letter           | CONSONANT         | _null_    
                 | &#x091F; Tta                 |\n| | | | |\n|`U+0920`   | Letter           | CONSONANT         | _null_                     | &#x0920; Ttha                |\n|`U+0921`   | Letter           | CONSONANT         | _null_                     | &#x0921; Dda                 |\n|`U+0922`   | Letter           | CONSONANT         | _null_                     | &#x0922; Ddha                |\n|`U+0923`   | Letter           | CONSONANT         | _null_                     | &#x0923; Nna                 |\n|`U+0924`   | Letter           | CONSONANT         | _null_                     | &#x0924; Ta                  |\n|`U+0925`   | Letter           | CONSONANT         | _null_                     | &#x0925; Tha                 |\n|`U+0926`   | Letter           | CONSONANT         | _null_                     | &#x0926; Da                  |\n|`U+0927`   | Letter           | CONSONANT         | _null_                     | &#x0927; Dha                 |\n|`U+0928`   | Letter           | CONSONANT         | _null_                     | &#x0928; Na                  |\n|`U+0929`   | Letter           | CONSONANT         | _null_                     | &#x0929; Nnna                |\n|`U+092A`   | Letter           | CONSONANT         | _null_                     | &#x092A; Pa                  |\n|`U+092B`   | Letter           | CONSONANT         | _null_                     | &#x092B; Pha                 |\n|`U+092C`   | Letter           | CONSONANT         | _null_                     | &#x092C; Ba                  |\n|`U+092D`   | Letter           | CONSONANT         | _null_                     | &#x092D; Bha                 |\n|`U+092E`   | Letter           | CONSONANT         | _null_                     | &#x092E; Ma                  |\n|`U+092F`   | Letter           | CONSONANT         | _null_                     | &#x092F; Ya                  |\n| | | | |\n|`U+0930`   | Letter           | CONSONANT         | _null_                     | &#x0930; Ra          
        |\n|`U+0931`   | Letter           | CONSONANT         | _null_                     | &#x0931; Rra                 |\n|`U+0932`   | Letter           | CONSONANT         | _null_                     | &#x0932; La                  |\n|`U+0933`   | Letter           | CONSONANT         | _null_                     | &#x0933; Lla                 |\n|`U+0934`   | Letter           | CONSONANT         | _null_                     | &#x0934; Llla                |\n|`U+0935`   | Letter           | CONSONANT         | _null_                     | &#x0935; Va                  |\n|`U+0936`   | Letter           | CONSONANT         | _null_                     | &#x0936; Sha                 |\n|`U+0937`   | Letter           | CONSONANT         | _null_                     | &#x0937; Ssa                 |\n|`U+0938`   | Letter           | CONSONANT         | _null_                     | &#x0938; Sa                  |\n|`U+0939`   | Letter           | CONSONANT         | _null_                     | &#x0939; Ha                  |\n|`U+093A`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x093A; Sign Oe             |\n|`U+093B`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x093B; Sign Ooe            |\n|`U+093C`   | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x093C; Nukta               |\n|`U+093D`   | Letter           | AVAGRAHA          | _null_                     | &#x093D; Avagraha            |\n|`U+093E`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x093E; Sign Aa             |\n|`U+093F`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x093F; Sign I              |\n| | | | |\n|`U+0940`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0940; Sign Ii             |\n|`U+0941`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0941; Sign U              |\n|`U+0942`   | Mark [Mn]        | 
VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0942; Sign Uu             |\n|`U+0943`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0943; Sign Vocalic R      |\n|`U+0944`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0944; Sign Vocalic Rr     |\n|`U+0945`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0945; Sign Candra E       |\n|`U+0946`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0946; Sign Short E        |\n|`U+0947`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0947; Sign E              |\n|`U+0948`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0948; Sign Ai             |\n|`U+0949`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0949; Sign Candra O       |\n|`U+094A`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x094A; Sign Short O        |\n|`U+094B`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x094B; Sign O              |\n|`U+094C`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x094C; Sign Au             |\n|`U+094D`   | Mark [Mn]        | VIRAMA            | BOTTOM_POSITION            | &#x094D; Virama              |\n|`U+094E`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x094E; Sign Prishthamatra E|\n|`U+094F`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x094F; Sign Aw             |\n| | | | |\n|`U+0950`   | Letter           | _null_            | _null_                     | &#x0950; Om                  |\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0953`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | 
&#x0953; Grave accent        |\n|`U+0954`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | &#x0954; Acute accent        |\n|`U+0955`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0955; Sign Candra Long E  |\n|`U+0956`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0956; Sign Ue             |\n|`U+0957`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0957; Sign Uue            |\n|`U+0958`   | Letter           | CONSONANT         | _null_                     | &#x0958; Qa                  |\n|`U+0959`   | Letter           | CONSONANT         | _null_                     | &#x0959; Khha                |\n|`U+095A`   | Letter           | CONSONANT         | _null_                     | &#x095A; Ghha                |\n|`U+095B`   | Letter           | CONSONANT         | _null_                     | &#x095B; Za                  |\n|`U+095C`   | Letter           | CONSONANT         | _null_                     | &#x095C; Dddha               |\n|`U+095D`   | Letter           | CONSONANT         | _null_                     | &#x095D; Rha                 |\n|`U+095E`   | Letter           | CONSONANT         | _null_                     | &#x095E; Fa                  |\n|`U+095F`   | Letter           | CONSONANT         | _null_                     | &#x095F; Yya                 |\n| | | | |\n|`U+0960`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0960; Vocalic Rr          |\n|`U+0961`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0961; Vocalic Ll          |\n|`U+0962`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0962; Sign Vocalic L      |\n|`U+0963`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0963; Sign Vocalic Ll     |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | 
Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n|`U+0966`   | Number           | NUMBER            | _null_                     | &#x0966; Digit Zero          |\n|`U+0967`   | Number           | NUMBER            | _null_                     | &#x0967; Digit One           |\n|`U+0968`   | Number           | NUMBER            | _null_                     | &#x0968; Digit Two           |\n|`U+0969`   | Number           | NUMBER            | _null_                     | &#x0969; Digit Three         |\n|`U+096A`   | Number           | NUMBER            | _null_                     | &#x096A; Digit Four          |\n|`U+096B`   | Number           | NUMBER            | _null_                     | &#x096B; Digit Five          |\n|`U+096C`   | Number           | NUMBER            | _null_                     | &#x096C; Digit Six           |\n|`U+096D`   | Number           | NUMBER            | _null_                     | &#x096D; Digit Seven         |\n|`U+096E`   | Number           | NUMBER            | _null_                     | &#x096E; Digit Eight         |\n|`U+096F`   | Number           | NUMBER            | _null_                     | &#x096F; Digit Nine          |\n| | | | |\n|`U+0970`   | Punctuation      | _null_            | _null_                     | &#x0970; Abbreviation Sign   |\n|`U+0971`   | Punctuation      | _null_            | _null_                     | &#x0971; Sign High Spacing Dot|\n|`U+0972`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0972; Candra Aa           |\n|`U+0973`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0973; Oe                  |\n|`U+0974`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0974; Ooe                 |\n|`U+0975`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0975; Aw                  |\n|`U+0976`   | Letter           | VOWEL_INDEPENDENT | _null_     
                | &#x0976; Ue                  |\n|`U+0977`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0977; Uue                 |\n|`U+0978`   | Letter           | CONSONANT         | _null_                     | &#x0978; Marwari Dda         |\n|`U+0979`   | Letter           | CONSONANT         | _null_                     | &#x0979; Zha                 |\n|`U+097A`   | Letter           | CONSONANT         | _null_                     | &#x097A; Heavy Ya            |\n|`U+097B`   | Letter           | CONSONANT         | _null_                     | &#x097B; Gga                 |\n|`U+097C`   | Letter           | CONSONANT         | _null_                     | &#x097C; Jja                 |\n|`U+097D`   | Letter           | CONSONANT         | _null_                     | &#x097D; Glottal Stop        |\n|`U+097E`   | Letter           | CONSONANT         | _null_                     | &#x097E; Ddda                |\n|`U+097F`   | Letter           | CONSONANT         | _null_                     | &#x097F; Bba                 |\n:::\n\n\n## Devanagari Extended character table ##\n\n> Note: the cantillation marks of the \"combining consonant\" variety in\n> the Devanagari Extended block are _not_ considered consonants for\n> shaping purposes (including syllable identification, the\n> determination of the base consonant, or positioning \"Reph\").\n\n\n:::{table} Devanagari Extended character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+A8E0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E0; Combining Zero      |\n|`U+A8E1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E1; Combining One       |\n|`U+A8E2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E2; 
Combining Two       |\n|`U+A8E3`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E3; Combining Three     |\n|`U+A8E4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E4; Combining Four      |\n|`U+A8E5`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E5; Combining Five      |\n|`U+A8E6`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E6; Combining Six       |\n|`U+A8E7`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E7; Combining Seven     |\n|`U+A8E8`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E8; Combining Eight     |\n|`U+A8E9`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8E9; Combining Nine      |\n|`U+A8EA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8EA; Combining A         |\n|`U+A8EB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8EB; Combining U         |\n|`U+A8EC`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8EC; Combining Ka        |\n|`U+A8ED`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8ED; Combining Na        |\n|`U+A8EE`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8EE; Combining Pa        |\n|`U+A8EF`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8EF; Combining Ra        |\n| | | | |\n|`U+A8F0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8F0; Combining Vi        |\n|`U+A8F1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#xA8F1; Combining Avagraha  |\n|`U+A8F2`   | Letter           | SYMBOL            | _null_                     | &#xA8F2; Spacing Candrabindu |\n|`U+A8F3`   | Letter           | BINDU             | _null_                     | &#xA8F3; Candrabindu Virama  |\n|`U+A8F4`   | Letter        
   | _null_            | _null_                     | &#xA8F4; Double Candrabindu Virama|\n|`U+A8F5`   | Letter           | _null_            | _null_                     | &#xA8F5; Candrabindu Two     |\n|`U+A8F6`   | Letter           | _null_            | _null_                     | &#xA8F6; Candrabindu Three   |\n|`U+A8F7`   | Letter           | SYMBOL            | _null_                     | &#xA8F7; Candrabindu Avagraha|\n|`U+A8F8`   | Punctuation      | _null_            | _null_                     | &#xA8F8; Pushpika            |\n|`U+A8F9`   | Punctuation      | _null_            | _null_                     | &#xA8F9; Gap Filler          |\n|`U+A8FA`   | Punctuation      | _null_            | _null_                     | &#xA8FA; Caret               |\n|`U+A8FB`   | Letter           | _null_            | _null_                     | &#xA8FB; Headstroke          |\n|`U+A8FC`   | Punctuation      | _null_            | _null_                     | &#xA8FC; Siddham             |\n|`U+A8FD`   | Letter           | _null_            | _null_                     | &#xA8FD; Jain Om             |\n|`U+A8FE`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#xA8FE; Ay                  |\n|`U+A8FF`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#xA8FF; Sign Ay             |\n| | | | |\n:::\n\n\n## Devanagari Extended-A character table ##\n\n\n:::{table} Devanagari Extended-A character table\n\n| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph                                   |\n|:----------|:-----------------|:--------------|:------------------------|:----------------------------------------|\n| `U+11B00` | Punctuation      | _null_        | _null_                  | &#x11B00; Head Mark                     |\n| `U+11B01` | Punctuation      | _null_        | _null_                  | &#x11B01; Head Mark With Headstroke     |\n| `U+11B02` | Punctuation      | _null_        | _null_       
           | &#x11B02; Sign Bhale                    |\n| `U+11B03` | Punctuation      | _null_        | _null_                  | &#x11B03; Sign Bhale With Hook          |\n| `U+11B04` | Punctuation      | _null_        | _null_                  | &#x11B04; Sign Extended Bhale           |\n| `U+11B05` | Punctuation      | _null_        | _null_                  | &#x11B05; Sign Extended Bhale With Hook |\n| `U+11B06` | Punctuation      | _null_        | _null_                  | &#x11B06; Sign Western Five-like Bhale  |\n| `U+11B07` | Punctuation      | _null_        | _null_                  | &#x11B07; Sign Western Nine-like Bhale  |\n| `U+11B08` | Punctuation      | _null_        | _null_                  | &#x11B08; Sign Reversed Nine-like Bhale |\n| `U+11B09` | Punctuation      | _null_        | _null_                  | &#x11B09; Sign Mindu                    |\n| `U+11B0A` | _unassigned_     |               |                         |                                         |\n| `U+11B0B` | _unassigned_     |               |                         |                                         |\n| `U+11B0C` | _unassigned_     |               |                         |                                         |\n| `U+11B0D` | _unassigned_     |               |                         |                                         |\n| `U+11B0E` | _unassigned_     |               |                         |                                         |\n| `U+11B0F` | _unassigned_     |               |                         |                                         |\n|           |                  |               |                         |                                         |\n| `U+11B10` | _unassigned_     |               |                         |                                         |\n| `U+11B11` | _unassigned_     |               |                         |                                         |\n| `U+11B12` | _unassigned_     |               |        
                 |                                         |\n| `U+11B13` | _unassigned_     |               |                         |                                         |\n| `U+11B14` | _unassigned_     |               |                         |                                         |\n| `U+11B15` | _unassigned_     |               |                         |                                         |\n| `U+11B16` | _unassigned_     |               |                         |                                         |\n| `U+11B17` | _unassigned_     |               |                         |                                         |\n| `U+11B18` | _unassigned_     |               |                         |                                         |\n| `U+11B19` | _unassigned_     |               |                         |                                         |\n| `U+11B1A` | _unassigned_     |               |                         |                                         |\n| `U+11B1B` | _unassigned_     |               |                         |                                         |\n| `U+11B1C` | _unassigned_     |               |                         |                                         |\n| `U+11B1D` | _unassigned_     |               |                         |                                         |\n| `U+11B1E` | _unassigned_     |               |                         |                                         |\n| `U+11B1F` | _unassigned_     |               |                         |                                         |\n|           |                  |               |                         |                                         |\n| `U+11B20` | _unassigned_     |               |                         |                                         |\n| `U+11B21` | _unassigned_     |               |                         |                                         |\n| `U+11B22` | _unassigned_     |               |  
                       |                                         |\n| `U+11B23` | _unassigned_     |               |                         |                                         |\n| `U+11B24` | _unassigned_     |               |                         |                                         |\n| `U+11B25` | _unassigned_     |               |                         |                                         |\n| `U+11B26` | _unassigned_     |               |                         |                                         |\n| `U+11B27` | _unassigned_     |               |                         |                                         |\n| `U+11B28` | _unassigned_     |               |                         |                                         |\n| `U+11B29` | _unassigned_     |               |                         |                                         |\n| `U+11B2A` | _unassigned_     |               |                         |                                         |\n| `U+11B2B` | _unassigned_     |               |                         |                                         |\n| `U+11B2C` | _unassigned_     |               |                         |                                         |\n| `U+11B2D` | _unassigned_     |               |                         |                                         |\n| `U+11B2E` | _unassigned_     |               |                         |                                         |\n| `U+11B2F` | _unassigned_     |               |                         |                                         |\n|           |                  |               |                         |                                         |\n| `U+11B30` | _unassigned_     |               |                         |                                         |\n| `U+11B31` | _unassigned_     |               |                         |                                         |\n| `U+11B32` | _unassigned_     |            
   |                         |                                         |\n| `U+11B33` | _unassigned_     |               |                         |                                         |\n| `U+11B34` | _unassigned_     |               |                         |                                         |\n| `U+11B35` | _unassigned_     |               |                         |                                         |\n| `U+11B36` | _unassigned_     |               |                         |                                         |\n| `U+11B37` | _unassigned_     |               |                         |                                         |\n| `U+11B38` | _unassigned_     |               |                         |                                         |\n| `U+11B39` | _unassigned_     |               |                         |                                         |\n| `U+11B3A` | _unassigned_     |               |                         |                                         |\n| `U+11B3B` | _unassigned_     |               |                         |                                         |\n| `U+11B3C` | _unassigned_     |               |                         |                                         |\n| `U+11B3D` | _unassigned_     |               |                         |                                         |\n| `U+11B3E` | _unassigned_     |               |                         |                                         |\n| `U+11B3F` | _unassigned_     |               |                         |                                         |\n|           |                  |               |                         |                                         |\n| `U+11B40` | _unassigned_     |               |                         |                                         |\n| `U+11B41` | _unassigned_     |               |                         |                                         |\n| `U+11B42` | _unassigned_     |      
         |                         |                                         |\n| `U+11B43` | _unassigned_     |               |                         |                                         |\n| `U+11B44` | _unassigned_     |               |                         |                                         |\n| `U+11B45` | _unassigned_     |               |                         |                                         |\n| `U+11B46` | _unassigned_     |               |                         |                                         |\n| `U+11B47` | _unassigned_     |               |                         |                                         |\n| `U+11B48` | _unassigned_     |               |                         |                                         |\n| `U+11B49` | _unassigned_     |               |                         |                                         |\n| `U+11B4A` | _unassigned_     |               |                         |                                         |\n| `U+11B4B` | _unassigned_     |               |                         |                                         |\n| `U+11B4C` | _unassigned_     |               |                         |                                         |\n| `U+11B4D` | _unassigned_     |               |                         |                                         |\n| `U+11B4E` | _unassigned_     |               |                         |                                         |\n| `U+11B4F` | _unassigned_     |               |                         |                                         |\n|           |                  |               |                         |                                         |\n| `U+11B50` | _unassigned_     |               |                         |                                         |\n| `U+11B51` | _unassigned_     |               |                         |                                         |\n| `U+11B52` | _unassigned_     
|               |                         |                                         |\n| `U+11B53` | _unassigned_     |               |                         |                                         |\n| `U+11B54` | _unassigned_     |               |                         |                                         |\n| `U+11B55` | _unassigned_     |               |                         |                                         |\n| `U+11B56` | _unassigned_     |               |                         |                                         |\n| `U+11B57` | _unassigned_     |               |                         |                                         |\n| `U+11B58` | _unassigned_     |               |                         |                                         |\n| `U+11B59` | _unassigned_     |               |                         |                                         |\n| `U+11B5A` | _unassigned_     |               |                         |                                         |\n| `U+11B5B` | _unassigned_     |               |                         |                                         |\n| `U+11B5C` | _unassigned_     |               |                         |                                         |\n| `U+11B5D` | _unassigned_     |               |                         |                                         |\n| `U+11B5E` | _unassigned_     |               |                         |                                         |\n| `U+11B5F` | _unassigned_     |               |                         |                                         |\n|           |                  |               |                         |                                         |\n:::\n\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Devanagari script may also include\ncharacters from the Vedic Extensions block. 
These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta 
|\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail 
|\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |  
                            |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Devanagari text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | 
PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-gujarati.md",
    "content": "# Gujarati character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Gujarati text](../opentype-shaping-gujarati.md).\n\n**Contents**\n\n  - [Gujarati character table](#gujarati-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Gujarati character table ##\n\nGujarati glyphs should be classified as in the following\ntable. Codepoints in the Gujarati block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n:::{table} Gujarati character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0A80`   | _unassigned_     |                   |                            |                              |\n|`U+0A81`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0A81; Candrabindu         |\n|`U+0A82`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0A82; Anusvara            |\n|`U+0A83`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0A83; Visarga             |\n|`U+0A84`   | _unassigned_     |                   |                            |                              |\n|`U+0A85`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A85; A                   |\n|`U+0A86`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A86; Aa                  |\n|`U+0A87`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A87; I                   |\n|`U+0A88`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A88; Ii                  |\n|`U+0A89`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A89; U                   |\n|`U+0A8A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A8A; Uu                  |\n|`U+0A8B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A8B; Vocalic R           |\n|`U+0A8C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A8C; Vocalic L           |\n|`U+0A8D`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A8D; Candra E            |\n|`U+0A8E`   | 
_unassigned_     |                   |                            |                              |\n|`U+0A8F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A8F; E                   |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+0A90`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A90; Ai                  |\n|`U+0A91`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A91; Candra O            |\n|`U+0A92`   | _unassigned_     |                   |                            |                              |\n|`U+0A93`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A93; O                   |\n|`U+0A94`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A94; Au                  |\n|`U+0A95`   | Letter           | CONSONANT         | _null_                     | &#x0A95; Ka                  |\n|`U+0A96`   | Letter           | CONSONANT         | _null_                     | &#x0A96; Kha                 |\n|`U+0A97`   | Letter           | CONSONANT         | _null_                     | &#x0A97; Ga                  |\n|`U+0A98`   | Letter           | CONSONANT         | _null_                     | &#x0A98; Gha                 |\n|`U+0A99`   | Letter           | CONSONANT         | _null_                     | &#x0A99; Nga                 |\n|`U+0A9A`   | Letter           | CONSONANT         | _null_                     | &#x0A9A; Ca                  |\n|`U+0A9B`   | Letter           | CONSONANT         | _null_                     | &#x0A9B; Cha                 |\n|`U+0A9C`   | Letter           | CONSONANT         | _null_                     | &#x0A9C; Ja                  |\n|`U+0A9D`   | Letter           | CONSONANT         | _null_                     | &#x0A9D; Jha                 |\n|`U+0A9E`   | Letter           | CONSONANT         | _null_                     | &#x0A9E; Nya                 |\n|`U+0A9F`   | Letter        
   | CONSONANT         | _null_                     | &#x0A9F; Tta                 |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+0AA0`   | Letter           | CONSONANT         | _null_                     | &#x0AA0; Ttha                |\n|`U+0AA1`   | Letter           | CONSONANT         | _null_                     | &#x0AA1; Dda                 |\n|`U+0AA2`   | Letter           | CONSONANT         | _null_                     | &#x0AA2; Ddha                |\n|`U+0AA3`   | Letter           | CONSONANT         | _null_                     | &#x0AA3; Nna                 |\n|`U+0AA4`   | Letter           | CONSONANT         | _null_                     | &#x0AA4; Ta                  |\n|`U+0AA5`   | Letter           | CONSONANT         | _null_                     | &#x0AA5; Tha                 |\n|`U+0AA6`   | Letter           | CONSONANT         | _null_                     | &#x0AA6; Da                  |\n|`U+0AA7`   | Letter           | CONSONANT         | _null_                     | &#x0AA7; Dha                 |\n|`U+0AA8`   | Letter           | CONSONANT         | _null_                     | &#x0AA8; Na                  |\n|`U+0AA9`   | _unassigned_     |                   |                            |                              |\n|`U+0AAA`   | Letter           | CONSONANT         | _null_                     | &#x0AAA; Pa                  |\n|`U+0AAB`   | Letter           | CONSONANT         | _null_                     | &#x0AAB; Pha                 |\n|`U+0AAC`   | Letter           | CONSONANT         | _null_                     | &#x0AAC; Ba                  |\n|`U+0AAD`   | Letter           | CONSONANT         | _null_                     | &#x0AAD; Bha                 |\n|`U+0AAE`   | Letter           | CONSONANT         | _null_                     | &#x0AAE; Ma                  |\n|`U+0AAF`   | Letter           | CONSONANT         | _null_                     | &#x0AAF; Ya                  |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t    \n|`U+0AB0`   | Letter           | CONSONANT         | _null_                     | &#x0AB0; Ra                  |\n|`U+0AB1`   | _unassigned_     |                   |                            |                              |\n|`U+0AB2`   | Letter           | CONSONANT         | _null_                     | &#x0AB2; La                  |\n|`U+0AB3`   | Letter           | CONSONANT         | _null_                     | &#x0AB3; Lla                 |\n|`U+0AB4`   | _unassigned_     |                   |                            |                              |\n|`U+0AB5`   | Letter           | CONSONANT         | _null_                     | &#x0AB5; Va                  |\n|`U+0AB6`   | Letter           | CONSONANT         | _null_                     | &#x0AB6; Sha                 |\n|`U+0AB7`   | Letter           | CONSONANT         | _null_                     | &#x0AB7; Ssa                 |\n|`U+0AB8`   | Letter           | CONSONANT         | _null_                     | &#x0AB8; Sa                  |\n|`U+0AB9`   | Letter           | CONSONANT         | _null_                     | &#x0AB9; Ha                  |\n|`U+0ABA`   | _unassigned_     |                   |                            |                              |\n|`U+0ABB`   | _unassigned_     |                   |                            |                              |\n|`U+0ABC`   | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x0ABC; Nukta               |\n|`U+0ABD`   | Letter           | AVAGRAHA          | _null_                     | &#x0ABD; Avagraha            |\n|`U+0ABE`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0ABE; Sign Aa             |\n|`U+0ABF`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0ABF; Sign I              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+0AC0`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | 
&#x0AC0; Sign Ii             |\n|`U+0AC1`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0AC1; Sign U              |\n|`U+0AC2`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0AC2; Sign Uu             |\n|`U+0AC3`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0AC3; Sign Vocalic R      |\n|`U+0AC4`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0AC4; Sign Vocalic Rr     |\n|`U+0AC5`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0AC5; Sign Candra E       |\n|`U+0AC6`   | _unassigned_     |                   |                            |                              |\n|`U+0AC7`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0AC7; Sign E              |\n|`U+0AC8`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0AC8; Sign Ai             |\n|`U+0AC9`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_RIGHT_POSITION     | &#x0AC9; Sign Candra O       |\n|`U+0ACA`   | _unassigned_     |                   |                            |                              |\n|`U+0ACB`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0ACB; Sign O              |\n|`U+0ACC`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0ACC; Sign Au             |\n|`U+0ACD`   | Mark [Mn]        | VIRAMA            | BOTTOM_POSITION            | &#x0ACD; Virama              |\n|`U+0ACE`   | _unassigned_     |                   |                            |                              |\n|`U+0ACF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+0AD0`   | Letter           | _null_            | _null_                     | &#x0AD0; Om                  |\n|`U+0AD1`   | _unassigned_     |                   |                            |               
               |\n|`U+0AD2`   | _unassigned_     |                   |                            |                              |\n|`U+0AD3`   | _unassigned_     |                   |                            |                              |\n|`U+0AD4`   | _unassigned_     |                   |                            |                              |\n|`U+0AD5`   | _unassigned_     |                   |                            |                              |\n|`U+0AD6`   | _unassigned_     |                   |                            |                              |\n|`U+0AD7`   | _unassigned_     |                   |                            |                              |\n|`U+0AD8`   | _unassigned_     |                   |                            |                              |\n|`U+0AD9`   | _unassigned_     |                   |                            |                              |\n|`U+0ADA`   | _unassigned_     |                   |                            |                              |\n|`U+0ADB`   | _unassigned_     |                   |                            |                              |\n|`U+0ADC`   | _unassigned_     |                   |                            |                              |\n|`U+0ADD`   | _unassigned_     |                   |                            |                              |\n|`U+0ADE`   | _unassigned_     |                   |                            |                              |\n|`U+0ADF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+0AE0`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0AE0; Vocalic Rr          |\n|`U+0AE1`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0AE1; Vocalic Ll          |\n|`U+0AE2`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0AE2; Sign Vocalic L     
 |\n|`U+0AE3`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0AE3; Sign Vocalic Ll     |\n|`U+0AE4`   | _unassigned_     |                   |                            |                              |\n|`U+0AE5`   | _unassigned_     |                   |                            |                              |\n|`U+0AE6`   | Number           | NUMBER            | _null_                     | &#x0AE6; Digit Zero          |\n|`U+0AE7`   | Number           | NUMBER            | _null_                     | &#x0AE7; Digit One           |\n|`U+0AE8`   | Number           | NUMBER            | _null_                     | &#x0AE8; Digit Two           |\n|`U+0AE9`   | Number           | NUMBER            | _null_                     | &#x0AE9; Digit Three         |\n|`U+0AEA`   | Number           | NUMBER            | _null_                     | &#x0AEA; Digit Four          |\n|`U+0AEB`   | Number           | NUMBER            | _null_                     | &#x0AEB; Digit Five          |\n|`U+0AEC`   | Number           | NUMBER            | _null_                     | &#x0AEC; Digit Six           |\n|`U+0AED`   | Number           | NUMBER            | _null_                     | &#x0AED; Digit Seven         |\n|`U+0AEE`   | Number           | NUMBER            | _null_                     | &#x0AEE; Digit Eight         |\n|`U+0AEF`   | Number           | NUMBER            | _null_                     | &#x0AEF; Digit Nine          |\n| | | | |\n|`U+0AF0`   | Symbol           | SYMBOL            | _null_                     | &#x0AF0; Abbreviation        |\n|`U+0AF1`   | Symbol           | SYMBOL            | _null_                     | &#x0AF1; Rupee Sign          |\n|`U+0AF2`   | _unassigned_     |                   |                            |                              |\n|`U+0AF3`   | _unassigned_     |                   |                            |                              |\n|`U+0AF4`   | _unassigned_     |               
    |                            |                              |\n|`U+0AF5`   | _unassigned_     |                   |                            |                              |\n|`U+0AF6`   | _unassigned_     |                   |                            |                              |\n|`U+0AF7`   | _unassigned_     |                   |                            |                              |\n|`U+0AF8`   | _unassigned_     |                   |                            |                              |\n|`U+0AF9`   | Letter           | CONSONANT         | _null_                     | &#x0AF9; Zha                 |\n|`U+0AFA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0AFA; Sukun               |\n|`U+0AFB`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | &#x0AFB; Shadda              |\n|`U+0AFC`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0AFC; Maddah              |\n|`U+0AFD`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | &#x0AFD; Three-Dot Nukta Above |\n|`U+0AFE`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | &#x0AFE; Circle Nukta Above  |\n|`U+0AFF`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | &#x0AFF; Two-Circle Nukta Above |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Gujarati script may also include\ncharacters from the Vedic Extensions block. 
These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta 
|\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail 
|\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF3`   | Mark [Mc]        | VISARGA           | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |  
                            |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Gujarati text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block. Gujarati text can also incorporate the udatta\n(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Gujarati text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-gurmukhi.md",
    "content": "# Gurmukhi character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Gurmukhi text](../opentype-shaping-gurmukhi.md).\n\n**Contents**\n\n  - [Gurmukhi character table](#gurmukhi-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Gurmukhi character table ##\n\nGurmukhi glyphs should be classified as in the following\ntable. Codepoints in the Gurmukhi block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Gurmukhi character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0A00`   | _unassigned_     |                   |                            |                              |\n|`U+0A01`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0A01; Adak Bindi          |\n|`U+0A02`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0A02; Bindi               |\n|`U+0A03`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0A03; Visarga             |\n|`U+0A04`   | _unassigned_     |                   |                            |                              |\n|`U+0A05`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A05; A                   |\n|`U+0A06`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A06; Aa                  |\n|`U+0A07`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A07; I                   |\n|`U+0A08`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A08; Ii                  |\n|`U+0A09`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A09; U                   |\n|`U+0A0A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A0A; Uu                  |\n|`U+0A0B`   | _unassigned_     |                   |                            |                              |\n|`U+0A0C`   | _unassigned_     |                   |                            |                              |\n|`U+0A0D`   | _unassigned_     |                   |                            |                              |\n|`U+0A0E`   | 
_unassigned_     |                   |                            |                              |\n|`U+0A0F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A0F; Ee                  |\n| | | | |\n|`U+0A10`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A10; Ai                  |\n|`U+0A11`   | _unassigned_     |                   |                            |                              |\n|`U+0A12`   | _unassigned_     |                   |                            |                              |\n|`U+0A13`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A13; Oo                  |\n|`U+0A14`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0A14; Au                  |\n|`U+0A15`   | Letter           | CONSONANT         | _null_                     | &#x0A15; Ka                  |\n|`U+0A16`   | Letter           | CONSONANT         | _null_                     | &#x0A16; Kha                 |\n|`U+0A17`   | Letter           | CONSONANT         | _null_                     | &#x0A17; Ga                  |\n|`U+0A18`   | Letter           | CONSONANT         | _null_                     | &#x0A18; Gha                 |\n|`U+0A19`   | Letter           | CONSONANT         | _null_                     | &#x0A19; Nga                 |\n|`U+0A1A`   | Letter           | CONSONANT         | _null_                     | &#x0A1A; Ca                  |\n|`U+0A1B`   | Letter           | CONSONANT         | _null_                     | &#x0A1B; Cha                 |\n|`U+0A1C`   | Letter           | CONSONANT         | _null_                     | &#x0A1C; Ja                  |\n|`U+0A1D`   | Letter           | CONSONANT         | _null_                     | &#x0A1D; Jha                 |\n|`U+0A1E`   | Letter           | CONSONANT         | _null_                     | &#x0A1E; Nya                 |\n|`U+0A1F`   | Letter           | CONSONANT         | _null_      
               | &#x0A1F; Tta                 |\n| | | | |\n|`U+0A20`   | Letter           | CONSONANT         | _null_                     | &#x0A20; Ttha                |\n|`U+0A21`   | Letter           | CONSONANT         | _null_                     | &#x0A21; Dda                 |\n|`U+0A22`   | Letter           | CONSONANT         | _null_                     | &#x0A22; Ddha                |\n|`U+0A23`   | Letter           | CONSONANT         | _null_                     | &#x0A23; Nna                 |\n|`U+0A24`   | Letter           | CONSONANT         | _null_                     | &#x0A24; Ta                  |\n|`U+0A25`   | Letter           | CONSONANT         | _null_                     | &#x0A25; Tha                 |\n|`U+0A26`   | Letter           | CONSONANT         | _null_                     | &#x0A26; Da                  |\n|`U+0A27`   | Letter           | CONSONANT         | _null_                     | &#x0A27; Dha                 |\n|`U+0A28`   | Letter           | CONSONANT         | _null_                     | &#x0A28; Na                  |\n|`U+0A29`   | _unassigned_     |                   |                            |                              |\n|`U+0A2A`   | Letter           | CONSONANT         | _null_                     | &#x0A2A; Pa                  |\n|`U+0A2B`   | Letter           | CONSONANT         | _null_                     | &#x0A2B; Pha                 |\n|`U+0A2C`   | Letter           | CONSONANT         | _null_                     | &#x0A2C; Ba                  |\n|`U+0A2D`   | Letter           | CONSONANT         | _null_                     | &#x0A2D; Bha                 |\n|`U+0A2E`   | Letter           | CONSONANT         | _null_                     | &#x0A2E; Ma                  |\n|`U+0A2F`   | Letter           | CONSONANT         | _null_                     | &#x0A2F; Ya                  |\n| | | | |\n|`U+0A30`   | Letter           | CONSONANT         | _null_                     | &#x0A30; Ra            
      |\n|`U+0A31`   | _unassigned_     |                   |                            |                              |\n|`U+0A32`   | Letter           | CONSONANT         | _null_                     | &#x0A32; La                  |\n|`U+0A33`   | Letter           | CONSONANT         | _null_                     | &#x0A33; Lla                 |\n|`U+0A34`   | _unassigned_     |                   |                            |                              |\n|`U+0A35`   | Letter           | CONSONANT         | _null_                     | &#x0A35; Va                  |\n|`U+0A36`   | Letter           | CONSONANT         | _null_                     | &#x0A36; Sha                 |\n|`U+0A37`   | _unassigned_     |                   |                            |                              |\n|`U+0A38`   | Letter           | CONSONANT         | _null_                     | &#x0A38; Sa                  |\n|`U+0A39`   | Letter           | CONSONANT         | _null_                     | &#x0A39; Ha                  |\n|`U+0A3A`   | _unassigned_     |                   |                            |                              |\n|`U+0A3B`   | _unassigned_     |                   |                            |                              |\n|`U+0A3C`   | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x0A3C; Nukta               |\n|`U+0A3D`   | _unassigned_     |                   |                            |                              |\n|`U+0A3E`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0A3E; Sign Aa             |\n|`U+0A3F`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0A3F; Sign I              |\n| | | | |\n|`U+0A40`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0A40; Sign Ii             |\n|`U+0A41`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0A41; Sign U              |\n|`U+0A42`   | Mark [Mn]        | 
VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0A42; Sign Uu             |\n|`U+0A43`   | _unassigned_     |                   |                            |                              |\n|`U+0A44`   | _unassigned_     |                   |                            |                              |\n|`U+0A45`   | _unassigned_     |                   |                            |                              |\n|`U+0A46`   | _unassigned_     |                   |                            |                              |\n|`U+0A47`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0A47; Sign Ee             |\n|`U+0A48`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0A48; Sign Ai             |\n|`U+0A49`   | _unassigned_     |                   |                            |                              |\n|`U+0A4A`   | _unassigned_     |                   |                            |                              |\n|`U+0A4B`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0A4B; Sign Oo             |\n|`U+0A4C`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0A4C; Sign Au             |\n|`U+0A4D`   | Mark [Mn]        | VIRAMA            | BOTTOM_POSITION            | &#x0A4D; Virama              |\n|`U+0A4E`   | _unassigned_     |                   |                            |                              |\n|`U+0A4F`   | _unassigned_     |                   |                            |                              |\n| | | | |\n|`U+0A50`   | _unassigned_     |                   |                            |                              |\n|`U+0A51`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x0A51; Udaat               |\n|`U+0A52`   | _unassigned_     |                   |                            |                              |\n|`U+0A53`   | _unassigned_     |                   |                            |   
                           |\n|`U+0A54`   | _unassigned_     |                   |                            |                              |\n|`U+0A55`   | _unassigned_     |                   |                            |                              |\n|`U+0A56`   | _unassigned_     |                   |                            |                              |\n|`U+0A57`   | _unassigned_     |                   |                            |                              |\n|`U+0A58`   | _unassigned_     |                   |                            |                              |\n|`U+0A59`   | Letter           | CONSONANT         | _null_                     | &#x0A59; Khha                |\n|`U+0A5A`   | Letter           | CONSONANT         | _null_                     | &#x0A5A; Ghha                |\n|`U+0A5B`   | Letter           | CONSONANT         | _null_                     | &#x0A5B; Za                  |\n|`U+0A5C`   | Letter           | CONSONANT         | _null_                     | &#x0A5C; Rra                 |\n|`U+0A5D`   | _unassigned_     |                   |                            |                              |\n|`U+0A5E`   | Letter           | CONSONANT         | _null_                     | &#x0A5E; Fa                  |\n|`U+0A5F`   | _unassigned_     |                   |                            |                              |\n| | | | |\n|`U+0A60`   | _unassigned_     |                   |                            |                              |\n|`U+0A61`   | _unassigned_     |                   |                            |                              |\n|`U+0A62`   | _unassigned_     |                   |                            |                              |\n|`U+0A63`   | _unassigned_     |                   |                            |                              |\n|`U+0A64`   | _unassigned_     |                   |                            |                              |\n|`U+0A65`   | 
_unassigned_     |                   |                            |                              |\n|`U+0A66`   | Number           | NUMBER            | _null_                     | &#x0A66; Digit Zero          |\n|`U+0A67`   | Number           | NUMBER            | _null_                     | &#x0A67; Digit One           |\n|`U+0A68`   | Number           | NUMBER            | _null_                     | &#x0A68; Digit Two           |\n|`U+0A69`   | Number           | NUMBER            | _null_                     | &#x0A69; Digit Three         |\n|`U+0A6A`   | Number           | NUMBER            | _null_                     | &#x0A6A; Digit Four          |\n|`U+0A6B`   | Number           | NUMBER            | _null_                     | &#x0A6B; Digit Five          |\n|`U+0A6C`   | Number           | NUMBER            | _null_                     | &#x0A6C; Digit Six           |\n|`U+0A6D`   | Number           | NUMBER            | _null_                     | &#x0A6D; Digit Seven         |\n|`U+0A6E`   | Number           | NUMBER            | _null_                     | &#x0A6E; Digit Eight         |\n|`U+0A6F`   | Number           | NUMBER            | _null_                     | &#x0A6F; Digit Nine          |\n| | | | |\n|`U+0A70`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0A70; Tippi               |\n|`U+0A71`   | Mark [Mn]        | GEMINATION_MARK   | TOP_POSITION               | &#x0A71; Addak               |\n|`U+0A72`   | Letter           | CONSONANT         | _null_                     | &#x0A72; Iri                 |\n|`U+0A73`   | Letter           | CONSONANT         | _null_                     | &#x0A73; Ura                 |\n|`U+0A74`   | Letter           | _null_            | _null_                     | &#x0A74; Ek Onkar            |\n|`U+0A75`   | Mark [Mn]        | CONSONANT_MEDIAL  | BOTTOM_POSITION            | &#x0A75; Yakash              |\n|`U+0A76`   | Punctuation      | _null_            | _null_      
               | &#x0A76; Abbreviation Sign   |\n|`U+0A77`   | _unassigned_     |                   |                            |                              |\n|`U+0A78`   | _unassigned_     |                   |                            |                              |\n|`U+0A79`   | _unassigned_     |                   |                            |                              |\n|`U+0A7A`   | _unassigned_     |                   |                            |                              |\n|`U+0A7B`   | _unassigned_     |                   |                            |                              |\n|`U+0A7C`   | _unassigned_     |                   |                            |                              |\n|`U+0A7D`   | _unassigned_     |                   |                            |                              |\n|`U+0A7E`   | _unassigned_     |                   |                            |                              |\n|`U+0A7F`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Gurmukhi script may also include\ncharacters from the Vedic Extensions block. 
These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta 
|\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail 
|\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |  
                            |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Gurmukhi text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block. Gurmukhi text can also incorporate the udatta\n(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Gurmukhi text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-hangul.md",
    "content": "# Hangul character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Hangul text](../opentype-shaping-hangul.md).\n\n**Contents**\n\n  - [Hangul Jamo character table](#hangul-jamo-character-table)\n  - [Hangul Jamo Extended-A character table](#hangul-jamo-extended-a-character-table)\n  - [Hangul Jamo Extended-B character table](#hangul-jamo-extended-b-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n  \n  - [Hangul Syllables: summary](#hangul-syllables-character-table)\n      \n\n## Hangul Jamo character table ##\n\nHangul Jamo should be classified as in the following\ntable. Codepoints in the Hangul Jamo block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nThe _Jamo type_ column indicates the syllable-component type of the\njamo: \"L\" for leading consonants (choseong), \"V\" for vowels\n(jungseong), and \"T\" for trailing consonants (jongseong).\n\nIn addition, the filler codepoints `U+115F` (Choseong Filler) and `U+1160`\n(Jungseong Filler) are classified as type \"Lf\" and \"Vf\", respectively.\n\nThe _Composing_ column indicates whether or not the jamo is capable of\nbeing canonically composed into a syllable included in the Hangul\nSyllables block. 
Jamo in the modern Korean alphabet are designated\n`YES`, while fillers and archaic jamo from the old Korean alphabet are\ndesignated `NO`.\n\n\n:::{table} Hangul Jamo character table\n\n| Codepoint | Unicode category | Jamo type | Composing | Glyph                            |\n|:----------|:-----------------|:----------|:----------|:---------------------------------|\n|`U+1100`   | Letter           | L         | YES       | &#x1100; Kiyeok                  |\n|`U+1101`   | Letter           | L         | YES       | &#x1101; Ssangkiyeok             |\n|`U+1102`   | Letter           | L         | YES       | &#x1102; Nieun                   |\n|`U+1103`   | Letter           | L         | YES       | &#x1103; Tikeut                  |\n|`U+1104`   | Letter           | L         | YES       | &#x1104; Ssangtikeut             |\n|`U+1105`   | Letter           | L         | YES       | &#x1105; Rieul                   |\n|`U+1106`   | Letter           | L         | YES       | &#x1106; Mieum                   |\n|`U+1107`   | Letter           | L         | YES       | &#x1107; Pieup                   |\n|`U+1108`   | Letter           | L         | YES       | &#x1108; Ssangpieup              |\n|`U+1109`   | Letter           | L         | YES       | &#x1109; Sios                    |\n|`U+110A`   | Letter           | L         | YES       | &#x110A; Ssangsios               |\n|`U+110B`   | Letter           | L         | YES       | &#x110B; Ieung                   |\n|`U+110C`   | Letter           | L         | YES       | &#x110C; Cieuc                   |\n|`U+110D`   | Letter           | L         | YES       | &#x110D; Ssangcieuc              |\n|`U+110E`   | Letter           | L         | YES       | &#x110E; Chieuch                 |\n|`U+110F`   | Letter           | L         | YES       | &#x110F; Khieukh                 |\n| | | | | | \n|`U+1110`   | Letter           | L         | YES       | &#x1110; Thieuth                 |\n|`U+1111`   | Letter           
| L         | YES       | &#x1111; Phieuph                 |\n|`U+1112`   | Letter           | L         | YES       | &#x1112; Hieuh                   |\n|`U+1113`   | Letter           | L         | NO        | &#x1113; Nieun-Kiyeok            |\n|`U+1114`   | Letter           | L         | NO        | &#x1114; Ssangnieun              |\n|`U+1115`   | Letter           | L         | NO        | &#x1115; Nieun-Tikeut            |\n|`U+1116`   | Letter           | L         | NO        | &#x1116; Nieun-Pieup             |\n|`U+1117`   | Letter           | L         | NO        | &#x1117; Tikeut-Kiyeok           |\n|`U+1118`   | Letter           | L         | NO        | &#x1118; Rieul-Nieun             |\n|`U+1119`   | Letter           | L         | NO        | &#x1119; Ssangrieul              |\n|`U+111A`   | Letter           | L         | NO        | &#x111A; Rieul-Hieuh             |\n|`U+111B`   | Letter           | L         | NO        | &#x111B; Kapyeounrieul           |\n|`U+111C`   | Letter           | L         | NO        | &#x111C; Mieum-Pieup             |\n|`U+111D`   | Letter           | L         | NO        | &#x111D; Kapyeounmieum           |\n|`U+111E`   | Letter           | L         | NO        | &#x111E; Pieup-Kiyeok            |\n|`U+111F`   | Letter           | L         | NO        | &#x111F; Pieup-Nieun             |\n| | | | | |\n|`U+1120`   | Letter           | L         | NO        | &#x1120; Pieup-Tikeut            |\n|`U+1121`   | Letter           | L         | NO        | &#x1121; Pieup-Sios              |\n|`U+1122`   | Letter           | L         | NO        | &#x1122; Pieup-Sios-Kiyeok       |\n|`U+1123`   | Letter           | L         | NO        | &#x1123; Pieup-Sios-Tikeut       |\n|`U+1124`   | Letter           | L         | NO        | &#x1124; Pieup-Sios-Pieup        |\n|`U+1125`   | Letter           | L         | NO        | &#x1125; Pieup-Ssangsios         |\n|`U+1126`   | Letter           | L         | NO        | 
&#x1126; Pieup-Sios-Cieuc        |\n|`U+1127`   | Letter           | L         | NO        | &#x1127; Pieup-Cieuc             |\n|`U+1128`   | Letter           | L         | NO        | &#x1128; Pieup-Chieuch           |\n|`U+1129`   | Letter           | L         | NO        | &#x1129; Pieup-Thieuth           |\n|`U+112A`   | Letter           | L         | NO        | &#x112A; Pieup-Phieuph           |\n|`U+112B`   | Letter           | L         | NO        | &#x112B; Kapyeounpieup           |\n|`U+112C`   | Letter           | L         | NO        | &#x112C; Kapyeounssangpieup      |\n|`U+112D`   | Letter           | L         | NO        | &#x112D; Sios-Kiyeok             |\n|`U+112E`   | Letter           | L         | NO        | &#x112E; Sios-Nieun              |\n|`U+112F`   | Letter           | L         | NO        | &#x112F; Sios-Tikeut             |\n| | | | | |\n|`U+1130`   | Letter           | L         | NO        | &#x1130; Sios-Rieul              |\n|`U+1131`   | Letter           | L         | NO        | &#x1131; Sios-Mieum              |\n|`U+1132`   | Letter           | L         | NO        | &#x1132; Sios-Pieup              |\n|`U+1133`   | Letter           | L         | NO        | &#x1133; Sios-Pieup-Kiyeok       |\n|`U+1134`   | Letter           | L         | NO        | &#x1134; Sios-Ssangsios          |\n|`U+1135`   | Letter           | L         | NO        | &#x1135; Sios-Ieung              |\n|`U+1136`   | Letter           | L         | NO        | &#x1136; Sios-Cieuc              |\n|`U+1137`   | Letter           | L         | NO        | &#x1137; Sios-Chieuch            |\n|`U+1138`   | Letter           | L         | NO        | &#x1138; Sios-Khieukh            |\n|`U+1139`   | Letter           | L         | NO        | &#x1139; Sios-Thieuth            |\n|`U+113A`   | Letter           | L         | NO        | &#x113A; Sios-Phieuph            |\n|`U+113B`   | Letter           | L         | NO        | &#x113B; Sios-Hieuh              
|\n|`U+113C`   | Letter           | L         | NO        | &#x113C; Chitueumsios            |\n|`U+113D`   | Letter           | L         | NO        | &#x113D; Chitueumssangsios       |\n|`U+113E`   | Letter           | L         | NO        | &#x113E; Ceongchieumsios         |\n|`U+113F`   | Letter           | L         | NO        | &#x113F; Ceongchieumssangsios    |\n| | | | | |\n|`U+1140`   | Letter           | L         | NO        | &#x1140; Pansios                 |\n|`U+1141`   | Letter           | L         | NO        | &#x1141; Ieung-Kiyeok            |\n|`U+1142`   | Letter           | L         | NO        | &#x1142; Ieung-Tikeut            |\n|`U+1143`   | Letter           | L         | NO        | &#x1143; Ieung-Mieum             |\n|`U+1144`   | Letter           | L         | NO        | &#x1144; Ieung-Pieup             |\n|`U+1145`   | Letter           | L         | NO        | &#x1145; Ieung-Sios              |\n|`U+1146`   | Letter           | L         | NO        | &#x1146; Ieung-Pansios           |\n|`U+1147`   | Letter           | L         | NO        | &#x1147; Ssangieung              |\n|`U+1148`   | Letter           | L         | NO        | &#x1148; Ieung-Cieuc             |\n|`U+1149`   | Letter           | L         | NO        | &#x1149; Ieung-Chieuch           |\n|`U+114A`   | Letter           | L         | NO        | &#x114A; Ieung-Thieuth           |\n|`U+114B`   | Letter           | L         | NO        | &#x114B; Ieung-Phieuph           |\n|`U+114C`   | Letter           | L         | NO        | &#x114C; Yesieung                |\n|`U+114D`   | Letter           | L         | NO        | &#x114D; Cieuc-Ieung             |\n|`U+114E`   | Letter           | L         | NO        | &#x114E; Chitueumcieuc           |\n|`U+114F`   | Letter           | L         | NO        | &#x114F; Chitueumssangcieuc      |\n| | | | | |\n|`U+1150`   | Letter           | L         | NO        | &#x1150; Ceongchieumcieuc        |\n|`U+1151`   | 
Letter           | L         | NO        | &#x1151; Ceongchieumssangcieuc   |\n|`U+1152`   | Letter           | L         | NO        | &#x1152; Chieuch-Khieukh         |\n|`U+1153`   | Letter           | L         | NO        | &#x1153; Chieuch-Hieuh           |\n|`U+1154`   | Letter           | L         | NO        | &#x1154; Chitueumchieuch         |\n|`U+1155`   | Letter           | L         | NO        | &#x1155; Ceongchieumchieuch      |\n|`U+1156`   | Letter           | L         | NO        | &#x1156; Phieuph-Pieup           |\n|`U+1157`   | Letter           | L         | NO        | &#x1157; Kapyeounphieuph         |\n|`U+1158`   | Letter           | L         | NO        | &#x1158; Ssanghieuh              |\n|`U+1159`   | Letter           | L         | NO        | &#x1159; Yeorinhieuh             |\n|`U+115A`   | Letter           | L         | NO        | &#x115A; Kiyeok-Tikeut           |\n|`U+115B`   | Letter           | L         | NO        | &#x115B; Nieun-Sios              |\n|`U+115C`   | Letter           | L         | NO        | &#x115C; Nieun-Cieuc             |\n|`U+115D`   | Letter           | L         | NO        | &#x115D; Nieun-Hieuh             |\n|`U+115E`   | Letter           | L         | NO        | &#x115E; Tikeut-Rieul            |\n|`U+115F`   | Letter           | Lf        | NO        | &#x115F; Choseong Filler         |\n| | | | | |\n|`U+1160`   | Letter           | Vf        | NO        | &#x1160; Jungseong Filler        |\n|`U+1161`   | Letter           | V         | YES       | &#x1161; A                       |\n|`U+1162`   | Letter           | V         | YES       | &#x1162; Ae                      |\n|`U+1163`   | Letter           | V         | YES       | &#x1163; Ya                      |\n|`U+1164`   | Letter           | V         | YES       | &#x1164; Yae                     |\n|`U+1165`   | Letter           | V         | YES       | &#x1165; Eo                      |\n|`U+1166`   | Letter           | V         | 
YES       | &#x1166; E                       |\n|`U+1167`   | Letter           | V         | YES       | &#x1167; Yeo                     |\n|`U+1168`   | Letter           | V         | YES       | &#x1168; Ye                      |\n|`U+1169`   | Letter           | V         | YES       | &#x1169; O                       |\n|`U+116A`   | Letter           | V         | YES       | &#x116A; Wa                      |\n|`U+116B`   | Letter           | V         | YES       | &#x116B; Wae                     |\n|`U+116C`   | Letter           | V         | YES       | &#x116C; Oe                      |\n|`U+116D`   | Letter           | V         | YES       | &#x116D; Yo                      |\n|`U+116E`   | Letter           | V         | YES       | &#x116E; U                       |\n|`U+116F`   | Letter           | V         | YES       | &#x116F; Weo                     |\n| | | | | |\n|`U+1170`   | Letter           | V         | YES       | &#x1170; We                      |\n|`U+1171`   | Letter           | V         | YES       | &#x1171; Wi                      |\n|`U+1172`   | Letter           | V         | YES       | &#x1172; Yu                      |\n|`U+1173`   | Letter           | V         | YES       | &#x1173; Eu                      |\n|`U+1174`   | Letter           | V         | YES       | &#x1174; Yi                      |\n|`U+1175`   | Letter           | V         | YES       | &#x1175; I                       |\n|`U+1176`   | Letter           | V         | NO        | &#x1176; A-O                     |\n|`U+1177`   | Letter           | V         | NO        | &#x1177; A-U                     |\n|`U+1178`   | Letter           | V         | NO        | &#x1178; Ya-O                    |\n|`U+1179`   | Letter           | V         | NO        | &#x1179; Ya-Yo                   |\n|`U+117A`   | Letter           | V         | NO        | &#x117A; Eo-O                    |\n|`U+117B`   | Letter           | V         | NO        | &#x117B; Eo-U         
           |\n|`U+117C`   | Letter           | V         | NO        | &#x117C; Eo-Eu                   |\n|`U+117D`   | Letter           | V         | NO        | &#x117D; Yeo-O                   |\n|`U+117E`   | Letter           | V         | NO        | &#x117E; Yeo-U                   |\n|`U+117F`   | Letter           | V         | NO        | &#x117F; O-Eo                    |\n| | | | | |\n|`U+1180`   | Letter           | V         | NO        | &#x1180; O-E                     |\n|`U+1181`   | Letter           | V         | NO        | &#x1181; O-Ye                    |\n|`U+1182`   | Letter           | V         | NO        | &#x1182; O-O                     |\n|`U+1183`   | Letter           | V         | NO        | &#x1183; O-U                     |\n|`U+1184`   | Letter           | V         | NO        | &#x1184; Yo-Ya                   |\n|`U+1185`   | Letter           | V         | NO        | &#x1185; Yo-Yae                  |\n|`U+1186`   | Letter           | V         | NO        | &#x1186; Yo-Yeo                  |\n|`U+1187`   | Letter           | V         | NO        | &#x1187; Yo-O                    |\n|`U+1188`   | Letter           | V         | NO        | &#x1188; Yo-I                    |\n|`U+1189`   | Letter           | V         | NO        | &#x1189; U-A                     |\n|`U+118A`   | Letter           | V         | NO        | &#x118A; U-Ae                    |\n|`U+118B`   | Letter           | V         | NO        | &#x118B; U-Eo-Eu                 |\n|`U+118C`   | Letter           | V         | NO        | &#x118C; U-Ye                    |\n|`U+118D`   | Letter           | V         | NO        | &#x118D; U-U                     |\n|`U+118E`   | Letter           | V         | NO        | &#x118E; Yu-A                    |\n|`U+118F`   | Letter           | V         | NO        | &#x118F; Yu-Eo                   |\n| | | | | |\n|`U+1190`   | Letter           | V         | NO        | &#x1190; Yu-E                    
|\n|`U+1191`   | Letter           | V         | NO        | &#x1191; Yu-Yeo                  |\n|`U+1192`   | Letter           | V         | NO        | &#x1192; Yu-Ye                   |\n|`U+1193`   | Letter           | V         | NO        | &#x1193; Yu-U                    |\n|`U+1194`   | Letter           | V         | NO        | &#x1194; Yu-I                    |\n|`U+1195`   | Letter           | V         | NO        | &#x1195; Eu-U                    |\n|`U+1196`   | Letter           | V         | NO        | &#x1196; Eu-Eu                   |\n|`U+1197`   | Letter           | V         | NO        | &#x1197; Yi-U                    |\n|`U+1198`   | Letter           | V         | NO        | &#x1198; I-A                     |\n|`U+1199`   | Letter           | V         | NO        | &#x1199; I-Ya                    |\n|`U+119A`   | Letter           | V         | NO        | &#x119A; I-O                     |\n|`U+119B`   | Letter           | V         | NO        | &#x119B; I-U                     |\n|`U+119C`   | Letter           | V         | NO        | &#x119C; I-Eu                    |\n|`U+119D`   | Letter           | V         | NO        | &#x119D; I-Araea                 |\n|`U+119E`   | Letter           | V         | NO        | &#x119E; Araea                   |\n|`U+119F`   | Letter           | V         | NO        | &#x119F; Araea-Eo                |\n| | | | | |\n|`U+11A0`   | Letter           | V         | NO        | &#x11A0; Araea-U                 |\n|`U+11A1`   | Letter           | V         | NO        | &#x11A1; Araea-I                 |\n|`U+11A2`   | Letter           | V         | NO        | &#x11A2; Ssangaraea              |\n|`U+11A3`   | Letter           | V         | NO        | &#x11A3; A-Eu                    |\n|`U+11A4`   | Letter           | V         | NO        | &#x11A4; Ya-U                    |\n|`U+11A5`   | Letter           | V         | NO        | &#x11A5; Yeo-Ya                  |\n|`U+11A6`   | Letter           
| V         | NO        | &#x11A6; O-Ya                    |\n|`U+11A7`   | Letter           | V         | NO        | &#x11A7; O-Yae                   |\n|`U+11A8`   | Letter           | T         | YES       | &#x11A8; Kiyeok                  |\n|`U+11A9`   | Letter           | T         | YES       | &#x11A9; Ssangkiyeok             |\n|`U+11AA`   | Letter           | T         | YES       | &#x11AA; Kiyeok-Sios             |\n|`U+11AB`   | Letter           | T         | YES       | &#x11AB; Nieun                   |\n|`U+11AC`   | Letter           | T         | YES       | &#x11AC; Nieun-Cieuc             |\n|`U+11AD`   | Letter           | T         | YES       | &#x11AD; Nieun-Hieuh             |\n|`U+11AE`   | Letter           | T         | YES       | &#x11AE; Tikeut                  |\n|`U+11AF`   | Letter           | T         | YES       | &#x11AF; Rieul                   |\n| | | | | |\n|`U+11B0`   | Letter           | T         | YES       | &#x11B0; Rieul-Kiyeok            |\n|`U+11B1`   | Letter           | T         | YES       | &#x11B1; Rieul-Mieum             |\n|`U+11B2`   | Letter           | T         | YES       | &#x11B2; Rieul-Pieup             |\n|`U+11B3`   | Letter           | T         | YES       | &#x11B3; Rieul-Sios              |\n|`U+11B4`   | Letter           | T         | YES       | &#x11B4; Rieul-Thieuth           |\n|`U+11B5`   | Letter           | T         | YES       | &#x11B5; Rieul-Phieuph           |\n|`U+11B6`   | Letter           | T         | YES       | &#x11B6; Rieul-Hieuh             |\n|`U+11B7`   | Letter           | T         | YES       | &#x11B7; Mieum                   |\n|`U+11B8`   | Letter           | T         | YES       | &#x11B8; Pieup                   |\n|`U+11B9`   | Letter           | T         | YES       | &#x11B9; Pieup-Sios              |\n|`U+11BA`   | Letter           | T         | YES       | &#x11BA; Sios                    |\n|`U+11BB`   | Letter           | T         | YES       | 
&#x11BB; Ssangsios               |\n|`U+11BC`   | Letter           | T         | YES       | &#x11BC; Ieung                   |\n|`U+11BD`   | Letter           | T         | YES       | &#x11BD; Cieuc                   |\n|`U+11BE`   | Letter           | T         | YES       | &#x11BE; Chieuch                 |\n|`U+11BF`   | Letter           | T         | YES       | &#x11BF; Khieukh                 |\n| | | | | |\n|`U+11C0`   | Letter           | T         | YES       | &#x11C0; Thieuth                 |\n|`U+11C1`   | Letter           | T         | YES       | &#x11C1; Phieuph                 |\n|`U+11C2`   | Letter           | T         | YES       | &#x11C2; Hieuh                   |\n|`U+11C3`   | Letter           | T         | NO        | &#x11C3; Kiyeok-Rieul            |\n|`U+11C4`   | Letter           | T         | NO        | &#x11C4; Kiyeok-Sios-Kiyeok      |\n|`U+11C5`   | Letter           | T         | NO        | &#x11C5; Nieun-Kiyeok            |\n|`U+11C6`   | Letter           | T         | NO        | &#x11C6; Nieun-Tikeut            |\n|`U+11C7`   | Letter           | T         | NO        | &#x11C7; Nieun-Sios              |\n|`U+11C8`   | Letter           | T         | NO        | &#x11C8; Nieun-Pansios           |\n|`U+11C9`   | Letter           | T         | NO        | &#x11C9; Nieun-Thieuth           |\n|`U+11CA`   | Letter           | T         | NO        | &#x11CA; Tikeut-Kiyeok           |\n|`U+11CB`   | Letter           | T         | NO        | &#x11CB; Tikeut-Rieul            |\n|`U+11CC`   | Letter           | T         | NO        | &#x11CC; Rieul-Kiyeok-Sios       |\n|`U+11CD`   | Letter           | T         | NO        | &#x11CD; Rieul-Nieun             |\n|`U+11CE`   | Letter           | T         | NO        | &#x11CE; Rieul-Tikeut            |\n|`U+11CF`   | Letter           | T         | NO        | &#x11CF; Rieul-Tikeut-Hieuh      |\n| | | | | |\n|`U+11D0`   | Letter           | T         | NO        | &#x11D0; Ssangrieul  
            |\n|`U+11D1`   | Letter           | T         | NO        | &#x11D1; Rieul-Mieum-Kiyeok      |\n|`U+11D2`   | Letter           | T         | NO        | &#x11D2; Rieul-Mieum-Sios        |\n|`U+11D3`   | Letter           | T         | NO        | &#x11D3; Rieul-Pieup-Sios        |\n|`U+11D4`   | Letter           | T         | NO        | &#x11D4; Rieul-Pieup-Hieuh       |\n|`U+11D5`   | Letter           | T         | NO        | &#x11D5; Rieul-Kapyeounpieup     |\n|`U+11D6`   | Letter           | T         | NO        | &#x11D6; Rieul-Ssangsios         |\n|`U+11D7`   | Letter           | T         | NO        | &#x11D7; Rieul-Pansios           |\n|`U+11D8`   | Letter           | T         | NO        | &#x11D8; Rieul-Khieukh           |\n|`U+11D9`   | Letter           | T         | NO        | &#x11D9; Rieul-Yeorinhieuh       |\n|`U+11DA`   | Letter           | T         | NO        | &#x11DA; Mieum-Kiyeok            |\n|`U+11DB`   | Letter           | T         | NO        | &#x11DB; Mieum-Rieul             |\n|`U+11DC`   | Letter           | T         | NO        | &#x11DC; Mieum-Pieup             |\n|`U+11DD`   | Letter           | T         | NO        | &#x11DD; Mieum-Sios              |\n|`U+11DE`   | Letter           | T         | NO        | &#x11DE; Mieum-Ssangsios         |\n|`U+11DF`   | Letter           | T         | NO        | &#x11DF; Mieum-Pansios           |\n| | | | | |\n|`U+11E0`   | Letter           | T         | NO        | &#x11E0; Mieum-Chieuch           |\n|`U+11E1`   | Letter           | T         | NO        | &#x11E1; Mieum-Hieuh             |\n|`U+11E2`   | Letter           | T         | NO        | &#x11E2; Kapyeounmieum           |\n|`U+11E3`   | Letter           | T         | NO        | &#x11E3; Pieup-Rieul             |\n|`U+11E4`   | Letter           | T         | NO        | &#x11E4; Pieup-Phieuph           |\n|`U+11E5`   | Letter           | T         | NO        | &#x11E5; Pieup-Hieuh             |\n|`U+11E6`   | 
Letter           | T         | NO        | &#x11E6; Kapyeounpieup           |\n|`U+11E7`   | Letter           | T         | NO        | &#x11E7; Sios-Kiyeok             |\n|`U+11E8`   | Letter           | T         | NO        | &#x11E8; Sios-Tikeut             |\n|`U+11E9`   | Letter           | T         | NO        | &#x11E9; Sios-Rieul              |\n|`U+11EA`   | Letter           | T         | NO        | &#x11EA; Sios-Pieup              |\n|`U+11EB`   | Letter           | T         | NO        | &#x11EB; Pansios                 |\n|`U+11EC`   | Letter           | T         | NO        | &#x11EC; Ieung-Kiyeok            |\n|`U+11ED`   | Letter           | T         | NO        | &#x11ED; Ieung-Ssangkiyeok       |\n|`U+11EE`   | Letter           | T         | NO        | &#x11EE; Ssangieung              |\n|`U+11EF`   | Letter           | T         | NO        | &#x11EF; Ieung-Khieukh           |\n| | | | | |\n|`U+11F0`   | Letter           | T         | NO        | &#x11F0; Yesieung                |\n|`U+11F1`   | Letter           | T         | NO        | &#x11F1; Yesieung-Sios           |\n|`U+11F2`   | Letter           | T         | NO        | &#x11F2; Yesieung-Pansios        |\n|`U+11F3`   | Letter           | T         | NO        | &#x11F3; Phieuph-Pieup           |\n|`U+11F4`   | Letter           | T         | NO        | &#x11F4; Kapyeounphieuph         |\n|`U+11F5`   | Letter           | T         | NO        | &#x11F5; Hieuh-Nieun             |\n|`U+11F6`   | Letter           | T         | NO        | &#x11F6; Hieuh-Rieul             |\n|`U+11F7`   | Letter           | T         | NO        | &#x11F7; Hieuh-Mieum             |\n|`U+11F8`   | Letter           | T         | NO        | &#x11F8; Hieuh-Pieup             |\n|`U+11F9`   | Letter           | T         | NO        | &#x11F9; Yeorinhieuh             |\n|`U+11FA`   | Letter           | T         | NO        | &#x11FA; Kiyeok-Nieun            |\n|`U+11FB`   | Letter           | T         | NO 
       | &#x11FB; Kiyeok-Pieup            |\n|`U+11FC`   | Letter           | T         | NO        | &#x11FC; Kiyeok-Chieuch          |\n|`U+11FD`   | Letter           | T         | NO        | &#x11FD; Kiyeok-Khieukh          |\n|`U+11FE`   | Letter           | T         | NO        | &#x11FE; Kiyeok-Hieuh            |\n|`U+11FF`   | Letter           | T         | NO        | &#x11FF; Ssangnieun              |\n:::\n\n\n## Hangul Jamo Extended-A character table ##\n\nCharacters in the Hangul Jamo Extended-A block should be classified as\nin the following table. Codepoints in the block with no assigned\nmeaning are designated as _unassigned_ in the _Unicode category_ column.\n\nThe _Jamo type_ column indicates the syllable-component type of the\njamo. All assigned codepoints in the Hangul Jamo Extended-A block are\nclassified as type \"L\" for leading consonants (choseong).\n\n\n:::{table} Hangul Jamo Extended-A character table\n\n| Codepoint | Unicode category | Jamo type | Composing | Glyph                            |\n|:----------|:-----------------|:----------|:----------|:---------------------------------|\n|`U+A960`   | Letter           | L         | NO        | &#xA960; Tikeut-Mieum            |\n|`U+A961`   | Letter           | L         | NO        | &#xA961; Tikeut-Pieup            |\n|`U+A962`   | Letter           | L         | NO        | &#xA962; Tikeut-Sios             |\n|`U+A963`   | Letter           | L         | NO        | &#xA963; Tikeut-Cieuc            |\n|`U+A964`   | Letter           | L         | NO        | &#xA964; Rieul-Kiyeok            |\n|`U+A965`   | Letter           | L         | NO        | &#xA965; Rieul-Ssangkiyeok       |\n|`U+A966`   | Letter           | L         | NO        | &#xA966; Rieul-Tikeut            |\n|`U+A967`   | Letter           | L         | NO        | &#xA967; Rieul-Ssangtikeut       |\n|`U+A968`   | Letter           | L         | NO        | &#xA968; Rieul-Mieum             |\n|`U+A969`   | Letter           | L        
 | NO        | &#xA969; Rieul-Pieup             |\n|`U+A96A`   | Letter           | L         | NO        | &#xA96A; Rieul-Ssangpieup        |\n|`U+A96B`   | Letter           | L         | NO        | &#xA96B; Rieul-Kapyeounpieup     |\n|`U+A96C`   | Letter           | L         | NO        | &#xA96C; Rieul-Sios              |\n|`U+A96D`   | Letter           | L         | NO        | &#xA96D; Rieul-Cieuc             |\n|`U+A96E`   | Letter           | L         | NO        | &#xA96E; Rieul-Khieukh           |\n|`U+A96F`   | Letter           | L         | NO        | &#xA96F; Mieum-Kiyeok            |\n| | | | | | \n|`U+A970`   | Letter           | L         | NO        | &#xA970; Mieum-Tikeut            |\n|`U+A971`   | Letter           | L         | NO        | &#xA971; Mieum-Sios              |\n|`U+A972`   | Letter           | L         | NO        | &#xA972; Pieup-Sios-Thieuth      |\n|`U+A973`   | Letter           | L         | NO        | &#xA973; Pieup-Khieukh           |\n|`U+A974`   | Letter           | L         | NO        | &#xA974; Pieup-Hieuh             |\n|`U+A975`   | Letter           | L         | NO        | &#xA975; Ssangsios-Pieup         |\n|`U+A976`   | Letter           | L         | NO        | &#xA976; Ieung-Rieul             |\n|`U+A977`   | Letter           | L         | NO        | &#xA977; Ieung-Hieuh             |\n|`U+A978`   | Letter           | L         | NO        | &#xA978; Ssangcieuc-Hieuh        |\n|`U+A979`   | Letter           | L         | NO        | &#xA979; Ssangthieuth            |\n|`U+A97A`   | Letter           | L         | NO        | &#xA97A; Phieuph-Hieuh           |\n|`U+A97B`   | Letter           | L         | NO        | &#xA97B; Hieuh-Sios              |\n|`U+A97C`   | Letter           | L         | NO        | &#xA97C; Ssangyeorinhieuh        |\n|`U+A97D`   | _unassigned_     |           |           |                                  |\n|`U+A97E`   | _unassigned_     |           |           |                   
               |\n|`U+A97F`   | _unassigned_     |           |           |                                  |\n:::\n\n\n## Hangul Jamo Extended-B character table ##\n\nCharacters in the Hangul Jamo Extended-B block should be classified as\nin the following table. Codepoints in the block with no assigned\nmeaning are designated as _unassigned_ in the _Unicode category_ column.\n\nThe _Jamo type_ column indicates the syllable-component type of the\njamo: \"V\" for vowels (jungseong) or \"T\" for trailing consonants\n(jongseong).\n\n\n:::{table} Hangul Jamo Extended-B character table\n\n| Codepoint | Unicode category | Jamo type | Composing | Glyph                            |\n|:----------|:-----------------|:----------|:----------|:---------------------------------|\n|`U+D7B0`   | Letter           | V         | NO        | &#xD7B0; O-Yeo                   |\n|`U+D7B1`   | Letter           | V         | NO        | &#xD7B1; O-O-I                   |\n|`U+D7B2`   | Letter           | V         | NO        | &#xD7B2; Yo-A                    |\n|`U+D7B3`   | Letter           | V         | NO        | &#xD7B3; Yo-Ae                   |\n|`U+D7B4`   | Letter           | V         | NO        | &#xD7B4; Yo-Eo                   |\n|`U+D7B5`   | Letter           | V         | NO        | &#xD7B5; U-Yeo                   |\n|`U+D7B6`   | Letter           | V         | NO        | &#xD7B6; U-I-I                   |\n|`U+D7B7`   | Letter           | V         | NO        | &#xD7B7; Yu-Ae                   |\n|`U+D7B8`   | Letter           | V         | NO        | &#xD7B8; Yu-O                    |\n|`U+D7B9`   | Letter           | V         | NO        | &#xD7B9; Eu-A                    |\n|`U+D7BA`   | Letter           | V         | NO        | &#xD7BA; Eu-Eo                   |\n|`U+D7BB`   | Letter           | V         | NO        | &#xD7BB; Eu-E                    |\n|`U+D7BC`   | Letter           | V         | NO        | &#xD7BC; Eu-O                    |\n|`U+D7BD`   | Letter     
      | V         | NO        | &#xD7BD; I-Ya-O                  |\n|`U+D7BE`   | Letter           | V         | NO        | &#xD7BE; I-Yae                   |\n|`U+D7BF`   | Letter           | V         | NO        | &#xD7BF; I-Yeo                   |\n| | | | | | \n|`U+D7C0`   | Letter           | V         | NO        | &#xD7C0; I-Ye                    |\n|`U+D7C1`   | Letter           | V         | NO        | &#xD7C1; I-O-I                   |\n|`U+D7C2`   | Letter           | V         | NO        | &#xD7C2; I-Yo                    |\n|`U+D7C3`   | Letter           | V         | NO        | &#xD7C3; I-Yu                    |\n|`U+D7C4`   | Letter           | V         | NO        | &#xD7C4; I-I                     |\n|`U+D7C5`   | Letter           | V         | NO        | &#xD7C5; Araea-A                 |\n|`U+D7C6`   | Letter           | V         | NO        | &#xD7C6; Araea-E                 |\n|`U+D7C7`   | _unassigned_     |           |           |                                  |\n|`U+D7C8`   | _unassigned_     |           |           |                                  |\n|`U+D7C9`   | _unassigned_     |           |           |                                  |\n|`U+D7CA`   | _unassigned_     |           |           |                                  |\n|`U+D7CB`   | Letter           | T         | NO        | &#xD7CB; Nieun-Rieul             |\n|`U+D7CC`   | Letter           | T         | NO        | &#xD7CC; Nieun-Chieuch           |\n|`U+D7CD`   | Letter           | T         | NO        | &#xD7CD; Ssangtikeut             |\n|`U+D7CE`   | Letter           | T         | NO        | &#xD7CE; Ssangtikeut-Pieup       |\n|`U+D7CF`   | Letter           | T         | NO        | &#xD7CF; Tikeut-Pieup            |\n| | | | | | \n|`U+D7D0`   | Letter           | T         | NO        | &#xD7D0; Tikeut-Sios             |\n|`U+D7D1`   | Letter           | T         | NO        | &#xD7D1; Tikeut-Sios-Kiyeok      |\n|`U+D7D2`   | Letter           | T         
| NO        | &#xD7D2; Tikeut-Cieuc            |\n|`U+D7D3`   | Letter           | T         | NO        | &#xD7D3; Tikeut-Chieuch          |\n|`U+D7D4`   | Letter           | T         | NO        | &#xD7D4; Tikeut-Thieuth          |\n|`U+D7D5`   | Letter           | T         | NO        | &#xD7D5; Rieul-Ssangkiyeok       |\n|`U+D7D6`   | Letter           | T         | NO        | &#xD7D6; Rieul-Kiyeok-Hieuh      |\n|`U+D7D7`   | Letter           | T         | NO        | &#xD7D7; Ssangrieul-Khieukh      |\n|`U+D7D8`   | Letter           | T         | NO        | &#xD7D8; Rieul-Mieum-Hieuh       |\n|`U+D7D9`   | Letter           | T         | NO        | &#xD7D9; Rieul-Pieup-Tikeut      |\n|`U+D7DA`   | Letter           | T         | NO        | &#xD7DA; Rieul-Pieup-Phieuph     |\n|`U+D7DB`   | Letter           | T         | NO        | &#xD7DB; Rieul-Yesieung          |\n|`U+D7DC`   | Letter           | T         | NO        | &#xD7DC; Rieul-Yeorinhieuh-Hieuh |\n|`U+D7DD`   | Letter           | T         | NO        | &#xD7DD; Kapyeounrieul           |\n|`U+D7DE`   | Letter           | T         | NO        | &#xD7DE; Mieum-Nieun             |\n|`U+D7DF`   | Letter           | T         | NO        | &#xD7DF; Mieum-Ssangnieun        |\n| | | | | | \n|`U+D7E0`   | Letter           | T         | NO        | &#xD7E0; Ssangmieum              |\n|`U+D7E1`   | Letter           | T         | NO        | &#xD7E1; Mieum-Pieup-Sios        |\n|`U+D7E2`   | Letter           | T         | NO        | &#xD7E2; Mieum-Cieuc             |\n|`U+D7E3`   | Letter           | T         | NO        | &#xD7E3; Pieup-Tikeut            |\n|`U+D7E4`   | Letter           | T         | NO        | &#xD7E4; Pieup-Rieul-Phieuph     |\n|`U+D7E5`   | Letter           | T         | NO        | &#xD7E5; Pieup-Mieum             |\n|`U+D7E6`   | Letter           | T         | NO        | &#xD7E6; Ssangpieup              |\n|`U+D7E7`   | Letter           | T         | NO        | &#xD7E7; 
Pieup-Sios-Tikeut       |\n|`U+D7E8`   | Letter           | T         | NO        | &#xD7E8; Pieup-Cieuc             |\n|`U+D7E9`   | Letter           | T         | NO        | &#xD7E9; Pieup-Chieuch           |\n|`U+D7EA`   | Letter           | T         | NO        | &#xD7EA; Sios-Mieum              |\n|`U+D7EB`   | Letter           | T         | NO        | &#xD7EB; Sios-Kapyeounpieup      |\n|`U+D7EC`   | Letter           | T         | NO        | &#xD7EC; Ssangsios-Kiyeok        |\n|`U+D7ED`   | Letter           | T         | NO        | &#xD7ED; Ssangsios-Tikeut        |\n|`U+D7EE`   | Letter           | T         | NO        | &#xD7EE; Sios-Pansios            |\n|`U+D7EF`   | Letter           | T         | NO        | &#xD7EF; Sios-Cieuc              |\n| | | | | | \n|`U+D7F0`   | Letter           | T         | NO        | &#xD7F0; Sios-Chieuch            |\n|`U+D7F1`   | Letter           | T         | NO        | &#xD7F1; Sios-Thieuth            |\n|`U+D7F2`   | Letter           | T         | NO        | &#xD7F2; Sios-Hieuh              |\n|`U+D7F3`   | Letter           | T         | NO        | &#xD7F3; Pansios-Pieup           |\n|`U+D7F4`   | Letter           | T         | NO        | &#xD7F4; Pansios-Kapyeounpieup   |\n|`U+D7F5`   | Letter           | T         | NO        | &#xD7F5; Yesieung-Mieum          |\n|`U+D7F6`   | Letter           | T         | NO        | &#xD7F6; Yesieung-Hieuh          |\n|`U+D7F7`   | Letter           | T         | NO        | &#xD7F7; Cieuc-Pieup             |\n|`U+D7F8`   | Letter           | T         | NO        | &#xD7F8; Cieuc-Ssangpieup        |\n|`U+D7F9`   | Letter           | T         | NO        | &#xD7F9; Ssangcieuc              |\n|`U+D7FA`   | Letter           | T         | NO        | &#xD7FA; Phieuph-Sios            |\n|`U+D7FB`   | Letter           | T         | NO        | &#xD7FB; Phieuph-Thieuth         |\n|`U+D7FC`   | _unassigned_     |           |           |                                  
|\n|`U+D7FD`   | _unassigned_     |           |           |                                  |\n|`U+D7FE`   | _unassigned_     |           |           |                                  |\n|`U+D7FF`   | _unassigned_     |           |           |                                  |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Hangul text may use\npunctuation marks from the CJK Symbols And Punctuation block. \n\nOf particular note are the single-dot tone mark (single-dot bangjeom)\nand double-dot tone mark (double-dot bangjeom), `U+302E` and\n`U+302F`. These non-spacing marks are common in Old Korean.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Jamo type | Composing | Glyph                            |\n|:----------|:-----------------|:----------|:----------|:---------------------------------|\n|`U+302E`   | Mark [Mn]        | _null_    | _null_    | &#x302E; Single Dot Tone Mark    |\n|`U+302F`   | Mark [Mn]        | _null_    | _null_    | &#x302F; Double Dot Tone Mark    |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Hangul text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`), and zero-width non-joiner (`U+200C`).\n\nThe dotted-circle placeholder is frequently used when displaying a\nmark in isolation. Real-world text may also use other characters, such\nas hyphens or dashes, in a similar placeholder fashion; shaping\nengines should cope with this situation gracefully.\n\nThe zero-width space (`U+200B`) or word joiner (`U+2060`) may be used\nbetween two jamo to prevent them from being conjoined into a\nsyllable. 
The zero-width space allows a line break to happen between\nthe jamo, while the word joiner prevents the jamo from being separated\nby a line break.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Jamo type | Composing | Glyph                            |\n|:----------|:-----------------|:----------|:----------|:---------------------------------|\n|`U+200B`   | Separator        | _null_    | _null_    | &#x200B; Zero-width space        |\n|`U+200C`   | Other            | _null_    | _null_    | &#x200C; Zero-width non-joiner   |\n|`U+200D`   | Other            | _null_    | _null_    | &#x200D; Zero-width joiner       |\n|`U+2060`   | Other            | _null_    | _null_    | &#x2060; Word joiner             |\n|`U+25CC`   | Symbol           | _null_    | _null_    | &#x25CC; Dotted circle           |\n:::\n\n\n## Hangul Syllables character table ##\n\nThe Hangul Syllables block is too large to include a full character\ntable in this document.\n\nEach syllable codepoint is classified either as type `LV` or type `LVT`,\nindicating whether or not the syllable includes a trailing consonant\n(jongseong) at the end.\n\nSyllable codepoints are sorted in Hangul alphabetic order, first by\nleading consonant (choseong), followed by vowel (jungseong), followed\nby trailing consonant (jongseong).\n\nThis enables the algorithmic composition and decomposition of combining\njamo sequences and syllable codepoints.\n\n\n:::{table} Hangul Syllables character table\n\n| Codepoint | Unicode category | Syllable type | Glyph                            |\n|:----------|:-----------------|:--------------|:---------------------------------|\n|`U+AC00`   | Letter [Lo]      | LV            | &#xac00; G-A                     |\n| | | | |\n|`U+D5CC`   | Letter [Lo]      | LVT           | &#xd5cc; H-A-N                   |\n:::\n"
  },
  {
    "path": "character-tables/character-tables-hebrew.md",
    "content": "\n# Hebrew character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Hebrew text](../opentype-shaping-hebrew.md).\n\n**Contents**\n\nSeparate character tables are provided for the Hebrew block, the\nHebrew letters included in the Alphabetic Presentation Forms block,\nand for other miscellaneous characters that are used in `<hebr>` text\nruns:\n\n  - [Hebrew character table](#hebrew-character-table)\n  - [Alphabetic Presentation Forms character table](#alphabetic-presentation-forms-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\n\nThe tables list each codepoint along with its Unicode general\ncategory. For marks, the table lists the codepoint's mark combining\nclass. The codepoint's Unicode name and an example glyph are also provided.\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\n\n## Hebrew character table ##\n\n\n:::{table} Hebrew character table\n\n| Codepoint | Unicode category | Mark class | Glyph                                |\n|:----------|:-----------------|:-----------|:-------------------------------------|\n| `U+0590`  | _unassigned_     |            |                                      |\n| `U+0591`  | Mark [Mn]        | 220        | &#x0591; Accent Etnahta              |\n| `U+0592`  | Mark [Mn]        | 230        | &#x0592; Accent Segol                |\n| `U+0593`  | Mark [Mn]        | 230        | &#x0593; Accent Shalshelet           |\n| `U+0594`  | Mark [Mn]        | 230        | &#x0594; Accent Zaqef Qatan          |\n| `U+0595`  | Mark [Mn]        | 230        | &#x0595; Accent Zaqef Gadol          |\n| `U+0596`  | Mark [Mn]        | 220        | &#x0596; Accent Tipeha               |\n| `U+0597`  | Mark [Mn]        | 230        | &#x0597; Accent Revia                |\n| `U+0598`  | Mark [Mn]        | 230        | &#x0598; Accent Zarqa                |\n| `U+0599`  | Mark [Mn]        
| 230        | &#x0599; Accent Pashta               |\n| `U+059A`  | Mark [Mn]        | 222        | &#x059A; Accent Yetiv                |\n| `U+059B`  | Mark [Mn]        | 220        | &#x059B; Accent Tevir                |\n| `U+059C`  | Mark [Mn]        | 230        | &#x059C; Accent Geresh               |\n| `U+059D`  | Mark [Mn]        | 230        | &#x059D; Accent Geresh Muqdam        |\n| `U+059E`  | Mark [Mn]        | 230        | &#x059E; Accent Gershayim            |\n| `U+059F`  | Mark [Mn]        | 230        | &#x059F; Accent Qarney Para          |\n| | | | |\n| `U+05A0`  | Mark [Mn]        | 230        | &#x05A0; Accent Telisha Gedola       |\n| `U+05A1`  | Mark [Mn]        | 230        | &#x05A1; Accent Pazer                |\n| `U+05A2`  | Mark [Mn]        | 220        | &#x05A2; Accent Atnah Hafukh         |\n| `U+05A3`  | Mark [Mn]        | 220        | &#x05A3; Accent Munah                |\n| `U+05A4`  | Mark [Mn]        | 220        | &#x05A4; Accent Mahapakh             |\n| `U+05A5`  | Mark [Mn]        | 220        | &#x05A5; Accent Merkha               |\n| `U+05A6`  | Mark [Mn]        | 220        | &#x05A6; Accent Merkha Kefula        |\n| `U+05A7`  | Mark [Mn]        | 220        | &#x05A7; Accent Darga                |\n| `U+05A8`  | Mark [Mn]        | 230        | &#x05A8; Accent Qadma                |\n| `U+05A9`  | Mark [Mn]        | 230        | &#x05A9; Accent Telisha Qetana       |\n| `U+05AA`  | Mark [Mn]        | 220        | &#x05AA; Accent Yerah Ben Yomo       |\n| `U+05AB`  | Mark [Mn]        | 230        | &#x05AB; Accent Ole                  |\n| `U+05AC`  | Mark [Mn]        | 230        | &#x05AC; Accent Iluy                 |\n| `U+05AD`  | Mark [Mn]        | 222        | &#x05AD; Accent Dehi                 |\n| `U+05AE`  | Mark [Mn]        | 228        | &#x05AE; Accent Zinor                |\n| `U+05AF`  | Mark [Mn]        | 230        | &#x05AF; Mark Masora Circle          |\n| | | | |\n| `U+05B0`  | Mark [Mn]        
| 10         | &#x05B0; Point Sheva                 |\n| `U+05B1`  | Mark [Mn]        | 11         | &#x05B1; Point Hataf Segol           |\n| `U+05B2`  | Mark [Mn]        | 12         | &#x05B2; Point Hataf Patah           |\n| `U+05B3`  | Mark [Mn]        | 13         | &#x05B3; Point Hataf Qamats          |\n| `U+05B4`  | Mark [Mn]        | 14         | &#x05B4; Point Hiriq                 |\n| `U+05B5`  | Mark [Mn]        | 15         | &#x05B5; Point Tsere                 |\n| `U+05B6`  | Mark [Mn]        | 16         | &#x05B6; Point Segol                 |\n| `U+05B7`  | Mark [Mn]        | 17         | &#x05B7; Point Patah                 |\n| `U+05B8`  | Mark [Mn]        | 18         | &#x05B8; Point Qamats                |\n| `U+05B9`  | Mark [Mn]        | 19         | &#x05B9; Point Holam                 |\n| `U+05BA`  | Mark [Mn]        | 19         | &#x05BA; Point Holam Haser For Vav   |\n| `U+05BB`  | Mark [Mn]        | 20         | &#x05BB; Point Qubuts                |\n| `U+05BC`  | Mark [Mn]        | 21         | &#x05BC; Point Dagesh Or Mapiq       |\n| `U+05BD`  | Mark [Mn]        | 22         | &#x05BD; Point Meteg                 |\n| `U+05BE`  | Punctuation Dash | _0_        | &#x05BE; Punctuation Maqaf           |\n| `U+05BF`  | Mark [Mn]        | 23         | &#x05BF; Point Rafe                  |\n| | | | |\n| `U+05C0`  | Punctuation      | _0_        | &#x05C0; Punctuation Paseq           |\n| `U+05C1`  | Mark [Mn]        | 24         | &#x05C1; Point Shin Dot              |\n| `U+05C2`  | Mark [Mn]        | 25         | &#x05C2; Point Sin Dot               |\n| `U+05C3`  | Punctuation      | _0_        | &#x05C3; Punctuation Sof Pasuq       |\n| `U+05C4`  | Mark [Mn]        | 230        | &#x05C4; Mark Upper Dot              |\n| `U+05C5`  | Mark [Mn]        | 220        | &#x05C5; Mark Lower Dot              |\n| `U+05C6`  | Punctuation      | _0_        | &#x05C6; Punctuation Nun Hafuka      |\n| `U+05C7`  | Mark [Mn]        | 18       
  | &#x05C7; Point Qamats Qatan          |\n| `U+05C8`  | _unassigned_     |            |                                      |\n| `U+05C9`  | _unassigned_     |            |                                      |\n| `U+05CA`  | _unassigned_     |            |                                      |\n| `U+05CB`  | _unassigned_     |            |                                      |\n| `U+05CC`  | _unassigned_     |            |                                      |\n| `U+05CD`  | _unassigned_     |            |                                      |\n| `U+05CE`  | _unassigned_     |            |                                      |\n| `U+05CF`  | _unassigned_     |            |                                      |\n| | | | |\n| `U+05D0`  | Letter           | _0_        | &#x05D0; Alef                        |\n| `U+05D1`  | Letter           | _0_        | &#x05D1; Bet                         |\n| `U+05D2`  | Letter           | _0_        | &#x05D2; Gimel                       |\n| `U+05D3`  | Letter           | _0_        | &#x05D3; Dalet                       |\n| `U+05D4`  | Letter           | _0_        | &#x05D4; He                          |\n| `U+05D5`  | Letter           | _0_        | &#x05D5; Vav                         |\n| `U+05D6`  | Letter           | _0_        | &#x05D6; Zayin                       |\n| `U+05D7`  | Letter           | _0_        | &#x05D7; Het                         |\n| `U+05D8`  | Letter           | _0_        | &#x05D8; Tet                         |\n| `U+05D9`  | Letter           | _0_        | &#x05D9; Yod                         |\n| `U+05DA`  | Letter           | _0_        | &#x05DA; Final Kaf                   |\n| `U+05DB`  | Letter           | _0_        | &#x05DB; Kaf                         |\n| `U+05DC`  | Letter           | _0_        | &#x05DC; Lamed                       |\n| `U+05DD`  | Letter           | _0_        | &#x05DD; Final Mem                   |\n| `U+05DE`  | Letter           | _0_        | 
&#x05DE; Mem                         |\n| `U+05DF`  | Letter           | _0_        | &#x05DF; Final Nun                   |\n| | | | |\n| `U+05E0`  | Letter           | _0_        | &#x05E0; Nun                         |\n| `U+05E1`  | Letter           | _0_        | &#x05E1; Samekh                      |\n| `U+05E2`  | Letter           | _0_        | &#x05E2; Ayin                        |\n| `U+05E3`  | Letter           | _0_        | &#x05E3; Final Pe                    |\n| `U+05E4`  | Letter           | _0_        | &#x05E4; Pe                          |\n| `U+05E5`  | Letter           | _0_        | &#x05E5; Final Tsadi                 |\n| `U+05E6`  | Letter           | _0_        | &#x05E6; Tsadi                       |\n| `U+05E7`  | Letter           | _0_        | &#x05E7; Qof                         |\n| `U+05E8`  | Letter           | _0_        | &#x05E8; Resh                        |\n| `U+05E9`  | Letter           | _0_        | &#x05E9; Shin                        |\n| `U+05EA`  | Letter           | _0_        | &#x05EA; Tav                         |\n| `U+05EB`  | _unassigned_     |            |                                      |\n| `U+05EC`  | _unassigned_     |            |                                      |\n| `U+05ED`  | _unassigned_     |            |                                      |\n| `U+05EE`  | _unassigned_     |            |                                      |\n| `U+05EF`  | Letter           | _0_        | &#x05EF; Yod Triangle                |\n| | | | |\n| `U+05F0`  | Letter           | _0_        | &#x05F0; Ligature Yiddish Double Vav |\n| `U+05F1`  | Letter           | _0_        | &#x05F1; Ligature Yiddish Vav Yod    |\n| `U+05F2`  | Letter           | _0_        | &#x05F2; Ligature Yiddish Double Yod |\n| `U+05F3`  | Punctuation      | _0_        | &#x05F3; Punctuation Geresh          |\n| `U+05F4`  | Punctuation      | _0_        | &#x05F4; Punctuation Gershayim       |\n| `U+05F5`  | _unassigned_     |            | 
                                     |\n| `U+05F6`  | _unassigned_     |            |                                      |\n| `U+05F7`  | _unassigned_     |            |                                      |\n| `U+05F8`  | _unassigned_     |            |                                      |\n| `U+05F9`  | _unassigned_     |            |                                      |\n| `U+05FA`  | _unassigned_     |            |                                      |\n| `U+05FB`  | _unassigned_     |            |                                      |\n| `U+05FC`  | _unassigned_     |            |                                      |\n| `U+05FD`  | _unassigned_     |            |                                      |\n| `U+05FE`  | _unassigned_     |            |                                      |\n| `U+05FF`  | _unassigned_     |            |                                      |\n:::\n\n\n## Alphabetic Presentation Forms character table ##\n\nThis chart includes only the Hebrew codepoints from the Alphabetic\nPresentation Forms block in Unicode.\n\nThe _Composition_ column lists the codepoints from the Hebrew block\nthat compose into the listed Alphabetic Presentation Form. These\npresentation form compositions are not covered by the standard Unicode\ncomposition algorithm.\n\nEntries with a _null_ in this column do not need to be composed by the\nshaping engine. 
\n\n\n:::{table} Alphabetic Presentation Forms character table\n\n| Codepoint | Unicode category | Mark class | Composition     | Glyph                                   |\n|:----------|:-----------------|:-----------|:----------------|:----------------------------------------|\n| `U+FB1D`  | Letter           | _0_        |`U+05D9`,`U+05B4`| &#xFB1D; Yod With Hiriq                 |\n| `U+FB1E`  | Mark [Mn]        | 26         | _null_          | &#xFB1E; Point Judeo-Spanish Varika     |\n| `U+FB1F`  | Letter           | _0_        |`U+05F2`,`U+05B7`| &#xFB1F; Ligature Yiddish Yod Yod Patah |\n| | | | | |\n| `U+FB20`  | Letter           | _0_        | _null_          | &#xFB20; Alternative Ayin               |\n| `U+FB21`  | Letter           | _0_        | _null_          | &#xFB21; Wide Alef                      |\n| `U+FB22`  | Letter           | _0_        | _null_          | &#xFB22; Wide Dalet                     |\n| `U+FB23`  | Letter           | _0_        | _null_          | &#xFB23; Wide He                        |\n| `U+FB24`  | Letter           | _0_        | _null_          | &#xFB24; Wide Kaf                       |\n| `U+FB25`  | Letter           | _0_        | _null_          | &#xFB25; Wide Lamed                     |\n| `U+FB26`  | Letter           | _0_        | _null_          | &#xFB26; Wide Final Mem                 |\n| `U+FB27`  | Letter           | _0_        | _null_          | &#xFB27; Wide Resh                      |\n| `U+FB28`  | Letter           | _0_        | _null_          | &#xFB28; Wide Tav                       |\n| `U+FB29`  | Symbol           | _0_        | _null_          | &#xFB29; Alternative Plus Sign          |\n| `U+FB2A`  | Letter           | _0_        |`U+05E9`,`U+05C1`| &#xFB2A; Shin With Shin Dot             |\n| `U+FB2B`  | Letter           | _0_        |`U+05E9`,`U+05C2`| &#xFB2B; Shin With Sin Dot              |\n| `U+FB2C`  | Letter           | _0_        |`U+FB2A`,`U+05BC` OR `U+FB49`,`U+05C1`| &#xFB2C; Shin 
With Dagesh And Shin Dot  |\n| `U+FB2D`  | Letter           | _0_        |`U+FB2B`,`U+05BC` OR `U+FB49`,`U+05C2`| &#xFB2D; Shin With Dagesh And Sin Dot   |\n| `U+FB2E`  | Letter           | _0_        |`U+05D0`,`U+05B7`| &#xFB2E; Alef With Patah                |\n| `U+FB2F`  | Letter           | _0_        |`U+05D0`,`U+05B8`| &#xFB2F; Alef With Qamats               |\n| | | | | |\n| `U+FB30`  | Letter           | _0_        |`U+05D0`,`U+05BC`| &#xFB30; Alef With Mapiq                |\n| `U+FB31`  | Letter           | _0_        |`U+05D1`,`U+05BC`| &#xFB31; Bet With Dagesh                |\n| `U+FB32`  | Letter           | _0_        |`U+05D2`,`U+05BC`| &#xFB32; Gimel With Dagesh              |\n| `U+FB33`  | Letter           | _0_        |`U+05D3`,`U+05BC`| &#xFB33; Dalet With Dagesh              |\n| `U+FB34`  | Letter           | _0_        |`U+05D4`,`U+05BC`| &#xFB34; He With Mapiq                  |\n| `U+FB35`  | Letter           | _0_        |`U+05D5`,`U+05BC`| &#xFB35; Vav With Dagesh                |\n| `U+FB36`  | Letter           | _0_        |`U+05D6`,`U+05BC`| &#xFB36; Zayin With Dagesh              |\n| `U+FB37`  | _unassigned_     |            |                 |                                         |\n| `U+FB38`  | Letter           | _0_        |`U+05D8`,`U+05BC`| &#xFB38; Tet With Dagesh                |\n| `U+FB39`  | Letter           | _0_        |`U+05D9`,`U+05BC`| &#xFB39; Yod With Dagesh                |\n| `U+FB3A`  | Letter           | _0_        |`U+05DA`,`U+05BC`| &#xFB3A; Final Kaf With Dagesh          |\n| `U+FB3B`  | Letter           | _0_        |`U+05DB`,`U+05BC`| &#xFB3B; Kaf With Dagesh                |\n| `U+FB3C`  | Letter           | _0_        |`U+05DC`,`U+05BC`| &#xFB3C; Lamed With Dagesh              |\n| `U+FB3D`  | _unassigned_     |            |                 |                                         |\n| `U+FB3E`  | Letter           | _0_        |`U+05DE`,`U+05BC`| &#xFB3E; Mem With Dagesh                |\n| `U+FB3F` 
 | _unassigned_     |            |                 |                                         |\n| | | | | |\t\t\t\t\t\t\t\t\t\t   \n| `U+FB40`  | Letter           | _0_        |`U+05E0`,`U+05BC`| &#xFB40; Nun With Dagesh                |\n| `U+FB41`  | Letter           | _0_        |`U+05E1`,`U+05BC`| &#xFB41; Samekh With Dagesh             |\n| `U+FB42`  | _unassigned_     |            |                 |                                         |\n| `U+FB43`  | Letter           | _0_        |`U+05E3`,`U+05BC`| &#xFB43; Final Pe With Dagesh           |\n| `U+FB44`  | Letter           | _0_        |`U+05E4`,`U+05BC`| &#xFB44; Pe With Dagesh                 |\n| `U+FB45`  | _unassigned_     |            |                 |                                         |\n| `U+FB46`  | Letter           | _0_        |`U+05E6`,`U+05BC`| &#xFB46; Tsadi With Dagesh              |\n| `U+FB47`  | Letter           | _0_        |`U+05E7`,`U+05BC`| &#xFB47; Qof With Dagesh                |\n| `U+FB48`  | Letter           | _0_        |`U+05E8`,`U+05BC`| &#xFB48; Resh With Dagesh               |\n| `U+FB49`  | Letter           | _0_        |`U+05E9`,`U+05BC`| &#xFB49; Shin With Dagesh               |\n| `U+FB4A`  | Letter           | _0_        |`U+05EA`,`U+05BC`| &#xFB4A; Tav With Dagesh                |\n| `U+FB4B`  | Letter           | _0_        |`U+05D5`,`U+05B9`| &#xFB4B; Vav With Holam                 |\n| `U+FB4C`  | Letter           | _0_        |`U+05D1`,`U+05BF`| &#xFB4C; Bet With Rafe                  |\n| `U+FB4D`  | Letter           | _0_        |`U+05DB`,`U+05BF`| &#xFB4D; Kaf With Rafe                  |\n| `U+FB4E`  | Letter           | _0_        |`U+05E4`,`U+05BF`| &#xFB4E; Pe With Rafe                   |\n| `U+FB4F`  | Letter           | _0_        | _null_          | &#xFB4F; Ligature Alef Lamed            |\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Hebrew text include the 
dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`), and zero-width non-joiner (`U+200C`).\n\nThe dotted-circle placeholder is frequently used when displaying a\nmark in isolation. Real-world text may also use other characters, such\nas hyphens or dashes, in a similar placeholder fashion; shaping\nengines should cope with this situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Mark class | Glyph                              |\n|:----------|:-----------------|:-----------|:-----------------------------------|\n|`U+00A0`   | Separator        | _0_        | &#x00A0; No-break space            |\n|`U+034F`   | Other            | _0_        | &#x034F; Combining grapheme joiner |\n|`U+200C`   | Other            | _0_        | &#x200C; Zero-width non-joiner     |\n|`U+200D`   | Other            | _0_        | &#x200D; Zero-width joiner         |\n|`U+200E`   | Other            | _0_        | &#x200E; Left-to-Right marker      |\n|`U+200F`   | Other            | _0_        | &#x200F; Right-to-Left marker      |\n|`U+25CC`   | Symbol           | _0_        | &#x25CC; Dotted circle             |\n:::\n\n"
  },
  {
    "path": "character-tables/character-tables-kannada.md",
    "content": "# Kannada character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Kannada text](../opentype-shaping-kannada.md).\n\n**Contents**\n\n  - [Kannada character table](#kannada-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Kannada character table ##\n\nKannada glyphs should be classified as in the following\ntable. Codepoints in the Kannada block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Kannada character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0C80`   | Letter           | PLACEHOLDER       | _null_                     | &#x0C80; Spacing Candrabindu |\n|`U+0C81`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0C81; Candrabindu         |\n|`U+0C82`   | Mark [Mc]        | BINDU             | RIGHT_POSITION             | &#x0C82; Anusvara            |\n|`U+0C83`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0C83; Visarga             |\n|`U+0C84`   | Punctuation      | _null_            | _null_                     | &#x0C84; Siddham             |\n|`U+0C85`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C85; A                   |\n|`U+0C86`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C86; Aa                  |\n|`U+0C87`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C87; I                   |\n|`U+0C88`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C88; Ii                  |\n|`U+0C89`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C89; U                   |\n|`U+0C8A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C8A; Uu                  |\n|`U+0C8B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C8B; Vocalic R           |\n|`U+0C8C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C8C; Vocalic L           |\n|`U+0C8D`   | _unassigned_     |                   |                            |                              |\n|`U+0C8E`   | 
Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C8E; E                   |\n|`U+0C8F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C8F; Ee                  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0C90`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C90; Ai                  |\n|`U+0C91`   | _unassigned_     |                   |                            |                              |\n|`U+0C92`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C92; O                   |\n|`U+0C93`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C93; Oo                  |\n|`U+0C94`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C94; Au                  |\n|`U+0C95`   | Letter           | CONSONANT         | _null_                     | &#x0C95; Ka                  |\n|`U+0C96`   | Letter           | CONSONANT         | _null_                     | &#x0C96; Kha                 |\n|`U+0C97`   | Letter           | CONSONANT         | _null_                     | &#x0C97; Ga                  |\n|`U+0C98`   | Letter           | CONSONANT         | _null_                     | &#x0C98; Gha                 |\n|`U+0C99`   | Letter           | CONSONANT         | _null_                     | &#x0C99; Nga                 |\n|`U+0C9A`   | Letter           | CONSONANT         | _null_                     | &#x0C9A; Ca                  |\n|`U+0C9B`   | Letter           | CONSONANT         | _null_                     | &#x0C9B; Cha                 |\n|`U+0C9C`   | Letter           | CONSONANT         | _null_                     | &#x0C9C; Ja                  |\n|`U+0C9D`   | Letter           | CONSONANT         | _null_                     | &#x0C9D; Jha                 |\n|`U+0C9E`   | Letter           | CONSONANT         | _null_                     | &#x0C9E; Nya                 |\n|`U+0C9F`   | Letter         
  | CONSONANT         | _null_                     | &#x0C9F; Tta                 |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0CA0`   | Letter           | CONSONANT         | _null_                     | &#x0CA0; Ttha                |\n|`U+0CA1`   | Letter           | CONSONANT         | _null_                     | &#x0CA1; Dda                 |\n|`U+0CA2`   | Letter           | CONSONANT         | _null_                     | &#x0CA2; Ddha                |\n|`U+0CA3`   | Letter           | CONSONANT         | _null_                     | &#x0CA3; Nna                 |\n|`U+0CA4`   | Letter           | CONSONANT         | _null_                     | &#x0CA4; Ta                  |\n|`U+0CA5`   | Letter           | CONSONANT         | _null_                     | &#x0CA5; Tha                 |\n|`U+0CA6`   | Letter           | CONSONANT         | _null_                     | &#x0CA6; Da                  |\n|`U+0CA7`   | Letter           | CONSONANT         | _null_                     | &#x0CA7; Dha                 |\n|`U+0CA8`   | Letter           | CONSONANT         | _null_                     | &#x0CA8; Na                  |\n|`U+0CA9`   | _unassigned_     |                   |                            |                              |\n|`U+0CAA`   | Letter           | CONSONANT         | _null_                     | &#x0CAA; Pa                  |\n|`U+0CAB`   | Letter           | CONSONANT         | _null_                     | &#x0CAB; Pha                 |\n|`U+0CAC`   | Letter           | CONSONANT         | _null_                     | &#x0CAC; Ba                  |\n|`U+0CAD`   | Letter           | CONSONANT         | _null_                     | &#x0CAD; Bha                 |\n|`U+0CAE`   | Letter           | CONSONANT         | _null_                     | &#x0CAE; Ma                  |\n|`U+0CAF`   | Letter           | CONSONANT         | _null_                     | &#x0CAF; Ya                  |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0CB0`   | Letter           | CONSONANT         | _null_                     | &#x0CB0; Ra                  |\n|`U+0CB1`   | Letter           | CONSONANT         | _null_                     | &#x0CB1; Rra                 |\n|`U+0CB2`   | Letter           | CONSONANT         | _null_                     | &#x0CB2; La                  |\n|`U+0CB3`   | Letter           | CONSONANT         | _null_                     | &#x0CB3; Lla                 |\n|`U+0CB4`   | _unassigned_     |                   |                            |                              |\n|`U+0CB5`   | Letter           | CONSONANT         | _null_                     | &#x0CB5; Va                  |\n|`U+0CB6`   | Letter           | CONSONANT         | _null_                     | &#x0CB6; Sha                 |\n|`U+0CB7`   | Letter           | CONSONANT         | _null_                     | &#x0CB7; Ssa                 |\n|`U+0CB8`   | Letter           | CONSONANT         | _null_                     | &#x0CB8; Sa                  |\n|`U+0CB9`   | Letter           | CONSONANT         | _null_                     | &#x0CB9; Ha                  |\n|`U+0CBA`   | _unassigned_     |                   |                            |                              |\n|`U+0CBB`   | _unassigned_     |                   |                            |                              |\n|`U+0CBC`   | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x0CBC; Nukta               |\n|`U+0CBD`   | Letter           | AVAGRAHA          | _null_                     | &#x0CBD; Avagraha            |\n|`U+0CBE`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0CBE; Sign Aa             |\n|`U+0CBF`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0CBF; Sign I              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0CC0`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_RIGHT_POSITION     | 
&#x0CC0; Sign Ii             |\n|`U+0CC1`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0CC1; Sign U              |\n|`U+0CC2`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0CC2; Sign Uu             |\n|`U+0CC3`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0CC3; Sign Vocalic R      |\n|`U+0CC4`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0CC4; Sign Vocalic Rr     |\n|`U+0CC5`   | _unassigned_     |                   |                            |                              |\n|`U+0CC6`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0CC6; Sign E              |\n|`U+0CC7`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_RIGHT_POSITION     | &#x0CC7; Sign Ee             |\n|`U+0CC8`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_RIGHT_POSITION     | &#x0CC8; Sign Ai             |\n|`U+0CC9`   | _unassigned_     |                   |                            |                              |\n|`U+0CCA`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_RIGHT_POSITION     | &#x0CCA; Sign O              |\n|`U+0CCB`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_RIGHT_POSITION     | &#x0CCB; Sign Oo             |\n|`U+0CCC`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0CCC; Sign Au             |\n|`U+0CCD`   | Mark [Mn]        | VIRAMA            | TOP_POSITION               | &#x0CCD; Virama              |\n|`U+0CCE`   | _unassigned_     |                   |                            |                              |\n|`U+0CCF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0CD0`   | _unassigned_     |                   |                            |                              |\n|`U+0CD1`   | _unassigned_     |                   |                            |                
              |\n|`U+0CD2`   | _unassigned_     |                   |                            |                              |\n|`U+0CD3`   | _unassigned_     |                   |                            |                              |\n|`U+0CD4`   | _unassigned_     |                   |                            |                              |\n|`U+0CD5`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0CD5; Length Mark         |\n|`U+0CD6`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0CD6; Ai Length Mark      |\n|`U+0CD7`   | _unassigned_     |                   |                            |                              |\n|`U+0CD8`   | _unassigned_     |                   |                            |                              |\n|`U+0CD9`   | _unassigned_     |                   |                            |                              |\n|`U+0CDA`   | _unassigned_     |                   |                            |                              |\n|`U+0CDB`   | _unassigned_     |                   |                            |                              |\n|`U+0CDC`   | _unassigned_     |                   |                            |                              |\n|`U+0CDD`   | _unassigned_     |                   |                            |                              |\n|`U+0CDE`   | Letter           | CONSONANT         | _null_                     | &#x0CDE; Fa                  |\n|`U+0CDF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0CE0`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0CE0; Vocalic Rr          |\n|`U+0CE1`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0CE1; Vocalic Ll          |\n|`U+0CE2`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0CE2; Sign Vocalic L      
|\n|`U+0CE3`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0CE3; Sign Vocalic Ll     |\n|`U+0CE4`   | _unassigned_     |                   |                            |                              |\n|`U+0CE5`   | _unassigned_     |                   |                            |                              |\n|`U+0CE6`   | Number           | NUMBER            | _null_                     | &#x0CE6; Digit Zero          |\n|`U+0CE7`   | Number           | NUMBER            | _null_                     | &#x0CE7; Digit One           |\n|`U+0CE8`   | Number           | NUMBER            | _null_                     | &#x0CE8; Digit Two           |\n|`U+0CE9`   | Number           | NUMBER            | _null_                     | &#x0CE9; Digit Three         |\n|`U+0CEA`   | Number           | NUMBER            | _null_                     | &#x0CEA; Digit Four          |\n|`U+0CEB`   | Number           | NUMBER            | _null_                     | &#x0CEB; Digit Five          |\n|`U+0CEC`   | Number           | NUMBER            | _null_                     | &#x0CEC; Digit Six           |\n|`U+0CED`   | Number           | NUMBER            | _null_                     | &#x0CED; Digit Seven         |\n|`U+0CEE`   | Number           | NUMBER            | _null_                     | &#x0CEE; Digit Eight         |\n|`U+0CEF`   | Number           | NUMBER            | _null_                     | &#x0CEF; Digit Nine          |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0CF0`   | _unassigned_     |                   |                            |                              |\n|`U+0CF1`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x0CF1; Jihvamuliya         |\n|`U+0CF2`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x0CF2; Upadhmaniya         |\n|`U+0CF3`   | Mark [Mc]        | BINDU             | RIGHT_POSITION             | &#x0CF3; Combining Anusvara Above 
Right|\n|`U+0CF4`   | _unassigned_     |                   |                            |                              |\n|`U+0CF5`   | _unassigned_     |                   |                            |                              |\n|`U+0CF6`   | _unassigned_     |                   |                            |                              |\n|`U+0CF7`   | _unassigned_     |                   |                            |                              |\n|`U+0CF8`   | _unassigned_     |                   |                            |                              |\n|`U+0CF9`   | _unassigned_     |                   |                            |                              |\n|`U+0CFA`   | _unassigned_     |                   |                            |                              |\n|`U+0CFB`   | _unassigned_     |                   |                            |                              |\n|`U+0CFC`   | _unassigned_     |                   |                            |                              |\n|`U+0CFD`   | _unassigned_     |                   |                            |                              |\n|`U+0CFE`   | _unassigned_     |                   |                            |                              |\n|`U+0CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Kannada script may also include\ncharacters from the Vedic Extensions block. 
These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta 
|\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail 
|\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF3`   | Mark [Mc]        | VISARGA           | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |  
                            |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Kannada text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block. Kannada text can also incorporate the udatta\n(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Kannada text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n"
  },
  {
    "path": "character-tables/character-tables-khmer.md",
    "content": "# Khmer character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Khmer text](../opentype-shaping-khmer.md).\n\n**Contents**\n\n  - [Khmer character table](#khmer-character-table)\n  - [Khmer Symbols character table](#khmer-symbols-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Khmer character table ##\n\nKhmer glyphs should be classified as in the following\ntable. Codepoints in the Khmer block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Khmer character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1780`   | Letter           | CONSONANT         | _null_                     | &#x1780; Ka                  |\n|`U+1781`   | Letter           | CONSONANT         | _null_                     | &#x1781; Kha                 |\n|`U+1782`   | Letter           | CONSONANT         | _null_                     | &#x1782; Ko                  |\n|`U+1783`   | Letter           | CONSONANT         | _null_                     | &#x1783; Kho                 |\n|`U+1784`   | Letter           | CONSONANT         | _null_                     | &#x1784; Ngo                 |\n|`U+1785`   | Letter           | CONSONANT         | _null_                     | &#x1785; Ca                  |\n|`U+1786`   | Letter           | CONSONANT         | _null_                     | &#x1786; Cha                 |\n|`U+1787`   | Letter           | CONSONANT         | _null_                     | &#x1787; Co                  |\n|`U+1788`   | Letter           | CONSONANT         | _null_                     | &#x1788; Cho                 |\n|`U+1789`   | Letter           | CONSONANT         | _null_                     | &#x1789; Nyo                 |\n|`U+178A`   | Letter           | CONSONANT         | _null_                     | &#x178A; Da                  |\n|`U+178B`   | Letter           | CONSONANT         | _null_                     | &#x178B; Ttha                |\n|`U+178C`   | Letter           | CONSONANT         | _null_                     | &#x178C; Do                  |\n|`U+178D`   | Letter           | CONSONANT         | _null_                     | &#x178D; Ttho                |\n|`U+178E`   | 
Letter           | CONSONANT         | _null_                     | &#x178E; Nno                 |\n|`U+178F`   | Letter           | CONSONANT         | _null_                     | &#x178F; Ta                  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+1790`   | Letter           | CONSONANT         | _null_                     | &#x1790; Tha                 |\n|`U+1791`   | Letter           | CONSONANT         | _null_                     | &#x1791; To                  |\n|`U+1792`   | Letter           | CONSONANT         | _null_                     | &#x1792; Tho                 |\n|`U+1793`   | Letter           | CONSONANT         | _null_                     | &#x1793; No                  |\n|`U+1794`   | Letter           | CONSONANT         | _null_                     | &#x1794; Ba                  |\n|`U+1795`   | Letter           | CONSONANT         | _null_                     | &#x1795; Pha                 |\n|`U+1796`   | Letter           | CONSONANT         | _null_                     | &#x1796; Po                  |\n|`U+1797`   | Letter           | CONSONANT         | _null_                     | &#x1797; Pho                 |\n|`U+1798`   | Letter           | CONSONANT         | _null_                     | &#x1798; Mo                  |\n|`U+1799`   | Letter           | CONSONANT         | _null_                     | &#x1799; Yo                  |\n|`U+179A`   | Letter           | CONSONANT         | _null_                     | &#x179A; Ro                  |\n|`U+179B`   | Letter           | CONSONANT         | _null_                     | &#x179B; Lo                  |\n|`U+179C`   | Letter           | CONSONANT         | _null_                     | &#x179C; Vo                  |\n|`U+179D`   | Letter           | CONSONANT         | _null_                     | &#x179D; Sha                 |\n|`U+179E`   | Letter           | CONSONANT         | _null_                     | &#x179E; Sso                 |\n|`U+179F`   | Letter        
   | CONSONANT         | _null_                     | &#x179F; Sa                  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+17A0`   | Letter           | CONSONANT         | _null_                     | &#x17A0; Ha                  |\n|`U+17A1`   | Letter           | CONSONANT         | _null_                     | &#x17A1; La                  |\n|`U+17A2`   | Letter           | CONSONANT         | _null_                     | &#x17A2; Qa                  |\n|`U+17A3`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17A3; Qaq                 |\n|`U+17A4`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17A4; Qaa                 |\n|`U+17A5`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17A5; Qi                  |\n|`U+17A6`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17A6; Qii                 |\n|`U+17A7`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17A7; Qu                  |\n|`U+17A8`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17A8; Quk                 |\n|`U+17A9`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17A9; Quu                 |\n|`U+17AA`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17AA; Quuv                |\n|`U+17AB`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17AB; Ry                  |\n|`U+17AC`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17AC; Ryy                 |\n|`U+17AD`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17AD; Ly                  |\n|`U+17AE`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17AE; Lyy                 |\n|`U+17AF`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17AF; Qe                  |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t    \n|`U+17B0`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17B0; Qai                 |\n|`U+17B1`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17B1; Qoo Type One        |\n|`U+17B2`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17B2; Qoo Type Two        |\n|`U+17B3`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x17B3; Qau                 |\n|`U+17B4`   | Mark [Mn]        | _null_            | _null_                     | &#x17B4; Inherent Aq         |\n|`U+17B5`   | Mark [Mn]        | _null_            | _null_                     | &#X17B5; Inherent Aa         |\n|`U+17B6`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x17B6; Sign Aa             |\n|`U+17B7`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x17B7; Sign I              |\n|`U+17B8`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x17B8; Sign Ii             |\n|`U+17B9`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x17B9; Sign Y              |\n|`U+17BA`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x17BA; Sign Yy             |\n|`U+17BB`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x17BB; Sign U              |\n|`U+17BC`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x17BC; Sign Uu             |\n|`U+17BD`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x17BD; Sign Ua             |\n|`U+17BE`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_LEFT_POSITION      | &#x17BE; Sign Oe             |\n|`U+17BF`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_LEFT_AND_RIGHT_POSITION| &#x17BF; Sign Ya             |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+17C0`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | 
&#x17C0; Sign Ie             |\n|`U+17C1`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x17C1; Sign E              |\n|`U+17C2`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x17C2; Sign Ae             |\n|`U+17C3`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x17C3; Sign Ai             |\n|`U+17C4`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x17C4; Sign Oo             |\n|`U+17C5`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x17C5; Sign Au             |\n|`U+17C6`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | &#x17C6; Nikahit             |\n|`U+17C7`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x17C7; Reahmuk             |\n|`U+17C8`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x17C8; Yuukaleapintu       |\n|`U+17C9`   | Mark [Mn]        | REGISTER_SHIFTER  | TOP_POSITION               | &#x17C9; Muusikatoan         |\n|`U+17CA`   | Mark [Mn]        | REGISTER_SHIFTER  | TOP_POSITION               | &#x17CA; Triisap             |\n|`U+17CB`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | &#x17CB; Bantoc              |\n|`U+17CC`   | Mark [Mn]        | CONSONANT_POST_REPHA| TOP_POSITION             | &#x17CC; Robat               |\n|`U+17CD`   | Mark [Mn]        | CONSONANT_KILLER  | TOP_POSITION               | &#x17CD; Toandakhiat         |\n|`U+17CE`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | &#x17CE; Kakabat             |\n|`U+17CF`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | &#x17CF; Ahsda               |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \n|`U+17D0`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | &#x17D0; Samyok Sannya       |\n|`U+17D1`   | Mark [Mn]        | PURE_KILLER       | TOP_POSITION               | &#x17D1; 
Viriam              |\n|`U+17D2`   | Mark [Mn]        | INVISIBLE_STACKER | _null_                     | &#x17D2; Sign Coeng          |\n|`U+17D3`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | &#x17D3; Bathamasat          |\n|`U+17D4`   | Punctuation      | _null_            | _null_                     | &#x17D4; Khan                |\n|`U+17D5`   | Punctuation      | _null_            | _null_                     | &#x17D5; Bariyoosan          |\n|`U+17D6`   | Punctuation      | _null_            | _null_                     | &#x17D6; Camnuc Pii Kuuh     |\n|`U+17D7`   | Letter           | _null_            | _null_                     | &#x17D7; Lek Too             |\n|`U+17D8`   | Punctuation      | _null_            | _null_                     | &#x17D8; Beyyal              |\n|`U+17D9`   | Punctuation      | _null_            | _null_                     | &#x17D9; Phnaek Muan         |\n|`U+17DA`   | Punctuation      | _null_            | _null_                     | &#x17DA; Koomuut             |\n|`U+17DB`   | Symbol           | SYMBOL            | _null_                     | &#x17DB; Riel                |\n|`U+17DC`   | Letter           | AVAGRAHA          | _null_                     | &#x17DC; Avakrahasanya       |\n|`U+17DD`   | Mark [Mn]        | SYLLABLE_MODIFIER | TOP_POSITION               | &#x17DD; Atthacan            |\n|`U+17DE`   | _unassigned_     |                   |                            |                              |\n|`U+17DF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t   \t  \n|`U+17E0`   | Number           | NUMBER            | _null_                     | &#x17E0; Digit Zero          |\n|`U+17E1`   | Number           | NUMBER            | _null_                     | &#x17E1; Digit One           |\n|`U+17E2`   | Number           | NUMBER            | _null_                     | &#x17E2; Digit Two 
          |\n|`U+17E3`   | Number           | NUMBER            | _null_                     | &#x17E3; Digit Three         |\n|`U+17E4`   | Number           | NUMBER            | _null_                     | &#x17E4; Digit Four          |\n|`U+17E5`   | Number           | NUMBER            | _null_                     | &#x17E5; Digit Five          |\n|`U+17E6`   | Number           | NUMBER            | _null_                     | &#x17E6; Digit Six           |\n|`U+17E7`   | Number           | NUMBER            | _null_                     | &#x17E7; Digit Seven         |\n|`U+17E8`   | Number           | NUMBER            | _null_                     | &#x17E8; Digit Eight         |\n|`U+17E9`   | Number           | NUMBER            | _null_                     | &#x17E9; Digit Nine          |\n|`U+17EA`   | _unassigned_     |                   |                            |                              |\n|`U+17EB`   | _unassigned_     |                   |                            |                              |\n|`U+17EC`   | _unassigned_     |                   |                            |                              |\n|`U+17ED`   | _unassigned_     |                   |                            |                              |\n|`U+17EE`   | _unassigned_     |                   |                            |                              |\n|`U+17EF`   | _unassigned_     |                   |                            |                              |\n| | | | |\n|`U+17F0`   | Number           | _null_            | _null_                     | &#x17F0; Lek Attak Son       |\n|`U+17F1`   | Number           | _null_            | _null_                     | &#x17F1; Lek Attak Muoy      |\n|`U+17F2`   | Number           | _null_            | _null_                     | &#x17F2; Lek Attak Pii       |\n|`U+17F3`   | Number           | _null_            | _null_                     | &#x17F3; Lek Attak Bei       |\n|`U+17F4`   | Number           | 
_null_            | _null_                     | &#x17F4; Lek Attak Buon      |\n|`U+17F5`   | Number           | _null_            | _null_                     | &#x17F5; Lek Attak Pram      |\n|`U+17F6`   | Number           | _null_            | _null_                     | &#x17F6; Lek Attak Pram-Muoy |\n|`U+17F7`   | Number           | _null_            | _null_                     | &#x17F7; Lek Attak Pram-Pii  |\n|`U+17F8`   | Number           | _null_            | _null_                     | &#x17F8; Lek Attak Pram-Bei  |\n|`U+17F9`   | Number           | _null_            | _null_                     | &#x17F9; Lek Attak Pram-Buon |\n|`U+17FA`   | _unassigned_     |                   |                            |                              |\n|`U+17FB`   | _unassigned_     |                   |                            |                              |\n|`U+17FC`   | _unassigned_     |                   |                            |                              |\n|`U+17FD`   | _unassigned_     |                   |                            |                              |\n|`U+17FE`   | _unassigned_     |                   |                            |                              |\n|`U+17FF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Khmer Symbols character table ##\n\nThe Khmer Symbols block contains miscellaneous symbols used for\nlunar-date calendars. 
None evoke any special behavior from the shaping engine.\n\n\n:::{table} Khmer Symbols character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+19E0`   | Symbol           | _null_            | _null_                     | &#x19E0; Pathamasat          |\n|`U+19E1`   | Symbol           | _null_            | _null_                     | &#x19E1; Muoy Koet           |\n|`U+19E2`   | Symbol           | _null_            | _null_                     | &#x19E2; Pii Koet            |\n|`U+19E3`   | Symbol           | _null_            | _null_                     | &#x19E3; Bei Koet            |\n|`U+19E4`   | Symbol           | _null_            | _null_                     | &#x19E4; Buon Koet           |\n|`U+19E5`   | Symbol           | _null_            | _null_                     | &#x19E5; Pram Koet           |\n|`U+19E6`   | Symbol           | _null_            | _null_                     | &#x19E6; Pram-Muoy Koet      |\n|`U+19E7`   | Symbol           | _null_            | _null_                     | &#x19E7; Pram-Pii Koet       |\n|`U+19E8`   | Symbol           | _null_            | _null_                     | &#x19E8; Pram-Bei Koet       |\n|`U+19E9`   | Symbol           | _null_            | _null_                     | &#x19E9; Pram-Buon Koet      |\n|`U+19EA`   | Symbol           | _null_            | _null_                     | &#x19EA; Dap Koet            |\n|`U+19EB`   | Symbol           | _null_            | _null_                     | &#x19EB; Dap-Muoy Koet       |\n|`U+19EC`   | Symbol           | _null_            | _null_                     | &#x19EC; Dap-Pii Koet        |\n|`U+19ED`   | Symbol           | _null_            | _null_                     | &#x19ED; Dap-Bei Koet        |\n|`U+19EE`   | Symbol           | _null_            | _null_           
          | &#x19EE; Dap-Buon Koet       |\n|`U+19EF`   | Symbol           | _null_            | _null_                     | &#x19EF; Dap-Pram Koet       |\n| | | | |\n|`U+19F0`   | Symbol           | _null_            | _null_                     | &#x19F0; Tuteyasat           |\n|`U+19F1`   | Symbol           | _null_            | _null_                     | &#x19F1; Muoy Roc            |\n|`U+19F2`   | Symbol           | _null_            | _null_                     | &#x19F2; Pii Roc             |\n|`U+19F3`   | Symbol           | _null_            | _null_                     | &#x19F3; Bei Roc             |\n|`U+19F4`   | Symbol           | _null_            | _null_                     | &#x19F4; Buon Roc            |\n|`U+19F5`   | Symbol           | _null_            | _null_                     | &#x19F5; Pram Roc            |\n|`U+19F6`   | Symbol           | _null_            | _null_                     | &#x19F6; Pram-Muoy Roc       |\n|`U+19F7`   | Symbol           | _null_            | _null_                     | &#x19F7; Pram-Pii Roc        |\n|`U+19F8`   | Symbol           | _null_            | _null_                     | &#x19F8; Pram-Bei Roc        |\n|`U+19F9`   | Symbol           | _null_            | _null_                     | &#x19F9; Pram-Buon Roc       |\n|`U+19FA`   | Symbol           | _null_            | _null_                     | &#x19FA; Dap Roc             |\n|`U+19FB`   | Symbol           | _null_            | _null_                     | &#x19FB; Dap-Muoy Roc        |\n|`U+19FC`   | Symbol           | _null_            | _null_                     | &#x19FC; Dap-Pii Roc         |\n|`U+19FD`   | Symbol           | _null_            | _null_                     | &#x19FD; Dap-Bei Roc         |\n|`U+19FE`   | Symbol           | _null_            | _null_                     | &#x19FE; Dap-Buon Roc        |\n|`U+19FF`   | Symbol           | _null_            | _null_                     | &#x19FF; Dap-Pram Roc        
|\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Khmer text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to 
prevent the formation of a\nconjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The sequence\n\"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of a\nconjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The sequence\n\"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the first\nconsonant in its standard form, followed by an explicit \"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display\nthose codepoints that are defined as non-spacing (marks, dependent\nvowels (matras), below-base consonant forms, and post-base consonant\nforms) in an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or\n\"NBSP,_matra_\".\n\nIn addition to general punctuation, runs of Khmer text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block.\n"
  },
  {
    "path": "character-tables/character-tables-lao.md",
    "content": "# Lao character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Lao text](../opentype-shaping-thai-lao.md#the-thailao-shaping-model).\n\n**Contents**\n\n  - [Lao character table](#lao-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\n\n## Lao character table ##\n\nLao glyphs should be classified as in the following\ntable. Codepoints in the Lao block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column.\n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Lao character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass | Combining class | PUA    | Glyph                         |\n|:----------|:-----------------|:------------------|:------------------------|:----------------|:-------|:------------------------------|\n|`U+0E80`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E81`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E81; Ko                   |\n|`U+0E82`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E82; Kho Sung             |\n|`U+0E83`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E84`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E84; Kho Tam              |\n|`U+0E85`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E86`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E86; Pali Gha             |\n|`U+0E87`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E87; Ngo                  |\n|`U+0E88`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E88; Co                   |\n|`U+0E89`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E89; Pali Cha             |\n|`U+0E8A`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E8A; So Tam               |\n|`U+0E8B`   | _unassigned_     |     
              |                         |                 |        |                               |\n|`U+0E8C`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E8C; Pali Jha             |\n|`U+0E8D`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E8D; Nyo                  |\n|`U+0E8E`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E8E; Pali Nya             |\n|`U+0E8F`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E8F; Pali Tta             |\n| | | | | | | |\n|`U+0E90`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E90; Pali Ttha            |\n|`U+0E91`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E91; Pali Dda             |\n|`U+0E92`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E92; Pali Ddha            |\n|`U+0E93`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E93; Pali Nna             |\n|`U+0E94`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E94; Do                   |\n|`U+0E95`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E95; To                   |\n|`U+0E96`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E96; Tho Sung             |\n|`U+0E97`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E97; Tho Tam              |\n|`U+0E98`   | Letter           | CONSONANT         |  _null_                 | _0_             | _null_ | &#x0E98; Pali Dha             |\n|`U+0E99`   | Letter           | CONSONANT         | _null_               
   | _0_             | _null_ | &#x0E99; No                   |\n|`U+0E9A`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E9A; Bo                   |\n|`U+0E9B`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E9B; Po                   |\n|`U+0E9C`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E9C; Pho Sung             |\n|`U+0E9D`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E9D; Fo Tam               |\n|`U+0E9E`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E9E; Pho Tam              |\n|`U+0E9F`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E9F; Fo Sung              |\n| | | | | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t      \n|`U+0EA0`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EA0; Pali Bha             |\n|`U+0EA1`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EA1; Mo                   |\n|`U+0EA2`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EA2; Yo                   |\n|`U+0EA3`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EA3; Lo Ling              |\n|`U+0EA4`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EA5`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EA5; Lo Loot              |\n|`U+0EA6`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EA7`   | Letter           | CONSONANT         | _null_              
    | _0_             | _null_ | &#x0EA7; Wo                   |\n|`U+0EA8`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EA8; Sanskrit Sha         |\n|`U+0EA9`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EA9; Sanskrit Ssa         |\n|`U+0EAA`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EAA; So Sung              |\n|`U+0EAB`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EAB; Ho Sung              |\n|`U+0EAC`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EAC; Pali Lla             |\n|`U+0EAD`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EAD; O                    |\n|`U+0EAE`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EAE; Ho Tam               |\n|`U+0EAF`   | Letter           | _null_            | _null_                  | _0_             | _null_ | &#x0EAF; Ellipsis             |\n| | | | | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t      \n|`U+0EB0`   | Letter           | VOWEL_DEPENDENT   | RIGHT_POSITION          | _0_             | _null_ | &#x0EB0; Sign A               |\n|`U+0EB1`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | _null_ | &#x0EB1; Sign Mai Kan         |\n|`U+0EB2`   | Letter           | VOWEL_DEPENDENT   | RIGHT_POSITION          | _0_             | _null_ | &#x0EB2; Sign Aa              |\n|`U+0EB3`   | Letter           | VOWEL_DEPENDENT   | RIGHT_POSITION          | _0_             | _null_ | &#x0EB3; Sign Am              |\n|`U+0EB4`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | _null_ | &#x0EB4; Sign I               |\n|`U+0EB5`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION       
     | _0_             | _null_ | &#x0EB5; Sign Ii              |\n|`U+0EB6`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | _null_ | &#x0EB6; Sign Y               |\n|`U+0EB7`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | _null_ | &#x0EB7; Sign Yy              |\n|`U+0EB8`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION         | 118             | _null_ | &#x0EB8; Sign U               |\n|`U+0EB9`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION         | 118             | _null_ | &#x0EB9; Sign Uu              |\n|`U+0EBA`   | Mark [Mn]        | VIRAMA            | BOTTOM_POSITION         | 9               | _null_ | &#x0EBA; Pali Virama          |\n|`U+0EBB`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | _null_ | &#x0EBB; Sign Mai Kon         |\n|`U+0EBC`   | Mark [Mn]        | CONSONANT_MEDIAL  | BOTTOM_POSITION         | _0_             | _null_ | &#x0EBC; Semivowel Sign Lo    |\n|`U+0EBD`   | Letter           | CONSONANT_MEDIAL  | _null_                  | _0_             | _null_ | &#x0EBD; Semivowel Sign Nyo   |\n|`U+0EBE`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EBF`   | _unassigned_     |                   |                         |                 |        |                               |\n| | | | | | | |\n|`U+0EC0`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | _null_ | &#x0EC0; Sign E               |\n|`U+0EC1`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | _null_ | &#x0EC1; Sign Ei              |\n|`U+0EC2`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | _null_ | &#x0EC2; Sign O               |\n|`U+0EC3`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | _null_ | 
&#x0EC3; Sign Ay              |\n|`U+0EC4`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | _null_ | &#x0EC4; Sign Ai              |\n|`U+0EC5`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EC6`   | Letter Modifier  | _null_            | _null_                  | _0_             | _null_ | &#x0EC6; Ko La                |\n|`U+0EC7`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EC8`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | 122             | _null_ | &#x0EC8; Tone Mai Ek          |\n|`U+0EC9`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | 122             | _null_ | &#x0EC9; Tone Mai Tho         |\n|`U+0ECA`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | 122             | _null_ | &#x0ECA; Tone Mai Ti          |\n|`U+0ECB`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | 122             | _null_ | &#x0ECB; Tone Mai Catawa      |\n|`U+0ECC`   | Mark [Mn]        | _null_            | TOP_POSITION            | _0_             | _null_ | &#x0ECC; Cancellation mark    |\n|`U+0ECD`   | Mark [Mn]        | BINDU             | TOP_POSITION            | _0_             | _null_ | &#x0ECD; Niggahita            |\n|`U+0ECE`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | _0_             | _null_ | &#x0ECE; Yamakkan             |\n|`U+0ECF`   | _unassigned_     |                   |                         |                 |        |                               |\n| | | | | | | |        \t\t\t\t\t\t\t\t\t\t\t\t\t\t                    \n|`U+0ED0`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED0; Digit Zero           |\n|`U+0ED1`   | Number           | NUMBER            | _null_                  | _0_     
        | _null_ | &#x0ED1; Digit One            |\n|`U+0ED2`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED2; Digit Two            |\n|`U+0ED3`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED3; Digit Three          |\n|`U+0ED4`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED4; Digit Four           |\n|`U+0ED5`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED5; Digit Five           |\n|`U+0ED6`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED6; Digit Six            |\n|`U+0ED7`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED7; Digit Seven          |\n|`U+0ED8`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED8; Digit Eight          |\n|`U+0ED9`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0ED9; Digit Nine           |\n|`U+0EDA`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EDB`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EDC`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EDC; Ho No                |\n|`U+0EDD`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EDD; Ho Mo                |\n|`U+0EDE`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EDE; Khmu Go              |\n|`U+0EDF`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0EDF; Khmu Nyo             |\n| 
| | | | | | |\n|`U+0EE0`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE1`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE2`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE3`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE4`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE5`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE6`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE7`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE8`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EE9`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EEA`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EEB`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EEC`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EED`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EEE`   | _unassigned_     |       
            |                         |                 |        |                               |\n|`U+0EEF`   | _unassigned_     |                   |                         |                 |        |                               |\n| | | | | | | |\n|`U+0EF0`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF1`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF2`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF3`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF4`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF5`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF6`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF7`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF8`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EF9`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EFA`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EFB`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EFC`   | _unassigned_     |                   |                        
 |                 |        |                               |\n|`U+0EFD`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EFE`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0EFF`   | _unassigned_     |                   |                         |                 |        |                               |\n:::\n\n\n## Miscellaneous character table ##\n\nRuns of Lao text typically do not insert spaces between words.\nConsequently, in addition to general punctuation, the Zero-Width Space\n(`U+200B`) character is often used to insert invisible break points\nthat may be converted to line breaks.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+200B`   | Separator        | PLACEHOLDER       | _null_                     | &#x200B; Zero-width space      |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Lao text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n"
  },
  {
    "path": "character-tables/character-tables-malayalam.md",
    "content": "# Malayalam character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Malayalam text](../opentype-shaping-malayalam.md).\n\n**Contents**\n\n  - [Malayalam character table](#malayalam-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Malayalam character table ##\n\nMalayalam glyphs should be classified as in the following\ntable. Codepoints in the Malayalam block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Malayalam character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0D00`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0D00; Combining Anusvara Above |\n|`U+0D01`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0D01; Candrabindu         |\n|`U+0D02`   | Mark [Mc]        | BINDU             | RIGHT_POSITION             | &#x0D02; Anusvara            |\n|`U+0D03`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0D03; Visarga             |\n|`U+0D04`   | Letter           | BINDU             | _null_                     | &#x0D04; Vedic Anusvara      |\n|`U+0D05`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D05; A                   |\n|`U+0D06`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D06; Aa                  |\n|`U+0D07`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D07; I                   |\n|`U+0D08`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D08; Ii                  |\n|`U+0D09`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D09; U                   |\n|`U+0D0A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D0A; Uu                  |\n|`U+0D0B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D0B; Vocalic R           |\n|`U+0D0C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D0C; Vocalic L           |\n|`U+0D0D`   | _unassigned_     |                   |                            |                              
|\n|`U+0D0E`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D0E; E                   |\n|`U+0D0F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D0F; Ee                  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0D10`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D10; Ai                  |\n|`U+0D11`   | _unassigned_     |                   |                            |                              |\n|`U+0D12`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D12; O                   |\n|`U+0D13`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D13; Oo                  |\n|`U+0D14`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D14; Au                  |\n|`U+0D15`   | Letter           | CONSONANT         | _null_                     | &#x0D15; Ka                  |\n|`U+0D16`   | Letter           | CONSONANT         | _null_                     | &#x0D16; Kha                 |\n|`U+0D17`   | Letter           | CONSONANT         | _null_                     | &#x0D17; Ga                  |\n|`U+0D18`   | Letter           | CONSONANT         | _null_                     | &#x0D18; Gha                 |\n|`U+0D19`   | Letter           | CONSONANT         | _null_                     | &#x0D19; Nga                 |\n|`U+0D1A`   | Letter           | CONSONANT         | _null_                     | &#x0D1A; Ca                  |\n|`U+0D1B`   | Letter           | CONSONANT         | _null_                     | &#x0D1B; Cha                 |\n|`U+0D1C`   | Letter           | CONSONANT         | _null_                     | &#x0D1C; Ja                  |\n|`U+0D1D`   | Letter           | CONSONANT         | _null_                     | &#x0D1D; Jha                 |\n|`U+0D1E`   | Letter           | CONSONANT         | _null_                     | &#x0D1E; Nya                 |\n|`U+0D1F`   
| Letter           | CONSONANT         | _null_                     | &#x0D1F; Tta                 |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0D20`   | Letter           | CONSONANT         | _null_                     | &#x0D20; Ttha                |\n|`U+0D21`   | Letter           | CONSONANT         | _null_                     | &#x0D21; Dda                 |\n|`U+0D22`   | Letter           | CONSONANT         | _null_                     | &#x0D22; Ddha                |\n|`U+0D23`   | Letter           | CONSONANT         | _null_                     | &#x0D23; Nna                 |\n|`U+0D24`   | Letter           | CONSONANT         | _null_                     | &#x0D24; Ta                  |\n|`U+0D25`   | Letter           | CONSONANT         | _null_                     | &#x0D25; Tha                 |\n|`U+0D26`   | Letter           | CONSONANT         | _null_                     | &#x0D26; Da                  |\n|`U+0D27`   | Letter           | CONSONANT         | _null_                     | &#x0D27; Dha                 |\n|`U+0D28`   | Letter           | CONSONANT         | _null_                     | &#x0D28; Na                  |\n|`U+0D29`   | Letter           | CONSONANT         | _null_                     | &#x0D29; Nnna                |\n|`U+0D2A`   | Letter           | CONSONANT         | _null_                     | &#x0D2A; Pa                  |\n|`U+0D2B`   | Letter           | CONSONANT         | _null_                     | &#x0D2B; Pha                 |\n|`U+0D2C`   | Letter           | CONSONANT         | _null_                     | &#x0D2C; Ba                  |\n|`U+0D2D`   | Letter           | CONSONANT         | _null_                     | &#x0D2D; Bha                 |\n|`U+0D2E`   | Letter           | CONSONANT         | _null_                     | &#x0D2E; Ma                  |\n|`U+0D2F`   | Letter           | CONSONANT         | _null_                     | &#x0D2F; Ya                  |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0D30`   | Letter           | CONSONANT         | _null_                     | &#x0D30; Ra                  |\n|`U+0D31`   | Letter           | CONSONANT         | _null_                     | &#x0D31; Rra                 |\n|`U+0D32`   | Letter           | CONSONANT         | _null_                     | &#x0D32; La                  |\n|`U+0D33`   | Letter           | CONSONANT         | _null_                     | &#x0D33; Lla                 |\n|`U+0D34`   | Letter           | CONSONANT         | _null_                     | &#x0D34; Llla                |\n|`U+0D35`   | Letter           | CONSONANT         | _null_                     | &#x0D35; Va                  |\n|`U+0D36`   | Letter           | CONSONANT         | _null_                     | &#x0D36; Sha                 |\n|`U+0D37`   | Letter           | CONSONANT         | _null_                     | &#x0D37; Ssa                 |\n|`U+0D38`   | Letter           | CONSONANT         | _null_                     | &#x0D38; Sa                  |\n|`U+0D39`   | Letter           | CONSONANT         | _null_                     | &#x0D39; Ha                  |\n|`U+0D3A`   | Letter           | CONSONANT         | _null_                     | &#x0D3A; Ttta                |\n|`U+0D3B`   | Mark [Mn]        | PURE_KILLER       | TOP_POSITION               | &#x0D3B; Vertical Bar Virama |\n|`U+0D3C`   | Mark [Mn]        | PURE_KILLER       | TOP_POSITION               | &#x0D3C; Circular Virama     |\n|`U+0D3D`   | Letter           | AVAGRAHA          | _null_                     | &#x0D3D; Avagraha            |\n|`U+0D3E`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0D3E; Sign Aa             |\n|`U+0D3F`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0D3F; Sign I              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0D40`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | 
&#x0D40; Sign Ii             |\n|`U+0D41`   | Mark [Mn]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0D41; Sign U              |\n|`U+0D42`   | Mark [Mn]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0D42; Sign Uu             |\n|`U+0D43`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0D43; Sign Vocalic R      |\n|`U+0D44`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0D44; Sign Vocalic Rr     |\n|`U+0D45`   | _unassigned_     |                   |                            |                              |\n|`U+0D46`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0D46; Sign E              |\n|`U+0D47`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0D47; Sign Ee             |\n|`U+0D48`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0D48; Sign Ai             |\n|`U+0D49`   | _unassigned_     |                   |                            |                              |\n|`U+0D4A`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0D4A; Sign O              |\n|`U+0D4B`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0D4B; Sign Oo             |\n|`U+0D4C`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0D4C; Sign Au             |\n|`U+0D4D`   | Mark [Mn]        | VIRAMA            | TOP_POSITION               | &#x0D4D; Virama              |\n|`U+0D4E`   | Letter           | CONSONANT_PRE_REPHA| _null_                    | &#x0D4E; Dot Reph            |\n|`U+0D4F`   | Symbol           | SYMBOL            | _null_                     | &#x0D4F; Para                |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0D50`   | _unassigned_     |                   |                            |                              |\n|`U+0D51`   | _unassigned_     |                   |                            |                
              |\n|`U+0D52`   | _unassigned_     |                   |                            |                              |\n|`U+0D53`   | _unassigned_     |                   |                            |                              |\n|`U+0D54`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D54; Chillu M            |\n|`U+0D55`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D55; Chillu Y            |\n|`U+0D56`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D56; Chillu Lll          |\n|`U+0D57`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0D57; Au Length Mark      |\n|`U+0D58`   | Number           | NUMBER            | _null_                     | &#x0D58; Fraction 1/160      |\n|`U+0D59`   | Number           | NUMBER            | _null_                     | &#x0D59; Fraction 1/40       |\n|`U+0D5A`   | Number           | NUMBER            | _null_                     | &#x0D5A; Fraction 3/80       |\n|`U+0D5B`   | Number           | NUMBER            | _null_                     | &#x0D5B; Fraction 1/20       |\n|`U+0D5C`   | Number           | NUMBER            | _null_                     | &#x0D5C; Fraction 1/10       |\n|`U+0D5D`   | Number           | NUMBER            | _null_                     | &#x0D5D; Fraction 3/20       |\n|`U+0D5E`   | Number           | NUMBER            | _null_                     | &#x0D5E; Fraction 1/5        |\n|`U+0D5F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D5F; Archaic Ii          |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0D60`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D60; Vocalic Rr          |\n|`U+0D61`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D61; Vocalic Ll          |\n|`U+0D62`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0D62; Sign Vocalic L      
|\n|`U+0D63`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0D63; Sign Vocalic Ll     |\n|`U+0D64`   | _unassigned_     |                   |                            |                              |\n|`U+0D65`   | _unassigned_     |                   |                            |                              |\n|`U+0D66`   | Number           | NUMBER            | _null_                     | &#x0D66; Digit Zero          |\n|`U+0D67`   | Number           | NUMBER            | _null_                     | &#x0D67; Digit One           |\n|`U+0D68`   | Number           | NUMBER            | _null_                     | &#x0D68; Digit Two           |\n|`U+0D69`   | Number           | NUMBER            | _null_                     | &#x0D69; Digit Three         |\n|`U+0D6A`   | Number           | NUMBER            | _null_                     | &#x0D6A; Digit Four          |\n|`U+0D6B`   | Number           | NUMBER            | _null_                     | &#x0D6B; Digit Five          |\n|`U+0D6C`   | Number           | NUMBER            | _null_                     | &#x0D6C; Digit Six           |\n|`U+0D6D`   | Number           | NUMBER            | _null_                     | &#x0D6D; Digit Seven         |\n|`U+0D6E`   | Number           | NUMBER            | _null_                     | &#x0D6E; Digit Eight         |\n|`U+0D6F`   | Number           | NUMBER            | _null_                     | &#x0D6F; Digit Nine          |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0D70`   | Number           | NUMBER            |                            | &#x0D70; Number Ten          |\n|`U+0D71`   | Number           | NUMBER            |                            | &#x0D71; Number One Hundred  |\n|`U+0D72`   | Number           | NUMBER            |                            | &#x0D72; Number One Thousand |\n|`U+0D73`   | Number           | NUMBER            |                            | &#x0D73; Fraction 1/4        |\n|`U+0D74`   
| Number           | NUMBER            |                            | &#x0D74; Fraction 1/2        |\n|`U+0D75`   | Number           | NUMBER            |                            | &#x0D75; Fraction 3/4        |\n|`U+0D76`   | Number           | NUMBER            |                            | &#x0D76; Fraction 1/16       |\n|`U+0D77`   | Number           | NUMBER            |                            | &#x0D77; Fraction 1/8        |\n|`U+0D78`   | Number           | NUMBER            | _null_                     | &#x0D78; Fraction 3/16       |\n|`U+0D79`   | Symbol           | SYMBOL            | _null_                     | &#x0D79; Date Mark           |\n|`U+0D7A`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D7A; Chillu Nn           |\n|`U+0D7B`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D7B; Chillu N            |\n|`U+0D7C`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D7C; Chillu Rr           |\n|`U+0D7D`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D7D; Chillu L            |\n|`U+0D7E`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D7E; Chillu Ll           |\n|`U+0D7F`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0D7F; Chillu K            |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Malayalam script may also include\ncharacters from the Vedic Extensions block. 
These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta 
|\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail 
|\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |  
                            |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Malayalam text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block. Malayalam text can also incorporate the udatta\n(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Malayalam text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display\nthose codepoints that are defined as non-spacing (marks, dependent\nvowels (matras), below-base consonant forms, and post-base consonant\nforms) in an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-mongolian.md",
    "content": "# Mongolian character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Mongolian text](../opentype-shaping-mongolian.md).\n\n**Contents**\n\n  - [Mongolian character table](#mongolian-character-table)\n  - [Mongolian Supplement character table](#mongolian-supplement-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\n\n## Mongolian character table ##\n\nMongolian glyphs should be classified as in the following\ntable. Codepoints in the Mongolian block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column.\n\nThe _Joining type_ column indicates whether each codepoint is defined\nas joining with adjacent characters on the left side, right side, left\nand right sides (\"DUAL\"), or neither side (\"NON_JOINING\"). Codepoints\ndesignated TRANSPARENT in the _Joining type_ column do not join with\nadjacent characters and, in addition, do not affect the joining\nbehavior of surrounding characters. Non-spacing marks are of type\nTRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent\ncharacters to join.\n\nThe _Joining group_ column lists the fundamental letter that the\nlisted codepoint behaves like for joining purposes.\n\nAssigned codepoints with a _null_ in the _Joining group_\ncolumn evoke no special behavior from the shaping engine during the\njoin-computation stage.\n\nThe _Mark class_ column indicates the Canonical Combining Class\nfor the codepoint.  Marks are assigned non-zero combining classes so\nthat sequences of adjacent marks can be reordered as required by the\northography. \n\nFor Mongolian, a subset of marks in the 220 and 230 classes are also\ndesignated _Modifier Combining Marks_ (<abbr>MCM</abbr>). These are denoted with\n_220_MCM_ and _230_MCM_ in the _Mark class_ column. 
The <abbr title=\"Modifier Combining Mark\">MCM</abbr> marks are\ntreated differently during the mark-reordering stage.\n\n\n:::{table} Mongolian character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                         |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|\n|`U+1800`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1800; Mongolian Birga                      |\n|`U+1801`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1801; Mongolian Ellipsis                   |\n|`U+1802`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1802; Mongolian Comma                      |\n|`U+1803`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1803; Mongolian Full Stop                  |\n|`U+1804`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1804; Mongolian Colon                      |\n|`U+1805`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1805; Mongolian Four Dots                  |\n|`U+1806`   | Punctuation [Pd] | NON_JOINING  | _null_               | _0_        | &#x1806; Todo Soft Hyphen                     |\n|`U+1807`   | Punctuation      | DUAL         | _null_               | _0_        | &#x1807; Sibe Syllable Boundary Mark          |\n|`U+1808`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1808; Manchu Comma                         |\n|`U+1809`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1809; Manchu Full Stop                     |\n|`U+180A`   | Punctuation      | JOIN_CAUSING | _null_               | _0_        | &#x180A; Mongolian Nirugu                     |\n|`U+180B`   | Mark [Mn]        | TRANSPARENT  | _null_               | _0_        | &#x180B; Free 
Variation Selector One          |\n|`U+180C`   | Mark [Mn]        | TRANSPARENT  | _null_               | _0_        | &#x180C; Free Variation Selector Two          |\n|`U+180D`   | Mark [Mn]        | TRANSPARENT  | _null_               | _0_        | &#x180D; Free Variation Selector Three        |\n|`U+180E`   | Formatting       | NON_JOINING  | _null_               | _0_        | &#x180E; Mongolian Vowel Separator            |\n|`U+180F`   | Mark [Mn]        | TRANSPARENT  | _null_               | _0_        | &#x180F; Free Variation Selector Four         |\n| | | | | |                                                                                  \n|`U+1810`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1810; Digit Zero                           |\n|`U+1811`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1811; Digit One                            |\n|`U+1812`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1812; Digit Two                            |\n|`U+1813`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1813; Digit Three                          |\n|`U+1814`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1814; Digit Four                           |\n|`U+1815`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1815; Digit Five                           |\n|`U+1816`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1816; Digit Six                            |\n|`U+1817`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1817; Digit Seven                          |\n|`U+1818`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1818; Digit Eight                          |\n|`U+1819`   | Number           | NON_JOINING  | _null_               | _0_        | &#x1819; Digit Nine                           
|\n|`U+181A`   | _unassigned_     |              |                      |            |                                               |\n|`U+181B`   | _unassigned_     |              |                      |            |                                               |\n|`U+181C`   | _unassigned_     |              |                      |            |                                               |\n|`U+181D`   | _unassigned_     |              |                      |            |                                               |\n|`U+181E`   | _unassigned_     |              |                      |            |                                               |\n|`U+181F`   | _unassigned_     |              |                      |            |                                               |\n| | | | | |                                                                                  \n|`U+1820`   | Letter           | DUAL         | _null_               | _0_        | &#x1820; A                                    |\n|`U+1821`   | Letter           | DUAL         | _null_               | _0_        | &#x1821; E                                    |\n|`U+1822`   | Letter           | DUAL         | _null_               | _0_        | &#x1822; I                                    |\n|`U+1823`   | Letter           | DUAL         | _null_               | _0_        | &#x1823; O                                    |\n|`U+1824`   | Letter           | DUAL         | _null_               | _0_        | &#x1824; U                                    |\n|`U+1825`   | Letter           | DUAL         | _null_               | _0_        | &#x1825; Oe                                   |\n|`U+1826`   | Letter           | DUAL         | _null_               | _0_        | &#x1826; Ue                                   |\n|`U+1827`   | Letter           | DUAL         | _null_               | _0_        | &#x1827; Ee                                   |\n|`U+1828`   | Letter           | DUAL   
      | _null_               | _0_        | &#x1828; Na                                   |\n|`U+1829`   | Letter           | DUAL         | _null_               | _0_        | &#x1829; Ang                                  |\n|`U+182A`   | Letter           | DUAL         | _null_               | _0_        | &#x182A; Ba                                   |\n|`U+182B`   | Letter           | DUAL         | _null_               | _0_        | &#x182B; Pa                                   |\n|`U+182C`   | Letter           | DUAL         | _null_               | _0_        | &#x182C; Qa                                   |\n|`U+182D`   | Letter           | DUAL         | _null_               | _0_        | &#x182D; Ga                                   |\n|`U+182E`   | Letter           | DUAL         | _null_               | _0_        | &#x182E; Ma                                   |\n|`U+182F`   | Letter           | DUAL         | _null_               | _0_        | &#x182F; La                                   |\n| | | | | |                                                                                  \n|`U+1830`   | Letter           | DUAL         | _null_               | _0_        | &#x1830; Sa                                   |\n|`U+1831`   | Letter           | DUAL         | _null_               | _0_        | &#x1831; Sha                                  |\n|`U+1832`   | Letter           | DUAL         | _null_               | _0_        | &#x1832; Ta                                   |\n|`U+1833`   | Letter           | DUAL         | _null_               | _0_        | &#x1833; Da                                   |\n|`U+1834`   | Letter           | DUAL         | _null_               | _0_        | &#x1834; Cha                                  |\n|`U+1835`   | Letter           | DUAL         | _null_               | _0_        | &#x1835; Ja                                   |\n|`U+1836`   | Letter           | DUAL         | _null_               | _0_        
| &#x1836; Ya                                   |\n|`U+1837`   | Letter           | DUAL         | _null_               | _0_        | &#x1837; Ra                                   |\n|`U+1838`   | Letter           | DUAL         | _null_               | _0_        | &#x1838; Wa                                   |\n|`U+1839`   | Letter           | DUAL         | _null_               | _0_        | &#x1839; Fa                                   |\n|`U+183A`   | Letter           | DUAL         | _null_               | _0_        | &#x183A; Ka                                   |\n|`U+183B`   | Letter           | DUAL         | _null_               | _0_        | &#x183B; Kha                                  |\n|`U+183C`   | Letter           | DUAL         | _null_               | _0_        | &#x183C; Tsa                                  |\n|`U+183D`   | Letter           | DUAL         | _null_               | _0_        | &#x183D; Za                                   |\n|`U+183E`   | Letter           | DUAL         | _null_               | _0_        | &#x183E; Haa                                  |\n|`U+183F`   | Letter           | DUAL         | _null_               | _0_        | &#x183F; Zra                                  |\n| | | | | |                                                                                  \n|`U+1840`   | Letter           | DUAL         | _null_               | _0_        | &#x1840; Lha                                  |\n|`U+1841`   | Letter           | DUAL         | _null_               | _0_        | &#x1841; Zhi                                  |\n|`U+1842`   | Letter           | DUAL         | _null_               | _0_        | &#x1842; Chi                                  |\n|`U+1843`   | Letter           | DUAL         | _null_               | _0_        | &#x1843; Todo Long Vowel Sign                 |\n|`U+1844`   | Letter           | DUAL         | _null_               | _0_        | &#x1844; Todo E                          
     |\n|`U+1845`   | Letter           | DUAL         | _null_               | _0_        | &#x1845; Todo I                               |\n|`U+1846`   | Letter           | DUAL         | _null_               | _0_        | &#x1846; Todo O                               |\n|`U+1847`   | Letter           | DUAL         | _null_               | _0_        | &#x1847; Todo U                               |\n|`U+1848`   | Letter           | DUAL         | _null_               | _0_        | &#x1848; Todo Oe                              |\n|`U+1849`   | Letter           | DUAL         | _null_               | _0_        | &#x1849; Todo Ue                              |\n|`U+184A`   | Letter           | DUAL         | _null_               | _0_        | &#x184A; Todo Ang                             |\n|`U+184B`   | Letter           | DUAL         | _null_               | _0_        | &#x184B; Todo Ba                              |\n|`U+184C`   | Letter           | DUAL         | _null_               | _0_        | &#x184C; Todo Pa                              |\n|`U+184D`   | Letter           | DUAL         | _null_               | _0_        | &#x184D; Todo Qa                              |\n|`U+184E`   | Letter           | DUAL         | _null_               | _0_        | &#x184E; Todo Ga                              |\n|`U+184F`   | Letter           | DUAL         | _null_               | _0_        | &#x184F; Todo Ma                              |\n| | | | | |                                                                                      \n|`U+1850`   | Letter           | DUAL         | _null_               | _0_        | &#x1850; Todo Ta                              |\n|`U+1851`   | Letter           | DUAL         | _null_               | _0_        | &#x1851; Todo Da                              |\n|`U+1852`   | Letter           | DUAL         | _null_               | _0_        | &#x1852; Todo Cha                             |\n|`U+1853`   | Letter           
| DUAL         | _null_               | _0_        | &#x1853; Todo Ja                              |\n|`U+1854`   | Letter           | DUAL         | _null_               | _0_        | &#x1854; Todo Tsa                             |\n|`U+1855`   | Letter           | DUAL         | _null_               | _0_        | &#x1855; Todo Ya                              |\n|`U+1856`   | Letter           | DUAL         | _null_               | _0_        | &#x1856; Todo Wa                              |\n|`U+1857`   | Letter           | DUAL         | _null_               | _0_        | &#x1857; Todo Ka                              |\n|`U+1858`   | Letter           | DUAL         | _null_               | _0_        | &#x1858; Todo Gaa                             |\n|`U+1859`   | Letter           | DUAL         | _null_               | _0_        | &#x1859; Todo Haa                             |\n|`U+185A`   | Letter           | DUAL         | _null_               | _0_        | &#x185A; Todo Jia                             |\n|`U+185B`   | Letter           | DUAL         | _null_               | _0_        | &#x185B; Todo Nia                             |\n|`U+185C`   | Letter           | DUAL         | _null_               | _0_        | &#x185C; Todo Dza                             |\n|`U+185D`   | Letter           | DUAL         | _null_               | _0_        | &#x185D; Sibe E                               |\n|`U+185E`   | Letter           | DUAL         | _null_               | _0_        | &#x185E; Sibe I                               |\n|`U+185F`   | Letter           | DUAL         | _null_               | _0_        | &#x185F; Sibe Iy                              |\n| | | | | |                                                                                     \n|`U+1860`   | Letter           | DUAL         | _null_               | _0_        | &#x1860; Sibe Ue                              |\n|`U+1861`   | Letter           | DUAL         | _null_               | 
_0_        | &#x1861; Sibe U                               |\n|`U+1862`   | Letter           | DUAL         | _null_               | _0_        | &#x1862; Sibe Ang                             |\n|`U+1863`   | Letter           | DUAL         | _null_               | _0_        | &#x1863; Sibe Ka                              |\n|`U+1864`   | Letter           | DUAL         | _null_               | _0_        | &#x1864; Sibe Ga                              |\n|`U+1865`   | Letter           | DUAL         | _null_               | _0_        | &#x1865; Sibe Ha                              |\n|`U+1866`   | Letter           | DUAL         | _null_               | _0_        | &#x1866; Sibe Pa                              |\n|`U+1867`   | Letter           | DUAL         | _null_               | _0_        | &#x1867; Sibe Sha                             |\n|`U+1868`   | Letter           | DUAL         | _null_               | _0_        | &#x1868; Sibe Ta                              |\n|`U+1869`   | Letter           | DUAL         | _null_               | _0_        | &#x1869; Sibe Da                              |\n|`U+186A`   | Letter           | DUAL         | _null_               | _0_        | &#x186A; Sibe Ja                              |\n|`U+186B`   | Letter           | DUAL         | _null_               | _0_        | &#x186B; Sibe Fa                              |\n|`U+186C`   | Letter           | DUAL         | _null_               | _0_        | &#x186C; Sibe Gaa                             |\n|`U+186D`   | Letter           | DUAL         | _null_               | _0_        | &#x186D; Sibe Haa                             |\n|`U+186E`   | Letter           | DUAL         | _null_               | _0_        | &#x186E; Sibe Tsa                             |\n|`U+186F`   | Letter           | DUAL         | _null_               | _0_        | &#x186F; Sibe Za                              |\n| | | | | |                                                                 
                     \n|`U+1870`   | Letter           | DUAL         | _null_               | _0_        | &#x1870; Sibe Raa                             |\n|`U+1871`   | Letter           | DUAL         | _null_               | _0_        | &#x1871; Sibe Cha                             |\n|`U+1872`   | Letter           | DUAL         | _null_               | _0_        | &#x1872; Sibe Zha                             |\n|`U+1873`   | Letter           | DUAL         | _null_               | _0_        | &#x1873; Manchu I                             |\n|`U+1874`   | Letter           | DUAL         | _null_               | _0_        | &#x1874; Manchu Ka                            |\n|`U+1875`   | Letter           | DUAL         | _null_               | _0_        | &#x1875; Manchu Ra                            |\n|`U+1876`   | Letter           | DUAL         | _null_               | _0_        | &#x1876; Manchu Fa                            |\n|`U+1877`   | Letter           | DUAL         | _null_               | _0_        | &#x1877; Manchu Zha                           |\n|`U+1878`   | Letter           | DUAL         | _null_               | _0_        | &#x1878; Cha With Two Dots                    |\n|`U+1879`   | _unassigned_     |              |                      |            |                                               |\n|`U+187A`   | _unassigned_     |              |                      |            |                                               |\n|`U+187B`   | _unassigned_     |              |                      |            |                                               |\n|`U+187C`   | _unassigned_     |              |                      |            |                                               |\n|`U+187D`   | _unassigned_     |              |                      |            |                                               |\n|`U+187E`   | _unassigned_     |              |                      |            |                                
               |\n|`U+187F`   | _unassigned_     |              |                      |            |                                               |\n| | | | | |                                                                                  \n|`U+1880`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x1880; Ali Gali Anusvara One                |\n|`U+1881`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x1881; Ali Gali Visarga One                 |\n|`U+1882`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x1882; Ali Gali Damaru                      |\n|`U+1883`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x1883; Ali Gali Ubadama                     |\n|`U+1884`   | Letter           | NON_JOINING  | _null_               | _0_        | &#x1884; Ali Gali Inverted Ubadama            |\n|`U+1885`   | Mark [Mn]        | TRANSPARENT  | _null_               | _0_        | &#x1885; Ali Gali Baluda                      |\n|`U+1886`   | Mark [Mn]        | TRANSPARENT  | _null_               | _0_        | &#x1886; Ali Gali Three Baluda                |\n|`U+1887`   | Letter           | DUAL         | _null_               | _0_        | &#x1887; Ali Gali A                           |\n|`U+1888`   | Letter           | DUAL         | _null_               | _0_        | &#x1888; Ali Gali I                           |\n|`U+1889`   | Letter           | DUAL         | _null_               | _0_        | &#x1889; Ali Gali Ka                          |\n|`U+188A`   | Letter           | DUAL         | _null_               | _0_        | &#x188A; Ali Gali Nga                         |\n|`U+188B`   | Letter           | DUAL         | _null_               | _0_        | &#x188B; Ali Gali Ca                          |\n|`U+188C`   | Letter           | DUAL         | _null_               | _0_        | &#x188C; Ali Gali Tta                         |\n|`U+188D`   | Letter     
      | DUAL         | _null_               | _0_        | &#x188D; Ali Gali Ttha                        |\n|`U+188E`   | Letter           | DUAL         | _null_               | _0_        | &#x188E; Ali Gali Dda                         |\n|`U+188F`   | Letter           | DUAL         | _null_               | _0_        | &#x188F; Ali Gali Nna                         |\n| | | | | |                                                                                          \n|`U+1890`   | Letter           | DUAL         | _null_               | _0_        | &#x1890; Ali Gali Ta                          |\n|`U+1891`   | Letter           | DUAL         | _null_               | _0_        | &#x1891; Ali Gali Da                          |\n|`U+1892`   | Letter           | DUAL         | _null_               | _0_        | &#x1892; Ali Gali Pa                          |\n|`U+1893`   | Letter           | DUAL         | _null_               | _0_        | &#x1893; Ali Gali Pha                         |\n|`U+1894`   | Letter           | DUAL         | _null_               | _0_        | &#x1894; Ali Gali Ssa                         |\n|`U+1895`   | Letter           | DUAL         | _null_               | _0_        | &#x1895; Ali Gali Zha                         |\n|`U+1896`   | Letter           | DUAL         | _null_               | _0_        | &#x1896; Ali Gali Za                          |\n|`U+1897`   | Letter           | DUAL         | _null_               | _0_        | &#x1897; Ali Gali Ah                          |\n|`U+1898`   | Letter           | DUAL         | _null_               | _0_        | &#x1898; Todo Ali Gali Ta                     |\n|`U+1899`   | Letter           | DUAL         | _null_               | _0_        | &#x1899; Todo Ali Gali Zha                    |\n|`U+189A`   | Letter           | DUAL         | _null_               | _0_        | &#x189A; Manchu Ali Gali Gha                  |\n|`U+189B`   | Letter           | DUAL         | _null_      
         | _0_        | &#x189B; Manchu Ali Gali Nga                  |\n|`U+189C`   | Letter           | DUAL         | _null_               | _0_        | &#x189C; Manchu Ali Gali Ca                   |\n|`U+189D`   | Letter           | DUAL         | _null_               | _0_        | &#x189D; Manchu Ali Gali Jha                  |\n|`U+189E`   | Letter           | DUAL         | _null_               | _0_        | &#x189E; Manchu Ali Gali Tta                  |\n|`U+189F`   | Letter           | DUAL         | _null_               | _0_        | &#x189F; Manchu Ali Gali Ddha                 |\n| | | | | |                                                                                                  \n|`U+18A0`   | Letter           | DUAL         | _null_               | _0_        | &#x18A0; Manchu Ali Gali Ta                   |\n|`U+18A1`   | Letter           | DUAL         | _null_               | _0_        | &#x18A1; Manchu Ali Gali Dha                  |\n|`U+18A2`   | Letter           | DUAL         | _null_               | _0_        | &#x18A2; Manchu Ali Gali Ssa                  |\n|`U+18A3`   | Letter           | DUAL         | _null_               | _0_        | &#x18A3; Manchu Ali Gali Cya                  |\n|`U+18A4`   | Letter           | DUAL         | _null_               | _0_        | &#x18A4; Manchu Ali Gali Zha                  |\n|`U+18A5`   | Letter           | DUAL         | _null_               | _0_        | &#x18A5; Manchu Ali Gali Za                   |\n|`U+18A6`   | Letter           | DUAL         | _null_               | _0_        | &#x18A6; Ali Gali Half U                      |\n|`U+18A7`   | Letter           | DUAL         | _null_               | _0_        | &#x18A7; Ali Gali Half Ya                     |\n|`U+18A8`   | Letter           | DUAL         | _null_               | _0_        | &#x18A8; Manchu Ali Gali Bha                  |\n|`U+18A9`   | Mark [Mn]        | TRANSPARENT  | _null_               | 228        | 
&#x18A9; Ali Gali Dagalga                     |\n|`U+18AA`   | Letter           | DUAL         | _null_               | _0_        | &#x18AA; Manchu Ali Gali Lha                  |\n|`U+18AB`   | _unassigned_     |              |                      |            |                                               |\n|`U+18AC`   | _unassigned_     |              |                      |            |                                               |\n|`U+18AD`   | _unassigned_     |              |                      |            |                                               |\n|`U+18AE`   | _unassigned_     |              |                      |            |                                               |\n|`U+18AF`   | _unassigned_     |              |                      |            |                                               |\n:::\n\n\n\n## Mongolian Supplement character table ##\n\nThe Mongolian Supplement block includes variants of the _birga_ mark\nused to denote the beginning of a text.\n\n:::{table} Mongolian Supplement character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                         |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|\n|`U+11660`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11660; Birga with Ornament                 |\n|`U+11661`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11661; Rotated Birga                       |\n|`U+11662`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11662; Double Birga with Ornament          |\n|`U+11663`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11663; Triple Birga with Ornament          |\n|`U+11664`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11664; Birga with Double Ornament          
|\n|`U+11665`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11665; Rotated Birga with Ornament         |\n|`U+11666`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11666; Rotated Birga with Double Ornament  |\n|`U+11667`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11667; Inverted Birga                      |\n|`U+11668`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11668; Inverted Birga with Double Ornament |\n|`U+11669`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x11669; Swirl Birga                         |\n|`U+1166A`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1166A; Swirl Birga with Ornament           |\n|`U+1166B`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1166B; Swirl Birga with Double Ornament    |\n|`U+1166C`  | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x1166C; Turned Swirl Birga with Double Ornament|\n|`U+1166D`  | _unassigned_     |              |                      |            |                                               |\n|`U+1166E`  | _unassigned_     |              |                      |            |                                               |\n|`U+1166F`  | _unassigned_     |              |                      |            |                                               |\n| | | | | |\n|`U+11670`  | _unassigned_     |              |                      |            |                                               |\n|`U+11671`  | _unassigned_     |              |                      |            |                                               |\n|`U+11672`  | _unassigned_     |              |                      |            |                                               |\n|`U+11673`  | _unassigned_     |              |                      |            |                                    
           |\n|`U+11674`  | _unassigned_     |              |                      |            |                                               |\n|`U+11675`  | _unassigned_     |              |                      |            |                                               |\n|`U+11676`  | _unassigned_     |              |                      |            |                                               |\n|`U+11677`  | _unassigned_     |              |                      |            |                                               |\n|`U+11678`  | _unassigned_     |              |                      |            |                                               |\n|`U+11679`  | _unassigned_     |              |                      |            |                                               |\n|`U+1167A`  | _unassigned_     |              |                      |            |                                               |\n|`U+1167B`  | _unassigned_     |              |                      |            |                                               |\n|`U+1167C`  | _unassigned_     |              |                      |            |                                               |\n|`U+1167D`  | _unassigned_     |              |                      |            |                                               |\n|`U+1167E`  | _unassigned_     |              |                      |            |                                               |\n|`U+1167F`  | _unassigned_     |              |                      |            |                                               |\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Mongolian text include the dotted-circle placeholder (`U+25CC`), the\ncombining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)\nand zero-width non-joiner (`U+200C`), the left-to-right text marker\n(`U+200E`) and right-to-left text marker (`U+200F`), and 
the no-break\nspace (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ncombining mark in isolation. Real-world text syllables may also use\nother characters, such as hyphens or dashes, in a similar placeholder\nfashion; shaping engines should cope with this situation gracefully.\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                          |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|\n|`U+00A0`   | Separator        | NON_JOINING  | _null_               | _0_        | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOIN_CAUSING | _null_               | _0_        | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2014; Em dash               |\n|`U+202F`   | Separator        | NON_JOINING  | _null_               | _0_        | &#x202F; Narrow No-Break Space |\n|`U+25CC`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x25CC; Dotted circle         |\n| | | | | | |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to force the usage of the\ncursive connecting form of a letter even when the context of the\nadjoining letters would not trigger the 
connecting form.\n\nFor example, to show the initial form of a letter in isolation (such\nas for displaying it in a table of forms), the sequence \"_Letter_,ZWJ\"\nwould be used. To show the medial form of a letter in isolation, the\nsequence \"ZWJ,_Letter_,ZWJ\" would be used.\n\n\n<!--- Zero-Width Non Joiner explanation --->\n\nThe no-break space is primarily used to display those codepoints that\nare defined as non-spacing (such as vowel or diacritical marks) in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder.\n\nThe narrow no-break space is used in Mongolian to insert a small gap\nbetween a word and its suffix.\n"
  },
  {
    "path": "character-tables/character-tables-myanmar.md",
    "content": "# Myanmar character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Myanmar text](../opentype-shaping-myanmar.md).\n\n**Contents**\n\n  - [Myanmar character table](#myanmar-character-table)\n  - [Myanmar Extended character tables](#myanmar-extended-character-tables)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Myanmar character table ##\n\nMyanmar glyphs should be classified as in the following\ntable. Codepoints in the Myanmar block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Myanmar character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1000`   | Letter           | CONSONANT         | _null_                     | &#x1000; Ka                  |\n|`U+1001`   | Letter           | CONSONANT         | _null_                     | &#x1001; Kha                 |\n|`U+1002`   | Letter           | CONSONANT         | _null_                     | &#x1002; Ga                  |\n|`U+1003`   | Letter           | CONSONANT         | _null_                     | &#x1003; Gha                 |\n|`U+1004`   | Letter           | CONSONANT         | _null_                     | &#x1004; Nga                 |\n|`U+1005`   | Letter           | CONSONANT         | _null_                     | &#x1005; Ca                  |\n|`U+1006`   | Letter           | CONSONANT         | _null_                     | &#x1006; Cha                 |\n|`U+1007`   | Letter           | CONSONANT         | _null_                     | &#x1007; Ja                  |\n|`U+1008`   | Letter           | CONSONANT         | _null_                     | &#x1008; Jha                 |\n|`U+1009`   | Letter           | CONSONANT         | _null_                     | &#x1009; Nya                 |\n|`U+100A`   | Letter           | CONSONANT         | _null_                     | &#x100A; Nnya                |\n|`U+100B`   | Letter           | CONSONANT         | _null_                     | &#x100B; Tta                 |\n|`U+100C`   | Letter           | CONSONANT         | _null_                     | &#x100C; Ttha                |\n|`U+100D`   | Letter           | CONSONANT         | _null_                     | &#x100D; Dda                 |\n|`U+100E`   | 
Letter           | CONSONANT         | _null_                     | &#x100E; Ddha                |\n|`U+100F`   | Letter           | CONSONANT         | _null_                     | &#x100F; Nna                 |\n| | | | |\n|`U+1010`   | Letter           | CONSONANT         | _null_                     | &#x1010; Ta                  |\n|`U+1011`   | Letter           | CONSONANT         | _null_                     | &#x1011; Tha                 |\n|`U+1012`   | Letter           | CONSONANT         | _null_                     | &#x1012; Da                  |\n|`U+1013`   | Letter           | CONSONANT         | _null_                     | &#x1013; Dha                 |\n|`U+1014`   | Letter           | CONSONANT         | _null_                     | &#x1014; Na                  |\n|`U+1015`   | Letter           | CONSONANT         | _null_                     | &#x1015; Pa                  |\n|`U+1016`   | Letter           | CONSONANT         | _null_                     | &#x1016; Pha                 |\n|`U+1017`   | Letter           | CONSONANT         | _null_                     | &#x1017; Ba                  |\n|`U+1018`   | Letter           | CONSONANT         | _null_                     | &#x1018; Bha                 |\n|`U+1019`   | Letter           | CONSONANT         | _null_                     | &#x1019; Ma                  |\n|`U+101A`   | Letter           | CONSONANT         | _null_                     | &#x101A; Ya                  |\n|`U+101B`   | Letter           | CONSONANT         | _null_                     | &#x101B; Ra                  |\n|`U+101C`   | Letter           | CONSONANT         | _null_                     | &#x101C; La                  |\n|`U+101D`   | Letter           | CONSONANT         | _null_                     | &#x101D; Wa                  |\n|`U+101E`   | Letter           | CONSONANT         | _null_                     | &#x101E; Sa                  |\n|`U+101F`   | Letter           | CONSONANT         | _null_      
               | &#x101F; Ha                  |\n| | | | |\n|`U+1020`   | Letter           | CONSONANT         | _null_                     | &#x1020; Lla                 |\n|`U+1021`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1021; A                   |\n|`U+1022`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1022; Shan A              |\n|`U+1023`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1023; I                   |\n|`U+1024`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1024; Ii                  |\n|`U+1025`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1025; U                   |\n|`U+1026`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1026; Uu                  |\n|`U+1027`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1027; E                   |\n|`U+1028`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1028; Mon E               |\n|`U+1029`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1029; O                   |\n|`U+102A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x102A; Au                  |\n|`U+102B`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x102B; Sign Tall Aa        |\n|`U+102C`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x102C; Sign Aa             |\n|`U+102D`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x102D; Sign I              |\n|`U+102E`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x102E; Sign Ii             |\n|`U+102F`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x102F; Sign U              |\n| | | | |\n|`U+1030`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x1030; Sign Uu       
      |\n|`U+1031`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x1031; Sign E              |\n|`U+1032`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1032; Sign Ai             |\n|`U+1033`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1033; Sign Mon Ii         |\n|`U+1034`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1034; Sign Mon O          |\n|`U+1035`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1035; Sign E Above        |\n|`U+1036`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x1036; Anusvara            |\n|`U+1037`   | Mark [Mn]        | TONE_MARKER       | BOTTOM_POSITION            | &#x1037; Dot Below           |\n|`U+1038`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x1038; Visarga             |\n|`U+1039`   | Mark [Mn]        | INVISIBLE_STACKER | _null_                     | &#x1039; Virama              |\n|`U+103A`   | Mark [Mn]        | PURE_KILLER       | TOP_POSITION               | &#x103A; Asat                |\n|`U+103B`   | Mark [Mc]        | CONSONANT_MEDIAL  | RIGHT_POSITION             | &#x103B; Sign Medial Ya      |\n|`U+103C`   | Mark [Mc]        | CONSONANT_MEDIAL  | TOP_LEFT_AND_BOTTOM_POSITION | &#x103C; Sign Medial Ra      |\n|`U+103D`   | Mark [Mn]        | CONSONANT_MEDIAL  | BOTTOM_POSITION            | &#x103D; Sign Medial Wa      |\n|`U+103E`   | Mark [Mn]        | CONSONANT_MEDIAL  | BOTTOM_POSITION            | &#x103E; Sign Medial Ha      |\n|`U+103F`   | Letter           | CONSONANT         | _null_                     | &#x103F; Great Sa            |\n| | | | |\n|`U+1040`   | Number           | NUMBER            | _null_                     | &#x1040; Digit Zero          |\n|`U+1041`   | Number           | NUMBER            | _null_                     | &#x1041; Digit One           |\n|`U+1042`   | Number           | NUMBER 
           | _null_                     | &#x1042; Digit Two           |\n|`U+1043`   | Number           | NUMBER            | _null_                     | &#x1043; Digit Three         |\n|`U+1044`   | Number           | NUMBER            | _null_                     | &#x1044; Digit Four          |\n|`U+1045`   | Number           | NUMBER            | _null_                     | &#x1045; Digit Five          |\n|`U+1046`   | Number           | NUMBER            | _null_                     | &#x1046; Digit Six           |\n|`U+1047`   | Number           | NUMBER            | _null_                     | &#x1047; Digit Seven         |\n|`U+1048`   | Number           | NUMBER            | _null_                     | &#x1048; Digit Eight         |\n|`U+1049`   | Number           | NUMBER            | _null_                     | &#x1049; Digit Nine          |\n|`U+104A`   | Punctuation      | _null_            | _null_                     | &#x104A; Little Section      |\n|`U+104B`   | Punctuation      | _null_            | _null_                     | &#x104B; Section             |\n|`U+104C`   | Punctuation      | _null_            | _null_                     | &#x104C; Locative            |\n|`U+104D`   | Punctuation      | _null_            | _null_                     | &#x104D; Completed           |\n|`U+104E`   | Punctuation      | CONSONANT_PLACEHOLDER| _null_                  | &#x104E; Aforementioned      |\n|`U+104F`   | Punctuation      | _null_            | _null_                     | &#x104F; Genitive            |\n| | | | |\n|`U+1050`   | Letter           | CONSONANT         | _null_                     | &#x1050; Sha                 |\n|`U+1051`   | Letter           | CONSONANT         | _null_                     | &#x1051; Ssa                 |\n|`U+1052`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1052; Vocalic R           |\n|`U+1053`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1053; 
Vocalic Rr          |\n|`U+1054`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1054; Vocalic L           |\n|`U+1055`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x1055; Vocalic Ll          |\n|`U+1056`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x1056; Sign Vocalic R      |\n|`U+1057`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x1057; Sign Vocalic Rr     |\n|`U+1058`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x1058; Sign Vocalic L      |\n|`U+1059`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x1059; Sign Vocalic Ll     |\n|`U+105A`   | Letter           | CONSONANT         | _null_                     | &#x105A; Mon Nga             |\n|`U+105B`   | Letter           | CONSONANT         | _null_                     | &#x105B; Mon Jha             |\n|`U+105C`   | Letter           | CONSONANT         | _null_                     | &#x105C; Mon Bba             |\n|`U+105D`   | Letter           | CONSONANT         | _null_                     | &#x105D; Mon Bbe             |\n|`U+105E`   | Mark [Mn]        | CONSONANT_MEDIAL  | BOTTOM_POSITION            | &#x105E; Sign Mon Medial Na  |\n|`U+105F`   | Mark [Mn]        | CONSONANT_MEDIAL  | BOTTOM_POSITION            | &#x105F; Sign Mon Medial Ma  |\n| | | | |\n|`U+1060`   | Mark [Mn]        | CONSONANT_MEDIAL  | BOTTOM_POSITION            | &#x1060; Sign Mon Medial La  |\n|`U+1061`   | Letter           | CONSONANT         | _null_                     | &#x1061; Sgaw Karen Sha      |\n|`U+1062`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x1062; Sign Sgaw Karen Eu  |\n|`U+1063`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x1063; Tone Sgaw Karen Hathi|\n|`U+1064`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x1064; Tone Sgaw Karen Ke Pho|\n|`U+1065`   | Letter     
      | CONSONANT         | _null_                     | &#x1065; Western Pwo Karen Tha|\n|`U+1066`   | Letter           | CONSONANT         | _null_                     | &#x1066; Western Pwo Karen Pwa|\n|`U+1067`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x1067; Sign Western Pwo Karen Eu|\n|`U+1068`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x1068; Sign Western Pwo Karen Ue|\n|`U+1069`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x1069; Sign Western Pwo Karen Tone 1|\n|`U+106A`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x106A; Sign Western Pwo Karen Tone 2|\n|`U+106B`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x106B; Sign Western Pwo Karen Tone 3|\n|`U+106C`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x106C; Sign Western Pwo Karen Tone 4|\n|`U+106D`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x106D; Sign Western Pwo Karen Tone 5|\n|`U+106E`   | Letter           | CONSONANT         | _null_                     | &#x106E; Eastern Pwo Karen Nna|\n|`U+106F`   | Letter           | CONSONANT         | _null_                     | &#x106F; Eastern Pwo Karen Ywa|\n| | | | |\n|`U+1070`   | Letter           | CONSONANT         | _null_                     | &#x1070; Eastern Pwo Karen Ghwa|\n|`U+1071`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1071; Sign Geba Karen I   |\n|`U+1072`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1072; Sign Kayah Oe       |\n|`U+1073`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1073; Sign Kayah U        |\n|`U+1074`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1074; Sign Kayah Ee       |\n|`U+1075`   | Letter           | CONSONANT         | _null_                     | &#x1075; Shan Ka             |\n|`U+1076`   | 
Letter           | CONSONANT         | _null_                     | &#x1076; Shan Kha            |\n|`U+1077`   | Letter           | CONSONANT         | _null_                     | &#x1077; Shan Ga             |\n|`U+1078`   | Letter           | CONSONANT         | _null_                     | &#x1078; Shan Ca             |\n|`U+1079`   | Letter           | CONSONANT         | _null_                     | &#x1079; Shan Za             |\n|`U+107A`   | Letter           | CONSONANT         | _null_                     | &#x107A; Shan Nya            |\n|`U+107B`   | Letter           | CONSONANT         | _null_                     | &#x107B; Shan Da             |\n|`U+107C`   | Letter           | CONSONANT         | _null_                     | &#x107C; Shan Na             |\n|`U+107D`   | Letter           | CONSONANT         | _null_                     | &#x107D; Shan Pha            |\n|`U+107E`   | Letter           | CONSONANT         | _null_                     | &#x107E; Shan Fa             |\n|`U+107F`   | Letter           | CONSONANT         | _null_                     | &#x107F; Shan Ba             |\n| | | | |\n|`U+1080`   | Letter           | CONSONANT         | _null_                     | &#x1080; Shan Tha            |\n|`U+1081`   | Letter           | CONSONANT         | _null_                     | &#x1081; Shan Ha             |\n|`U+1082`   | Mark [Mn]        | CONSONANT_MEDIAL  | BOTTOM_POSITION            | &#x1082; Sign Shan Medial Wa |\n|`U+1083`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x1083; Sign Shan Aa        |\n|`U+1084`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x1084; Sign Shan E         |\n|`U+1085`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1085; Sign Shan E Above   |\n|`U+1086`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x1086; Sign Shan Final Y   |\n|`U+1087`   | Mark [Mc]        | TONE_MARKER       | 
RIGHT_POSITION             | &#x1087; Sign Shan Tone 2    |\n|`U+1088`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x1088; Sign Shan Tone 3    |\n|`U+1089`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x1089; Sign Shan Tone 5    |\n|`U+108A`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x108A; Sign Shan Tone 6    |\n|`U+108B`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x108B; Sign Shan Council Tone 2|\n|`U+108C`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x108C; Sign Shan Council Tone 3|\n|`U+108D`   | Mark [Mn]        | TONE_MARKER       | BOTTOM_POSITION            | &#x108D; Sign Shan Council Emphatic Tone|\n|`U+108E`   | Letter           | CONSONANT         | _null_                     | &#x108E; Rumai Palaung Fa    |\n|`U+108F`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x108F; Sign Rumai Palaung Tone 5|\n| | | | |\n|`U+1090`   | Number           | NUMBER            | _null_                     | &#x1090; Shan Digit Zero     |\n|`U+1091`   | Number           | NUMBER            | _null_                     | &#x1091; Shan Digit One      |\n|`U+1092`   | Number           | NUMBER            | _null_                     | &#x1092; Shan Digit Two      |\n|`U+1093`   | Number           | NUMBER            | _null_                     | &#x1093; Shan Digit Three    |\n|`U+1094`   | Number           | NUMBER            | _null_                     | &#x1094; Shan Digit Four     |\n|`U+1095`   | Number           | NUMBER            | _null_                     | &#x1095; Shan Digit Five     |\n|`U+1096`   | Number           | NUMBER            | _null_                     | &#x1096; Shan Digit Six      |\n|`U+1097`   | Number           | NUMBER            | _null_                     | &#x1097; Shan Digit Seven    |\n|`U+1098`   | Number           | NUMBER            | _null_                     
| &#x1098; Shan Digit Eight    |\n|`U+1099`   | Number           | NUMBER            | _null_                     | &#x1099; Shan Digit Nine     |\n|`U+109A`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x109A; Sign Khamti Tone 1  |\n|`U+109B`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#x109B; Sign Khamti Tone 3  |\n|`U+109C`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x109C; Sign Aiton A        |\n|`U+109D`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x109D; Sign Aiton Ai       |\n|`U+109E`   | Symbol           | SYMBOL            | _null_                     | &#x109E; Shan One            |\n|`U+109F`   | Symbol           | SYMBOL            | _null_                     | &#x109F; Shan Exclamation    |\n:::\n\n\n## Myanmar Extended character tables ##\n\n### Myanmar Extended A character table ###\n\n\n:::{table} Myanmar Extended-A character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+AA60`   | Letter           | CONSONANT         | _null_                     | &#xAA60; Khamti Ga           |\n|`U+AA61`   | Letter           | CONSONANT         | _null_                     | &#xAA61; Khamti Ca           |\n|`U+AA62`   | Letter           | CONSONANT         | _null_                     | &#xAA62; Khamti Cha          |\n|`U+AA63`   | Letter           | CONSONANT         | _null_                     | &#xAA63; Khamti Ja           |\n|`U+AA64`   | Letter           | CONSONANT         | _null_                     | &#xAA64; Khamti Jha          |\n|`U+AA65`   | Letter           | CONSONANT         | _null_                     | &#xAA65; Khamti Nya          |\n|`U+AA66`   | Letter           | CONSONANT         | _null_                     | &#xAA66; Khamti Tta      
    |\n|`U+AA67`   | Letter           | CONSONANT         | _null_                     | &#xAA67; Khamti Ttha         |\n|`U+AA68`   | Letter           | CONSONANT         | _null_                     | &#xAA68; Khamti Dda          |\n|`U+AA69`   | Letter           | CONSONANT         | _null_                     | &#xAA69; Khamti Ddha         |\n|`U+AA6A`   | Letter           | CONSONANT         | _null_                     | &#xAA6A; Khamti Dha          |\n|`U+AA6B`   | Letter           | CONSONANT         | _null_                     | &#xAA6B; Khamti Na           |\n|`U+AA6C`   | Letter           | CONSONANT         | _null_                     | &#xAA6C; Khamti Sa           |\n|`U+AA6D`   | Letter           | CONSONANT         | _null_                     | &#xAA6D; Khamti Ha           |\n|`U+AA6E`   | Letter           | CONSONANT         | _null_                     | &#xAA6E; Khamti Hha          |\n|`U+AA6F`   | Letter           | CONSONANT         | _null_                     | &#xAA6F; Khamti Fa           |\n| | | | |\n|`U+AA70`   | Letter           | _null_            | _null_                     | &#xAA70; Khamti Reduplication|\n|`U+AA71`   | Letter           | CONSONANT         | _null_                     | &#xAA71; Khamti Xa           |\n|`U+AA72`   | Letter           | CONSONANT         | _null_                     | &#xAA72; Khamti Za           |\n|`U+AA73`   | Letter           | CONSONANT         | _null_                     | &#xAA73; Khamti Ra           |\n|`U+AA74`   | Letter           | CONSONANT_PLACEHOLDER| _null_                  | &#xAA74; Khamti Oay          |\n|`U+AA75`   | Letter           | CONSONANT_PLACEHOLDER| _null_                  | &#xAA75; Khamti Qn           |\n|`U+AA76`   | Letter           | CONSONANT_PLACEHOLDER| _null_                  | &#xAA76; Khamti Hm           |\n|`U+AA77`   | Symbol           | SYMBOL            | _null_                     | &#xAA77; Khamti Aiton Exclamation|\n|`U+AA78`   | Symbol           | SYMBOL 
           | _null_                     | &#xAA78; Khamti Aiton One    |\n|`U+AA79`   | Symbol           | SYMBOL            | _null_                     | &#xAA79; Khamti Aiton Two    |\n|`U+AA7A`   | Letter           | CONSONANT         | _null_                     | &#xAA7A; Khamti Aiton Ra     |\n|`U+AA7B`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#xAA7B; Sign Pao Karen Tone |\n|`U+AA7C`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION               | &#xAA7C; Sign Tai Laing Tone 2|\n|`U+AA7D`   | Mark [Mc]        | TONE_MARKER       | RIGHT_POSITION             | &#xAA7D; Sign Tai Laing Tone 5|\n|`U+AA7E`   | Letter           | CONSONANT         | _null_                     | &#xAA7E; Shwe Palaung Cha    |\n|`U+AA7F`   | Letter           | CONSONANT         | _null_                     | &#xAA7F; Shwe Palaung Sha    |\n:::\n\n\n### Myanmar Extended B character table ###\n\n\n:::{table} Myanmar Extended-B character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+A9E0`   | Letter           | CONSONANT         | _null_                     | &#xA9E0; Shan Gha            |\n|`U+A9E1`   | Letter           | CONSONANT         | _null_                     | &#xA9E1; Shan Cha            |\n|`U+A9E2`   | Letter           | CONSONANT         | _null_                     | &#xA9E2; Shan Jha            |\n|`U+A9E3`   | Letter           | CONSONANT         | _null_                     | &#xA9E3; Shan Nna            |\n|`U+A9E4`   | Letter           | CONSONANT         | _null_                     | &#xA9E4; Shan Bha            |\n|`U+A9E5`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#xA9E5; Sign Shan Saw       |\n|`U+A9E6`   | Letter           | _null_            | _null_                     | &#xA9E6; Shan 
Reduplication  |\n|`U+A9E7`   | Letter           | CONSONANT         | _null_                     | &#xA9E7; Tai Laing Nya       |\n|`U+A9E8`   | Letter           | CONSONANT         | _null_                     | &#xA9E8; Tai Laing Fa        |\n|`U+A9E9`   | Letter           | CONSONANT         | _null_                     | &#xA9E9; Tai Laing Ga        |\n|`U+A9EA`   | Letter           | CONSONANT         | _null_                     | &#xA9EA; Tai Laing Gha       |\n|`U+A9EB`   | Letter           | CONSONANT         | _null_                     | &#xA9EB; Tai Laing Ja        |\n|`U+A9EC`   | Letter           | CONSONANT         | _null_                     | &#xA9EC; Tai Laing Jha       |\n|`U+A9ED`   | Letter           | CONSONANT         | _null_                     | &#xA9ED; Tai Laing Dda       |\n|`U+A9EE`   | Letter           | CONSONANT         | _null_                     | &#xA9EE; Tai Laing Ddha      |\n|`U+A9EF`   | Letter           | CONSONANT         | _null_                     | &#xA9EF; Tai Laing Nna       |\n| | | | |\n|`U+A9F0`   | Number           | NUMBER            | _null_                     | &#xA9F0; Tai Laing Digit Zero|\n|`U+A9F1`   | Number           | NUMBER            | _null_                     | &#xA9F1; Tai Laing Digit One |\n|`U+A9F2`   | Number           | NUMBER            | _null_                     | &#xA9F2; Tai Laing Digit Two |\n|`U+A9F3`   | Number           | NUMBER            | _null_                     | &#xA9F3; Tai Laing Digit Three|\n|`U+A9F4`   | Number           | NUMBER            | _null_                     | &#xA9F4; Tai Laing Digit Four|\n|`U+A9F5`   | Number           | NUMBER            | _null_                     | &#xA9F5; Tai Laing Digit Five|\n|`U+A9F6`   | Number           | NUMBER            | _null_                     | &#xA9F6; Tai Laing Digit Six |\n|`U+A9F7`   | Number           | NUMBER            | _null_                     | &#xA9F7; Tai Laing Digit Seven|\n|`U+A9F8`   | Number           
| NUMBER            | _null_                     | &#xA9F8; Tai Laing Digit Eight|\n|`U+A9F9`   | Number           | NUMBER            | _null_                     | &#xA9F9; Tai Laing Digit Nine|\n|`U+A9FA`   | Letter           | CONSONANT         | _null_                     | &#xA9FA; Tai Laing Lla       |\n|`U+A9FB`   | Letter           | CONSONANT         | _null_                     | &#xA9FB; Tai Laing Da        |\n|`U+A9FC`   | Letter           | CONSONANT         | _null_                     | &#xA9FC; Tai Laing Dha       |\n|`U+A9FD`   | Letter           | CONSONANT         | _null_                     | &#xA9FD; Tai Laing Ba        |\n|`U+A9FE`   | Letter           | CONSONANT         | _null_                     | &#xA9FE; Tai Laing Bha       |\n|`U+A9FF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n### Myanmar Extended C character table ###\n\n\n:::{table} Myanmar Extended-C character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+116D0`  | Number           | NUMBER            | _null_                     | &#x116D0; Pao Digit Zero     |\n|`U+116D1`  | Number           | NUMBER            | _null_                     | &#x116D1; Pao Digit One      |\n|`U+116D2`  | Number           | NUMBER            | _null_                     | &#x116D2; Pao Digit Two      |\n|`U+116D3`  | Number           | NUMBER            | _null_                     | &#x116D3; Pao Digit Three    |\n|`U+116D4`  | Number           | NUMBER            | _null_                     | &#x116D4; Pao Digit Four     |\n|`U+116D5`  | Number           | NUMBER            | _null_                     | &#x116D5; Pao Digit Five     |\n|`U+116D6`  | Number           | NUMBER            | _null_                     | &#x116D6; Pao 
Digit Six      |\n|`U+116D7`  | Number           | NUMBER            | _null_                     | &#x116D7; Pao Digit Seven    |\n|`U+116D8`  | Number           | NUMBER            | _null_                     | &#x116D8; Pao Digit Eight    |\n|`U+116D9`  | Number           | NUMBER            | _null_                     | &#x116D9; Pao Digit Nine     |\n|`U+116DA`  | Number           | NUMBER            | _null_                     | &#x116DA; Eastern Pwo Karen Digit Zero|\n|`U+116DB`  | Number           | NUMBER            | _null_                     | &#x116DB; Eastern Pwo Karen Digit One|\n|`U+116DC`  | Number           | NUMBER            | _null_                     | &#x116DC; Eastern Pwo Karen Digit Two|\n|`U+116DD`  | Number           | NUMBER            | _null_                     | &#x116DD; Eastern Pwo Karen Digit Three|\n|`U+116DE`  | Number           | NUMBER            | _null_                     | &#x116DE; Eastern Pwo Karen Digit Four|\n|`U+116DF`  | Number           | NUMBER            | _null_                     | &#x116DF; Eastern Pwo Karen Digit Five|\n| | | | |\n|`U+116E0`  | Number           | NUMBER            | _null_                     | &#x116E0; Eastern Pwo Karen Digit Six|\n|`U+116E1`  | Number           | NUMBER            | _null_                     | &#x116E1; Eastern Pwo Karen Digit Seven|\n|`U+116E2`  | Number           | NUMBER            | _null_                     | &#x116E2; Eastern Pwo Karen Digit Eight|\n|`U+116E3`  | Number           | NUMBER            | _null_                     | &#x116E3; Eastern Pwo Karen Digit Nine|\n|`U+116E4`  | _unassigned_     |                   |                            |                              |\n|`U+116E5`  | _unassigned_     |                   |                            |                              |\n|`U+116E6`  | _unassigned_     |                   |                            |                              |\n|`U+116E7`  | _unassigned_     |                   |              
              |                              |\n|`U+116E8`  | _unassigned_     |                   |                            |                              |\n|`U+116E9`  | _unassigned_     |                   |                            |                              |\n|`U+116EA`  | _unassigned_     |                   |                            |                              |\n|`U+116EB`  | _unassigned_     |                   |                            |                              |\n|`U+116EC`  | _unassigned_     |                   |                            |                              |\n|`U+116ED`  | _unassigned_     |                   |                            |                              |\n|`U+116EE`  | _unassigned_     |                   |                            |                              |\n|`U+116EF`  | _unassigned_     |                   |                            |                              |\n| | | | |\n|`U+116F0`  | _unassigned_     |                   |                            |                              |\n|`U+116F1`  | _unassigned_     |                   |                            |                              |\n|`U+116F2`  | _unassigned_     |                   |                            |                              |\n|`U+116F3`  | _unassigned_     |                   |                            |                              |\n|`U+116F4`  | _unassigned_     |                   |                            |                              |\n|`U+116F5`  | _unassigned_     |                   |                            |                              |\n|`U+116F6`  | _unassigned_     |                   |                            |                              |\n|`U+116F7`  | _unassigned_     |                   |                            |                              |\n|`U+116F8`  | _unassigned_     |                   |                            |                              
|\n|`U+116F9`  | _unassigned_     |                   |                            |                              |\n|`U+116FA`  | _unassigned_     |                   |                            |                              |\n|`U+116FB`  | _unassigned_     |                   |                            |                              |\n|`U+116FC`  | _unassigned_     |                   |                            |                              |\n|`U+116FD`  | _unassigned_     |                   |                            |                              |\n|`U+116FE`  | _unassigned_     |                   |                            |                              |\n|`U+116FF`  | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Myanmar script may also include\ncharacters from the Vedic Extensions block. These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark 
[Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta |\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | 
_null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail |\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF3`   | Mark [Mc]        | VISARGA           | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_ 
               | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |                              |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Myanmar text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation of a conjunct\nfrom a \"_Consonant_,Halant,_Consonant_\" sequence. The sequence\n\"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of a\nconjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The sequence\n\"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the first\nconsonant in its standard form, followed by an explicit \"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display\nthose codepoints that are defined as non-spacing (marks, dependent\nvowels (matras), below-base consonant forms, and post-base consonant\nforms) in an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-nko.md",
    "content": "# N'Ko character tables #\n\nThis document lists the per-character shaping information needed to\n[shape N'Ko text](../opentype-shaping-nko.md).\n\n**Contents**\n\n  - [NKo character table](#nko-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\n\n## NKo character table ##\n\nN'Ko glyphs should be classified as in the following\ntable. Codepoints in the NKo block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column.\n\nThe _Joining type_ column indicates whether each codepoint is defined\nas joining with adjacent characters on the left side, right side, left\nand right sides (\"DUAL\"), or neither side (\"NON_JOINING\"). Codepoints\ndesignated TRANSPARENT in the _Joining type_ column do not join with\nadjacent characters and, in addition, do not affect the joining\nbehavior of surrounding characters. Non-spacing marks are of type\nTRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent\ncharacters to join.\n\nThe _Joining group_ column lists the fundamental letter that the\nlisted codepoint behaves like for joining purposes.\n\nAssigned codepoints with a _null_ in the _Joining group_\ncolumn evoke no special behavior from the shaping engine during the\njoin-computation stage.\n\n> Note: No codepoints in the NKo block are assigned a non-null _Joining group_.\n\nThe _Mark class_ column indicates the Canonical Combining Class\nfor the codepoint.  Marks are assigned non-zero combining classes so\nthat sequences of adjacent marks can be reordered as required by the\northography. 
\n\n\n:::{table} N'Ko character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                         |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|\n|`U+07C0`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C0; Digit Zero                           |\n|`U+07C1`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C1; Digit One                            |\n|`U+07C2`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C2; Digit Two                            |\n|`U+07C3`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C3; Digit Three                          |\n|`U+07C4`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C4; Digit Four                           |\n|`U+07C5`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C5; Digit Five                           |\n|`U+07C6`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C6; Digit Six                            |\n|`U+07C7`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C7; Digit Seven                          |\n|`U+07C8`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C8; Digit Eight                          |\n|`U+07C9`   | Number           | NON_JOINING  | _null_               | _0_        | &#x07C9; Digit Nine                           |\n|`U+07CA`   | Letter           | DUAL         | _null_               | _0_        | &#x07CA; A                                    |\n|`U+07CB`   | Letter           | DUAL         | _null_               | _0_        | &#x07CB; Ee                                   |\n|`U+07CC`   | Letter           | DUAL         | _null_               | _0_        | &#x07CC; I   
                                 |\n|`U+07CD`   | Letter           | DUAL         | _null_               | _0_        | &#x07CD; E                                    |\n|`U+07CE`   | Letter           | DUAL         | _null_               | _0_        | &#x07CE; U                                    |\n|`U+07CF`   | Letter           | DUAL         | _null_               | _0_        | &#x07CF; Oo                                   |\n| | | | | |                                                                                                      \t\t\t      \n|`U+07D0`   | Letter           | DUAL         | _null_               | _0_        | &#x07D0; O                                    |\n|`U+07D1`   | Letter           | DUAL         | _null_               | _0_        | &#x07D1; Dagbasinna                           |\n|`U+07D2`   | Letter           | DUAL         | _null_               | _0_        | &#x07D2; N                                    |\n|`U+07D3`   | Letter           | DUAL         | _null_               | _0_        | &#x07D3; Ba                                   |\n|`U+07D4`   | Letter           | DUAL         | _null_               | _0_        | &#x07D4; Pa                                   |\n|`U+07D5`   | Letter           | DUAL         | _null_               | _0_        | &#x07D5; Ta                                   |\n|`U+07D6`   | Letter           | DUAL         | _null_               | _0_        | &#x07D6; Ja                                   |\n|`U+07D7`   | Letter           | DUAL         | _null_               | _0_        | &#x07D7; Cha                                  |\n|`U+07D8`   | Letter           | DUAL         | _null_               | _0_        | &#x07D8; Da                                   |\n|`U+07D9`   | Letter           | DUAL         | _null_               | _0_        | &#x07D9; Ra                                   |\n|`U+07DA`   | Letter           | DUAL         | _null_               | _0_        | &#x07DA; Rra            
                      |\n|`U+07DB`   | Letter           | DUAL         | _null_               | _0_        | &#x07DB; Sa                                   |\n|`U+07DC`   | Letter           | DUAL         | _null_               | _0_        | &#x07DC; Gba                                  |\n|`U+07DD`   | Letter           | DUAL         | _null_               | _0_        | &#x07DD; Fa                                   |\n|`U+07DE`   | Letter           | DUAL         | _null_               | _0_        | &#x07DE; Ka                                   |\n|`U+07DF`   | Letter           | DUAL         | _null_               | _0_        | &#x07DF; La                                   |\n| | | | | |                                                                                                      \t\t\t      \n|`U+07E0`   | Letter           | DUAL         | _null_               | _0_        | &#x07E0; Na Woloso                            |\n|`U+07E1`   | Letter           | DUAL         | _null_               | _0_        | &#x07E1; Ma                                   |\n|`U+07E2`   | Letter           | DUAL         | _null_               | _0_        | &#x07E2; Nya                                  |\n|`U+07E3`   | Letter           | DUAL         | _null_               | _0_        | &#x07E3; Na                                   |\n|`U+07E4`   | Letter           | DUAL         | _null_               | _0_        | &#x07E4; Ha                                   |\n|`U+07E5`   | Letter           | DUAL         | _null_               | _0_        | &#x07E5; Wa                                   |\n|`U+07E6`   | Letter           | DUAL         | _null_               | _0_        | &#x07E6; Ya                                   |\n|`U+07E7`   | Letter           | DUAL         | _null_               | _0_        | &#x07E7; Nya Woloso                           |\n|`U+07E8`   | Letter           | DUAL         | _null_               | _0_        | &#x07E8; Jona Ja                   
           |\n|`U+07E9`   | Letter           | DUAL         | _null_               | _0_        | &#x07E9; Jona Cha                             |\n|`U+07EA`   | Letter           | DUAL         | _null_               | _0_        | &#x07EA; Jona Ra                              |\n|`U+07EB`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x07EB; Combining  Short High Tone           |\n|`U+07EC`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x07EC; Combining  Short Low Tone            |\n|`U+07ED`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x07ED; Combining  Short Rising Tone         |\n|`U+07EE`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x07EE; Combining  Long Descending Tone      |\n|`U+07EF`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x07EF; Combining  Long High Tone            |\n| | | | | |                                                                                                     \t\t\t      \n|`U+07F0`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x07F0; Combining  Long Low Tone             |\n|`U+07F1`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x07F1; Combining  Long Rising Tone          |\n|`U+07F2`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x07F2; Combining  Nasalization Mark         |\n|`U+07F3`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x07F3; Combining  Double Dot Above          |\n|`U+07F4`   | Letter modifier  | NON_JOINING  | _null_               | _0_        | &#x07F4; High Tone Apostrophe                 |\n|`U+07F5`   | Letter modifier  | NON_JOINING  | _null_               | _0_        | &#x07F5; Low Tone Apostrophe                  |\n|`U+07F6`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x07F6; Symbol Oo Dennen                     
|\n|`U+07F7`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x07F7; Symbol Gbakurunen                    |\n|`U+07F8`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x07F8; Comma                                |\n|`U+07F9`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x07F9; Exclamation Mark                     |\n|`U+07FA`   | Letter modifier  | JOIN_CAUSING | _null_               | _0_        | &#x07FA; Lajanyalan                           |\n|`U+07FB`   | _unassigned_     |              |                      |            |                                               |\n|`U+07FC`   | _unassigned_     |              |                      |            |                                               |\n|`U+07FD`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x07FD; Dantalayan                           |\n|`U+07FE`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x07FE; Dorome Sign                          |\n|`U+07FF`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x07FF; Taman Sign                           |\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof N'Ko text include the dotted-circle placeholder (`U+25CC`), the\ncombining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)\nand zero-width non-joiner (`U+200C`), the left-to-right text marker\n(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break\nspace (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ncombining mark in isolation. 
Real-world text syllables may also use\nother characters, such as hyphens or dashes, in a similar placeholder\nfashion; shaping engines should cope with this situation gracefully.\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                          |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|\n|`U+00A0`   | Separator        | NON_JOINING  | _null_               | _0_        | &#x00A0; No-break space        |\n|`U+034F`   | Other            | NON_JOINING  | _null_               | _0_        | &#x034F; Combining grapheme joiner |\n|`U+200C`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOIN_CAUSING | _null_               | _0_        | &#x200D; Zero-width joiner     |\n|`U+200E`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200E; Left-to-Right marker  |\n|`U+200F`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200F; Right-to-Left marker  |\n|`U+2010`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x25CC; Dotted circle         |\n:::\n\n\nThe combining grapheme joiner (<abbr>CGJ</abbr>) is primarily used to alter the\norder in which adjacent 
marks are positioned during the\nmark-reordering stage, in order to adhere to the needs of a\nnon-default language orthography.\n<!--- combining grapheme joiner explanation --->\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to force the usage of the\ncursive connecting form of a letter even when the context of the\nadjoining letters would not trigger the connecting form. \n\nFor example, to show the initial form of a letter in isolation (such\nas for displaying it in a table of forms), the sequence \"_Letter_,ZWJ\"\nwould be used. To show the medial form of a letter in isolation, the\nsequence \"ZWJ,_Letter_,ZWJ\" would be used.\n\n\n<!--- Zero-Width Non Joiner explanation --->\n\nThe right-to-left mark (<abbr>RLM</abbr>) and left-to-right mark (<abbr>LRM</abbr>) are used by\nthe Unicode bidirectionality algorithm (BiDi) to indicate the points\nin a text run at which the writing direction changes.\n\n\n<!--- How shaping is affected by the <abbr title=\"Left-To-Right\">LTR</abbr> and <abbr title=\"Right-To-Left\">RTL</abbr> markers explanation --->\n\n\nThe no-break space is primarily used to display those codepoints that\nare defined as non-spacing (such as the combining tone marks and other\ndiacritical marks) in an isolated context, as an alternative to\ndisplaying them superimposed on the dotted-circle placeholder.\n"
  },
  {
    "path": "character-tables/character-tables-oriya.md",
    "content": "# Oriya character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Oriya text](../opentype-shaping-oriya.md).\n\n**Contents**\n\n  - [Oriya character table](#oriya-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Oriya character table ##\n\nOriya glyphs should be classified as in the following\ntable. Codepoints in the Oriya block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Oriya character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0B00`   | _unassigned_     |                   |                            |                              |\n|`U+0B01`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0B01; Candrabindu         |\n|`U+0B02`   | Mark [Mc]        | BINDU             | RIGHT_POSITION             | &#x0B02; Anusvara            |\n|`U+0B03`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0B03; Visarga             |\n|`U+0B04`   | _unassigned_     |                   |                            |                              |\n|`U+0B05`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B05; A                   |\n|`U+0B06`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B06; Aa                  |\n|`U+0B07`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B07; I                   |\n|`U+0B08`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B08; Ii                  |\n|`U+0B09`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B09; U                   |\n|`U+0B0A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B0A; Uu                  |\n|`U+0B0B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B0B; Vocalic R           |\n|`U+0B0C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B0C; Vocalic L           |\n|`U+0B0D`   | _unassigned_     |                   |                            |                              |\n|`U+0B0E`   | 
_unassigned_     |                   |                            |                              |\n|`U+0B0F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B0F; E                   |\n| | | | |\n|`U+0B10`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B10; Ai                  |\n|`U+0B11`   | _unassigned_     |                   |                            |                              |\n|`U+0B12`   | _unassigned_     |                   |                            |                              |\n|`U+0B13`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B13; O                   |\n|`U+0B14`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B14; Au                  |\n|`U+0B15`   | Letter           | CONSONANT         | _null_                     | &#x0B15; Ka                  |\n|`U+0B16`   | Letter           | CONSONANT         | _null_                     | &#x0B16; Kha                 |\n|`U+0B17`   | Letter           | CONSONANT         | _null_                     | &#x0B17; Ga                  |\n|`U+0B18`   | Letter           | CONSONANT         | _null_                     | &#x0B18; Gha                 |\n|`U+0B19`   | Letter           | CONSONANT         | _null_                     | &#x0B19; Nga                 |\n|`U+0B1A`   | Letter           | CONSONANT         | _null_                     | &#x0B1A; Ca                  |\n|`U+0B1B`   | Letter           | CONSONANT         | _null_                     | &#x0B1B; Cha                 |\n|`U+0B1C`   | Letter           | CONSONANT         | _null_                     | &#x0B1C; Ja                  |\n|`U+0B1D`   | Letter           | CONSONANT         | _null_                     | &#x0B1D; Jha                 |\n|`U+0B1E`   | Letter           | CONSONANT         | _null_                     | &#x0B1E; Nya                 |\n|`U+0B1F`   | Letter           | CONSONANT         | _null_      
               | &#x0B1F; Tta                 |\n| | | | |\n|`U+0B20`   | Letter           | CONSONANT         | _null_                     | &#x0B20; Ttha                |\n|`U+0B21`   | Letter           | CONSONANT         | _null_                     | &#x0B21; Dda                 |\n|`U+0B22`   | Letter           | CONSONANT         | _null_                     | &#x0B22; Ddha                |\n|`U+0B23`   | Letter           | CONSONANT         | _null_                     | &#x0B23; Nna                 |\n|`U+0B24`   | Letter           | CONSONANT         | _null_                     | &#x0B24; Ta                  |\n|`U+0B25`   | Letter           | CONSONANT         | _null_                     | &#x0B25; Tha                 |\n|`U+0B26`   | Letter           | CONSONANT         | _null_                     | &#x0B26; Da                  |\n|`U+0B27`   | Letter           | CONSONANT         | _null_                     | &#x0B27; Dha                 |\n|`U+0B28`   | Letter           | CONSONANT         | _null_                     | &#x0B28; Na                  |\n|`U+0B29`   | _unassigned_     |                   |                            |                              |\n|`U+0B2A`   | Letter           | CONSONANT         | _null_                     | &#x0B2A; Pa                  |\n|`U+0B2B`   | Letter           | CONSONANT         | _null_                     | &#x0B2B; Pha                 |\n|`U+0B2C`   | Letter           | CONSONANT         | _null_                     | &#x0B2C; Ba                  |\n|`U+0B2D`   | Letter           | CONSONANT         | _null_                     | &#x0B2D; Bha                 |\n|`U+0B2E`   | Letter           | CONSONANT         | _null_                     | &#x0B2E; Ma                  |\n|`U+0B2F`   | Letter           | CONSONANT         | _null_                     | &#x0B2F; Ya                  |\n| | | | |\n|`U+0B30`   | Letter           | CONSONANT         | _null_                     | &#x0B30; Ra            
      |\n|`U+0B31`   | _unassigned_     |                   |                            |                              |\n|`U+0B32`   | Letter           | CONSONANT         | _null_                     | &#x0B32; La                  |\n|`U+0B33`   | Letter           | CONSONANT         | _null_                     | &#x0B33; Lla                 |\n|`U+0B34`   | _unassigned_     |                   |                            |                              |\n|`U+0B35`   | Letter           | CONSONANT         | _null_                     | &#x0B35; Va                  |\n|`U+0B36`   | Letter           | CONSONANT         | _null_                     | &#x0B36; Sha                 |\n|`U+0B37`   | Letter           | CONSONANT         | _null_                     | &#x0B37; Ssa                 |\n|`U+0B38`   | Letter           | CONSONANT         | _null_                     | &#x0B38; Sa                  |\n|`U+0B39`   | Letter           | CONSONANT         | _null_                     | &#x0B39; Ha                  |\n|`U+0B3A`   | _unassigned_     |                   |                            |                              |\n|`U+0B3B`   | _unassigned_     |                   |                            |                              |\n|`U+0B3C`   | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x0B3C; Nukta               |\n|`U+0B3D`   | Letter           | AVAGRAHA          | _null_                     | &#x0B3D; Avagraha            |\n|`U+0B3E`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0B3E; Sign Aa             |\n|`U+0B3F`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0B3F; Sign I              |\n| | | | |\n|`U+0B40`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0B40; Sign Ii             |\n|`U+0B41`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0B41; Sign U              |\n|`U+0B42`   | Mark [Mn]        | 
VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0B42; Sign Uu             |\n|`U+0B43`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0B43; Sign Vocalic R      |\n|`U+0B44`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0B44; Sign Vocalic Rr     |\n|`U+0B45`   | _unassigned_     |                   |                            |                              |\n|`U+0B46`   | _unassigned_     |                   |                            |                              |\n|`U+0B47`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0B47; Sign E              |\n|`U+0B48`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_LEFT_POSITION      | &#x0B48; Sign Ai             |\n|`U+0B49`   | _unassigned_     |                   |                            |                              |\n|`U+0B4A`   | _unassigned_     |                   |                            |                              |\n|`U+0B4B`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0B4B; Sign O              |\n|`U+0B4C`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_LEFT_AND_RIGHT_POSITION| &#x0B4C; Sign Au             |\n|`U+0B4D`   | Mark [Mn]        | VIRAMA            | BOTTOM_POSITION            | &#x0B4D; Virama              |\n|`U+0B4E`   | _unassigned_     |                   |                            |                              |\n|`U+0B4F`   | _unassigned_     |                   |                            |                              |\n| | | | |\n|`U+0B50`   | _unassigned_     |                   |                            |                              |\n|`U+0B51`   | _unassigned_     |                   |                            |                              |\n|`U+0B52`   | _unassigned_     |                   |                            |                              |\n|`U+0B53`   | _unassigned_     |                   |                            |   
                           |\n|`U+0B54`   | _unassigned_     |                   |                            |                              |\n|`U+0B55`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0B55; Sign Overline       |\n|`U+0B56`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0B56; Ai Length Mark      |\n|`U+0B57`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_RIGHT_POSITION     | &#x0B57; Au Length Mark      |\n|`U+0B58`   | _unassigned_     |                   |                            |                              |\n|`U+0B59`   | _unassigned_     |                   |                            |                              |\n|`U+0B5A`   | _unassigned_     |                   |                            |                              |\n|`U+0B5B`   | _unassigned_     |                   |                            |                              |\n|`U+0B5C`   | Letter           | CONSONANT         | _null_                     | &#x0B5C; Rra                 |\n|`U+0B5D`   | Letter           | CONSONANT         | _null_                     | &#x0B5D; Rha                 |\n|`U+0B5E`   | _unassigned_     |                   |                            |                              |\n|`U+0B5F`   | Letter           | CONSONANT         | _null_                     | &#x0B5F; Yya                 |\n| | | | |\n|`U+0B60`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B60; Vocalic Rr          |\n|`U+0B61`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B61; Vocalic Ll          |\n|`U+0B62`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0B62; Sign Vocalic L      |\n|`U+0B63`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0B63; Sign Vocalic Ll     |\n|`U+0B64`   | _unassigned_     |                   |                            |                              |\n|`U+0B65`   | 
_unassigned_     |                   |                            |                              |\n|`U+0B66`   | Number           | NUMBER            | _null_                     | &#x0B66; Digit Zero          |\n|`U+0B67`   | Number           | NUMBER            | _null_                     | &#x0B67; Digit One           |\n|`U+0B68`   | Number           | NUMBER            | _null_                     | &#x0B68; Digit Two           |\n|`U+0B69`   | Number           | NUMBER            | _null_                     | &#x0B69; Digit Three         |\n|`U+0B6A`   | Number           | NUMBER            | _null_                     | &#x0B6A; Digit Four          |\n|`U+0B6B`   | Number           | NUMBER            | _null_                     | &#x0B6B; Digit Five          |\n|`U+0B6C`   | Number           | NUMBER            | _null_                     | &#x0B6C; Digit Six           |\n|`U+0B6D`   | Number           | NUMBER            | _null_                     | &#x0B6D; Digit Seven         |\n|`U+0B6E`   | Number           | NUMBER            | _null_                     | &#x0B6E; Digit Eight         |\n|`U+0B6F`   | Number           | NUMBER            | _null_                     | &#x0B6F; Digit Nine          |\n| | | | |\n|`U+0B70`   | Symbol           | SYMBOL            | _null_                     | &#x0B70; Isshar              |\n|`U+0B71`   | Letter           | CONSONANT         | _null_                     | &#x0B71; Wa                  |\n|`U+0B72`   | Number           | NUMBER            | _null_                     | &#x0B72; Fraction 1/4        |\n|`U+0B73`   | Number           | NUMBER            | _null_                     | &#x0B73; Fraction 1/2        |\n|`U+0B74`   | Number           | NUMBER            | _null_                     | &#x0B74; Fraction 3/4        |\n|`U+0B75`   | Number           | NUMBER            | _null_                     | &#x0B75; Fraction 1/16       |\n|`U+0B76`   | Number           | NUMBER            | _null_      
               | &#x0B76; Fraction 1/8        |\n|`U+0B77`   | Number           | NUMBER            | _null_                     | &#x0B77; Fraction 3/16       |\n|`U+0B78`   | _unassigned_     |                   |                            |                              |\n|`U+0B79`   | _unassigned_     |                   |                            |                              |\n|`U+0B7A`   | _unassigned_     |                   |                            |                              |\n|`U+0B7B`   | _unassigned_     |                   |                            |                              |\n|`U+0B7C`   | _unassigned_     |                   |                            |                              |\n|`U+0B7D`   | _unassigned_     |                   |                            |                              |\n|`U+0B7E`   | _unassigned_     |                   |                            |                              |\n|`U+0B7F`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Oriya script may also include\ncharacters from the Vedic Extensions block. 
These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta 
|\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail 
|\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF3`   | Mark [Mc]        | VISARGA           | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |  
                            |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Oriya text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block. Oriya text can also incorporate the udatta\n(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Oriya text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-sinhala.md",
    "content": "# Sinhala character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Sinhala text](../opentype-shaping-sinhala.md).\n\n**Contents**\n\n  - [Sinhala character table](#sinhala-character-table)\n  - [Sinhala Archaic Numbers character table](#sinhala-archaic-numbers-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Sinhala character table ##\n\nSinhala glyphs should be classified as in the following\ntable. Codepoints in the Sinhala block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Sinhala character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0D80`   | _unassigned_     |                   |                            |                              |\n|`U+0D81`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0D81; Candrabindu         |\n|`U+0D82`   | Mark [Mc]        | BINDU             | RIGHT_POSITION             | &#x0D82; Anusvara            |\n|`U+0D83`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0D83; Visarga             |\n|`U+0D84`   | _unassigned_     |                   |                            |                              |\n|`U+0D85`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D85; A                   |\n|`U+0D86`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D86; Aa                  |\n|`U+0D87`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D87; Ae                  |\n|`U+0D88`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D88; Aae                 |\n|`U+0D89`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D89; I                   |\n|`U+0D8A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D8A; Ii                  |\n|`U+0D8B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D8B; U                   |\n|`U+0D8C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D8C; Uu                  |\n|`U+0D8D`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D8D; Vocalic R           |\n|`U+0D8E`   | 
Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D8E; Vocalic Rr          |\n|`U+0D8F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D8F; Vocalic L           |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0D90`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D90; Vocalic Ll          |\n|`U+0D91`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D91; E                   |\n|`U+0D92`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D92; Ee                  |\n|`U+0D93`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D93; Ai                  |\n|`U+0D94`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D94; O                   |\n|`U+0D95`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D95; Oo                  |\n|`U+0D96`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0D96; Au                  |\n|`U+0D97`   | _unassigned_     |                   |                            |                              |\n|`U+0D98`   | _unassigned_     |                   |                            |                              |\n|`U+0D99`   | _unassigned_     |                   |                            |                              |\n|`U+0D9A`   | Letter           | CONSONANT         | _null_                     | &#x0D9A; Ka                  |\n|`U+0D9B`   | Letter           | CONSONANT         | _null_                     | &#x0D9B; Kha                 |\n|`U+0D9C`   | Letter           | CONSONANT         | _null_                     | &#x0D9C; Ga                  |\n|`U+0D9D`   | Letter           | CONSONANT         | _null_                     | &#x0D9D; Gha                 |\n|`U+0D9E`   | Letter           | CONSONANT         | _null_                     | &#x0D9E; Nga                 |\n|`U+0D9F`   | Letter         
  | CONSONANT         | _null_                     | &#x0D9F; Nnga                |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0DA0`   | Letter           | CONSONANT         | _null_                     | &#x0DA0; Ca                  |\n|`U+0DA1`   | Letter           | CONSONANT         | _null_                     | &#x0DA1; Cha                 |\n|`U+0DA2`   | Letter           | CONSONANT         | _null_                     | &#x0DA2; Ja                  |\n|`U+0DA3`   | Letter           | CONSONANT         | _null_                     | &#x0DA3; Jha                 |\n|`U+0DA4`   | Letter           | CONSONANT         | _null_                     | &#x0DA4; Nya                 |\n|`U+0DA5`   | Letter           | CONSONANT         | _null_                     | &#x0DA5; Jnya                |\n|`U+0DA6`   | Letter           | CONSONANT         | _null_                     | &#x0DA6; Nyja                |\n|`U+0DA7`   | Letter           | CONSONANT         | _null_                     | &#x0DA7; Tta                 |\n|`U+0DA8`   | Letter           | CONSONANT         | _null_                     | &#x0DA8; Ttha                |\n|`U+0DA9`   | Letter           | CONSONANT         | _null_                     | &#x0DA9; Dda                 |\n|`U+0DAA`   | Letter           | CONSONANT         | _null_                     | &#x0DAA; Ddha                |\n|`U+0DAB`   | Letter           | CONSONANT         | _null_                     | &#x0DAB; Nna                 |\n|`U+0DAC`   | Letter           | CONSONANT         | _null_                     | &#x0DAC; Nndda               |\n|`U+0DAD`   | Letter           | CONSONANT         | _null_                     | &#x0DAD; Ta                  |\n|`U+0DAE`   | Letter           | CONSONANT         | _null_                     | &#x0DAE; Tha                 |\n|`U+0DAF`   | Letter           | CONSONANT         | _null_                     | &#x0DAF; Da                  |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0DB0`   | Letter           | CONSONANT         | _null_                     | &#x0DB0; Dha                 |\n|`U+0DB1`   | Letter           | CONSONANT         | _null_                     | &#x0DB1; Na                  |\n|`U+0DB2`   | _unassigned_     |                   |                            |                              |\n|`U+0DB3`   | Letter           | CONSONANT         | _null_                     | &#x0DB3; Nda                 |\n|`U+0DB4`   | Letter           | CONSONANT         | _null_                     | &#x0DB4; Pa                  |\n|`U+0DB5`   | Letter           | CONSONANT         | _null_                     | &#x0DB5; Pha                 |\n|`U+0DB6`   | Letter           | CONSONANT         | _null_                     | &#x0DB6; Ba                  |\n|`U+0DB7`   | Letter           | CONSONANT         | _null_                     | &#x0DB7; Bha                 |\n|`U+0DB8`   | Letter           | CONSONANT         | _null_                     | &#x0DB8; Ma                  |\n|`U+0DB9`   | Letter           | CONSONANT         | _null_                     | &#x0DB9; Mba                 |\n|`U+0DBA`   | Letter           | CONSONANT         | _null_                     | &#x0DBA; Ya                  |\n|`U+0DBB`   | Letter           | CONSONANT         | _null_                     | &#x0DBB; Ra                  |\n|`U+0DBC`   | _unassigned_     |                   |                            |                              |\n|`U+0DBD`   | Letter           | CONSONANT         | _null_                     | &#x0DBD; La                  |\n|`U+0DBE`   | _unassigned_     |                   |                            |                              |\n|`U+0DBF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0DC0`   | Letter           | CONSONANT         | _null_                     | 
&#x0DC0; Va                  |\n|`U+0DC1`   | Letter           | CONSONANT         | _null_                     | &#x0DC1; Sha                 |\n|`U+0DC2`   | Letter           | CONSONANT         | _null_                     | &#x0DC2; Ssa                 |\n|`U+0DC3`   | Letter           | CONSONANT         | _null_                     | &#x0DC3; Sa                  |\n|`U+0DC4`   | Letter           | CONSONANT         | _null_                     | &#x0DC4; Ha                  |\n|`U+0DC5`   | Letter           | CONSONANT         | _null_                     | &#x0DC5; Lla                 |\n|`U+0DC6`   | Letter           | CONSONANT         | _null_                     | &#x0DC6; Fa                  |\n|`U+0DC7`   | _unassigned_     |                   |                            |                              |\n|`U+0DC8`   | _unassigned_     |                   |                            |                              |\n|`U+0DC9`   | _unassigned_     |                   |                            |                              |\n|`U+0DCA`   | Mark [Mn]        | VIRAMA            | TOP_POSITION               | &#x0DCA; Virama              |\n|`U+0DCB`   | _unassigned_     |                   |                            |                              |\n|`U+0DCC`   | _unassigned_     |                   |                            |                              |\n|`U+0DCD`   | _unassigned_     |                   |                            |                              |\n|`U+0DCE`   | _unassigned_     |                   |                            |                              |\n|`U+0DCF`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0DCF; Sign Aa             |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\n|`U+0DD0`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0DD0; Sign Ae             |\n|`U+0DD1`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0DD1; Sign 
Aae            |\n|`U+0DD2`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0DD2; Sign I              |\n|`U+0DD3`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0DD3; Sign Ii             |\n|`U+0DD4`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0DD4; Sign U              |\n|`U+0DD5`   | _unassigned_     |                   |                            |                              |\n|`U+0DD6`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0DD6; Sign Uu             |\n|`U+0DD7`   | _unassigned_     |                   |                            |                              |\n|`U+0DD8`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0DD8; Sign Vocalic R      |\n|`U+0DD9`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0DD9; Sign E              |\n|`U+0DDA`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_AND_LEFT_POSITION      | &#x0DDA; Sign Ee             |\n|`U+0DDB`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0DDB; Sign Ai             |\n|`U+0DDC`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0DDC; Sign O              |\n|`U+0DDD`   | Mark [Mc]        | VOWEL_DEPENDENT   | TOP_LEFT_AND_RIGHT_POSITION| &#x0DDD; Sign Oo             |\n|`U+0DDE`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0DDE; Sign Au             |\n|`U+0DDF`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0DDF; Sign Vocalic L      |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0DE0`   | _unassigned_     |                   |                            |                              |\n|`U+0DE1`   | _unassigned_     |                   |                            |                              |\n|`U+0DE2`   | _unassigned_     |                   |                            |                              
|\n|`U+0DE3`   | _unassigned_     |                   |                            |                              |\n|`U+0DE4`   | _unassigned_     |                   |                            |                              |\n|`U+0DE5`   | _unassigned_     |                   |                            |                              |\n|`U+0DE6`   | Number           | NUMBER            | _null_                     | &#x0DE6; Digit Zero          |\n|`U+0DE7`   | Number           | NUMBER            | _null_                     | &#x0DE7; Digit One           |\n|`U+0DE8`   | Number           | NUMBER            | _null_                     | &#x0DE8; Digit Two           |\n|`U+0DE9`   | Number           | NUMBER            | _null_                     | &#x0DE9; Digit Three         |\n|`U+0DEA`   | Number           | NUMBER            | _null_                     | &#x0DEA; Digit Four          |\n|`U+0DEB`   | Number           | NUMBER            | _null_                     | &#x0DEB; Digit Five          |\n|`U+0DEC`   | Number           | NUMBER            | _null_                     | &#x0DEC; Digit Six           |\n|`U+0DED`   | Number           | NUMBER            | _null_                     | &#x0DED; Digit Seven         |\n|`U+0DEE`   | Number           | NUMBER            | _null_                     | &#x0DEE; Digit Eight         |\n|`U+0DEF`   | Number           | NUMBER            | _null_                     | &#x0DEF; Digit Nine          |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0DF0`   | _unassigned_     |                   |                            |                              |\n|`U+0DF1`   | _unassigned_     |                   |                            |                              |\n|`U+0DF2`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0DF2; Sign Vocalic Rr     |\n|`U+0DF3`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0DF3; Sign Vocalic Ll     |\n|`U+0DF4`   
| Punctuation      | _null_            | _null_                     | &#x0DF4; Kunddaliya          |\n|`U+0DF5`   | _unassigned_     |                   |                            |                              |\n|`U+0DF6`   | _unassigned_     |                   |                            |                              |\n|`U+0DF7`   | _unassigned_     |                   |                            |                              |\n|`U+0DF8`   | _unassigned_     |                   |                            |                              |\n|`U+0DF9`   | _unassigned_     |                   |                            |                              |\n|`U+0DFA`   | _unassigned_     |                   |                            |                              |\n|`U+0DFB`   | _unassigned_     |                   |                            |                              |\n|`U+0DFC`   | _unassigned_     |                   |                            |                              |\n|`U+0DFD`   | _unassigned_     |                   |                            |                              |\n|`U+0DFE`   | _unassigned_     |                   |                            |                              |\n|`U+0DFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Sinhala Archaic Numbers character table ##\n\nSinhala text runs may also include glyphs from the Sinhala Archaic\nNumbers block. 
These characters should be classified as follows.\n\n\n:::{table} Sinhala Archaic Numbers character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+111E0`  | _unassigned_     |                   |                            |                              |\n|`U+111E1`  | Number           | NUMBER            | _null_                     | &#x111E1; Archaic Digit One  |\n|`U+111E2`  | Number           | NUMBER            | _null_                     | &#x111E2; Archaic Digit Two  |\n|`U+111E3`  | Number           | NUMBER            | _null_                     | &#x111E3; Archaic Digit Three|\n|`U+111E4`  | Number           | NUMBER            | _null_                     | &#x111E4; Archaic Digit Four |\n|`U+111E5`  | Number           | NUMBER            | _null_                     | &#x111E5; Archaic Digit Five |\n|`U+111E6`  | Number           | NUMBER            | _null_                     | &#x111E6; Archaic Digit Six  |\n|`U+111E7`  | Number           | NUMBER            | _null_                     | &#x111E7; Archaic Digit Seven|\n|`U+111E8`  | Number           | NUMBER            | _null_                     | &#x111E8; Archaic Digit Eight|\n|`U+111E9`  | Number           | NUMBER            | _null_                     | &#x111E9; Archaic Digit Nine |\n|`U+111EA`  | Number           | NUMBER            | _null_                     | &#x111EA; Archaic Number Ten |\n|`U+111EB`  | Number           | NUMBER            | _null_                     | &#x111EB; Archaic Number 20  |\n|`U+111EC`  | Number           | NUMBER            | _null_                     | &#x111EC; Archaic Number 30  |\n|`U+111ED`  | Number           | NUMBER            | _null_                     | &#x111ED; Archaic Number 40  |\n|`U+111EE`  | Number           | NUMBER            | _null_        
             | &#x111EE; Archaic Number 50  |\n|`U+111EF`  | Number           | NUMBER            | _null_                     | &#x111EF; Archaic Number 60  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+111F0`  | Number           | NUMBER            | _null_                     | &#x111F0; Archaic Number 70  |\n|`U+111F1`  | Number           | NUMBER            | _null_                     | &#x111F1; Archaic Number 80  |\n|`U+111F2`  | Number           | NUMBER            | _null_                     | &#x111F2; Archaic Number 90  |\n|`U+111F3`  | Number           | NUMBER            | _null_                     | &#x111F3; Archaic Number 100 |\n|`U+111F4`  | Number           | NUMBER            | _null_                     | &#x111F4; Archaic Number 1000|\n|`U+111F5`  | _unassigned_     |                   |                            |                              |\n|`U+111F6`  | _unassigned_     |                   |                            |                              |\n|`U+111F7`  | _unassigned_     |                   |                            |                              |\n|`U+111F8`  | _unassigned_     |                   |                            |                              |\n|`U+111F9`  | _unassigned_     |                   |                            |                              |\n|`U+111FA`  | _unassigned_     |                   |                            |                              |\n|`U+111FB`  | _unassigned_     |                   |                            |                              |\n|`U+111FC`  | _unassigned_     |                   |                            |                              |\n|`U+111FD`  | _unassigned_     |                   |                            |                              |\n|`U+111FE`  | _unassigned_     |                   |                            |                              |\n|`U+111FF`  | _unassigned_     |                   |                            | 
                             |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Sinhala script may also include\ncharacters from the Vedic Extensions block. These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION   
   | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta |\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_               
      | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail |\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF3`   | Mark [Mc]        | VISARGA           | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | 
_unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |                              |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Sinhala text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; 
Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is used to request the subjoined form\nof a consonant. The sequence \"Consonant_1,Halant,ZWJ,Consonant_2\" is\nused to specify the subjoined form of \"Consonant_2\".\n\nA secondary usage of the zero-width joiner is to explicitly request\nthe formation of \"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should\nproduce a \"Reph\".\n\nThe zero-width non-joiner (<abbr>ZWNJ</abbr>) is not used in shaping runs of\nSinhala text. The <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> is referenced below in various regular\nexpressions and shaping rules, however, because it is used by other\nIndic scripts.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-syriac.md",
    "content": "# Syriac character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Syriac text](../opentype-shaping-syriac.md).\n\n**Contents**\n\n  - [Syriac character table](#syriac-character-table)\n  - [Syriac Supplement character table](#syriac-supplement-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\n\n## Syriac character table ##\n\nSyriac glyphs should be classified as in the following\ntable. Codepoints in the Syriac block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column.\n\nThe _Joining type_ column indicates whether each codepoint is defined\nas joining with adjacent characters on the left side, right side, left\nand right sides (\"DUAL\"), or neither side (\"NON_JOINING\"). Codepoints\ndesignated TRANSPARENT in the _Joining type_ column do not join with\nadjacent characters and, in addition, do not affect the joining\nbehavior of surrounding characters. Non-spacing marks are of type\nTRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent\ncharacters to join.\n\nThe _Joining group_ column lists the fundamental letter that the\nlisted codepoint behaves like for joining purposes.\n\nAssigned codepoints with a _null_ in the _Joining group_\ncolumn evoke no special behavior from the shaping engine during the\njoin-computation stage.\n\nThe _Mark class_ column indicates the Canonical Combining Class\nfor the codepoint.  Marks are assigned non-zero combining classes so\nthat sequences of adjacent marks can be reordered as required by the\northography. \n\nFor Syriac, a subset of marks in the 220 and 230 classes are also\ndesignated _Modifier Combining Marks_ (<abbr>MCM</abbr>). These are denoted with\n_220_MCM_ and _230_MCM_ in the _Mark class_ column. 
The <abbr title=\"Modifier Combining Mark\">MCM</abbr> marks are\ntreated differently during the mark-reordering stage.\n\n\n:::{table} Syriac character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                         |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|\n|`U+0700`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0700; End of Paragraph                     |\n|`U+0701`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0701; Supralinear Full Stop                |\n|`U+0702`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0702; Sublinear Full Stop                  |\n|`U+0703`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0703; Supralinear Colon                    |\n|`U+0704`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0704; Sublinear Colon                      |\n|`U+0705`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0705; Horizontal Colon                     |\n|`U+0706`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0706; Colon Skewed Left                    |\n|`U+0707`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0707; Colon Skewed Right                   |\n|`U+0708`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0708; Supralinear Colon Skewed Left        |\n|`U+0709`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x0709; Sublinear Colon Skewed Right         |\n|`U+070A`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x070A; Contraction                          |\n|`U+070B`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x070B; Harklean Obelus 
                     |\n|`U+070C`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x070C; Harklean Metobelus                   |\n|`U+070D`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x070D; Harklean Asteriscus                  |\n|`U+070E`   | _unassigned_     |              |                      |            |                                               |\n|`U+070F`   | Other            | TRANSPARENT  | _null_               | _0_        | &#x070F; Syriac Abbreviation Mark             |\n| | | | | |                                                                                                                      \n|`U+0710`   | Letter           | RIGHT        | ALAPH                | _0_        | &#x0710; Alaph                                |\n|`U+0711`   | Mark [Mn]        | TRANSPARENT  | _null_               | 36         | &#x0711; Superscript Alaph                    |\n|`U+0712`   | Letter           | DUAL         | BETH                 | _0_        | &#x0712; Beth                                 |\n|`U+0713`   | Letter           | DUAL         | GAMAL                | _0_        | &#x0713; Gamal                                |\n|`U+0714`   | Letter           | DUAL         | GAMAL                | _0_        | &#x0714; Gamal Garshuni                       |\n|`U+0715`   | Letter           | RIGHT        | DALATH_RISH          | _0_        | &#x0715; Dalath                               |\n|`U+0716`   | Letter           | RIGHT        | DALATH_RISH          | _0_        | &#x0716; Dotless Dalath Rish                  |\n|`U+0717`   | Letter           | RIGHT        | HE                   | _0_        | &#x0717; He                                   |\n|`U+0718`   | Letter           | RIGHT        | SYRIAC_WAW           | _0_        | &#x0718; Waw                                  |\n|`U+0719`   | Letter           | RIGHT        | ZAIN                 | _0_        | &#x0719; Zain                   
              |\n|`U+071A`   | Letter           | DUAL         | HETH                 | _0_        | &#x071A; Heth                                 |\n|`U+071B`   | Letter           | DUAL         | TETH                 | _0_        | &#x071B; Teth                                 |\n|`U+071C`   | Letter           | DUAL         | TETH                 | _0_        | &#x071C; Teth Garshuni                        |\n|`U+071D`   | Letter           | DUAL         | YUDH                 | _0_        | &#x071D; Yudh                                 |\n|`U+071E`   | Letter           | RIGHT        | YUDH_HE              | _0_        | &#x071E; Yudh He                              |\n|`U+071F`   | Letter           | DUAL         | KAPH                 | _0_        | &#x071F; Kaph                                 |\n| | | | | |                                                                                                                      \n|`U+0720`   | Letter           | DUAL         | LAMADH               | _0_        | &#x0720; Lamadh                               |\n|`U+0721`   | Letter           | DUAL         | MIM                  | _0_        | &#x0721; Mim                                  |\n|`U+0722`   | Letter           | DUAL         | NUN                  | _0_        | &#x0722; Nun                                  |\n|`U+0723`   | Letter           | DUAL         | SEMKATH              | _0_        | &#x0723; Semkath                              |\n|`U+0724`   | Letter           | DUAL         | FINAL_SEMKATH        | _0_        | &#x0724; Final Semkath                        |\n|`U+0725`   | Letter           | DUAL         | E                    | _0_        | &#x0725; E                                    |\n|`U+0726`   | Letter           | DUAL         | PE                   | _0_        | &#x0726; Pe                                   |\n|`U+0727`   | Letter           | DUAL         | REVERSED_PE          | _0_        | &#x0727; Reversed Pe                   
       |\n|`U+0728`   | Letter           | RIGHT        | SADHE                | _0_        | &#x0728; Sadhe                                |\n|`U+0729`   | Letter           | DUAL         | QAPH                 | _0_        | &#x0729; Qaph                                 |\n|`U+072A`   | Letter           | RIGHT        | DALATH_RISH          | _0_        | &#x072A; Rish                                 |\n|`U+072B`   | Letter           | DUAL         | SHIN                 | _0_        | &#x072B; Shin                                 |\n|`U+072C`   | Letter           | RIGHT        | TAW                  | _0_        | &#x072C; Taw                                  |\n|`U+072D`   | Letter           | DUAL         | BETH                 | _0_        | &#x072D; Persian Bheth                        |\n|`U+072E`   | Letter           | DUAL         | GAMAL                | _0_        | &#x072E; Persian Ghamal                       |\n|`U+072F`   | Letter           | RIGHT        | DALATH_RISH          | _0_        | &#x072F; Persian Dhalath                      |\n| | | | | |                                                                                                                      \n|`U+0730`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0730; Pthaha Above                         |\n|`U+0731`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0731; Pthaha Below                         |\n|`U+0732`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0732; Pthaha Dotted                        |\n|`U+0733`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0733; Zqapha Above                         |\n|`U+0734`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0734; Zqapha Below                         |\n|`U+0735`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0735; Zqapha Dotted                        
|\n|`U+0736`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0736; Rbasa Above                          |\n|`U+0737`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0737; Rbasa Below                          |\n|`U+0738`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0738; Dotted Zlama Horizontal              |\n|`U+0739`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0739; Dotted Zlama Angular                 |\n|`U+073A`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x073A; Hbasa Above                          |\n|`U+073B`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x073B; Hbasa Below                          |\n|`U+073C`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x073C; Hbasa-Esasa Dotted                   |\n|`U+073D`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x073D; Esasa Above                          |\n|`U+073E`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x073E; Esasa Below                          |\n|`U+073F`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x073F; Rwaha                                |\n| | | | | |                                                                                                                      \n|`U+0740`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0740; Feminine Dot                         |\n|`U+0741`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0741; Qushshaya                            |\n|`U+0742`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0742; Rukkakha                             |\n|`U+0743`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0743; Two Vertical Dots Above              
|\n|`U+0744`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0744; Two Vertical Dots Below              |\n|`U+0745`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0745; Three Dots Above                     |\n|`U+0746`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0746; Three Dots Below                     |\n|`U+0747`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0747; Oblique Line Above                   |\n|`U+0748`   | Mark [Mn]        | TRANSPARENT  | _null_               | 220        | &#x0748; Oblique Line Below                   |\n|`U+0749`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x0749; Music                                |\n|`U+074A`   | Mark [Mn]        | TRANSPARENT  | _null_               | 230        | &#x074A; Barrekh                              |\n|`U+074B`   | _unassigned_     |              |                      |            |                                               |\n|`U+074C`   | _unassigned_     |              |                      |            |                                               |\n|`U+074D`   | Letter           | RIGHT        | ZHAIN                | _0_        | &#x074D; Sogdian Zhain                        |\n|`U+074E`   | Letter           | DUAL         | KHAPH                | _0_        | &#x074E; Sogdian Khaph                        |\n|`U+074F`   | Letter           | DUAL         | FE                   | _0_        | &#x074F; Sogdian Fe                           |\n:::\n\n\n\n## Syriac Supplement character table ##\n\nThe Syriac Supplement block includes letters needed to write Suriyani\nMalayalam, also known as Garshuni or Syriac Malayalam.\n\n:::{table} Syriac Supplement character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                                         
|\n|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|\n|`U+0860`   | Letter           | DUAL         | MALAYALAM_NGA        | _0_        | &#x0860; Malayalam Nga                        |\n|`U+0861`   | Letter           | NON_JOINING  | MALAYALAM_JA         | _0_        | &#x0861; Malayalam Ja                         |\n|`U+0862`   | Letter           | DUAL         | MALAYALAM_NYA        | _0_        | &#x0862; Malayalam Nya                        |\n|`U+0863`   | Letter           | DUAL         | MALAYALAM_TTA        | _0_        | &#x0863; Malayalam Tta                        |\n|`U+0864`   | Letter           | DUAL         | MALAYALAM_NNA        | _0_        | &#x0864; Malayalam Nna                        |\n|`U+0865`   | Letter           | DUAL         | MALAYALAM_NNNA       | _0_        | &#x0865; Malayalam Nnna                       |\n|`U+0866`   | Letter           | NON_JOINING  | MALAYALAM_BHA        | _0_        | &#x0866; Malayalam Bha                        |\n|`U+0867`   | Letter           | RIGHT        | MALAYALAM_RA         | _0_        | &#x0867; Malayalam Ra                         |\n|`U+0868`   | Letter           | DUAL         | MALAYALAM_LLA        | _0_        | &#x0868; Malayalam Lla                        |\n|`U+0869`   | Letter           | RIGHT        | MALAYALAM_LLLA       | _0_        | &#x0869; Malayalam Llla                       |\n|`U+086A`   | Letter           | RIGHT        | MALAYALAM_SSA        | _0_        | &#x086A; Malayalam Ssa                        |\n|`U+086B`   | _unassigned_     |              |                      |            |                                               |\n|`U+086C`   | _unassigned_     |              |                      |            |                                               |\n|`U+086D`   | _unassigned_     |              |                      |            |                                               
|\n|`U+086E`   | _unassigned_     |              |                      |            |                                               |\n|`U+086F`   | _unassigned_     |              |                      |            |                                               |\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Syriac text include the dotted-circle placeholder (`U+25CC`), the\ncombining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)\nand zero-width non-joiner (`U+200C`), the left-to-right text marker\n(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break\nspace (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ncombining mark in isolation. Real-world text syllables may also use\nother characters, such as hyphens or dashes, in a similar placeholder\nfashion; shaping engines should cope with this situation gracefully.\n\nIn addition, Syriac text runs may include the \"Tatweel\" or kashida\ncodepoint (`U+0640`) from the Arabic block, because the Syriac block\ndoes not encode a separate kashida character.\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Joining type | Joining group        | Mark class | Glyph                          |\n|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|\n|`U+00A0`   | Separator        | NON_JOINING  | _null_               | _0_        | &#x00A0; No-break space        |\n|`U+034F`   | Other            | NON_JOINING  | _null_               | _0_        | &#x034F; Combining grapheme joiner |\n|`U+0640`   | Letter modifier  | JOIN_CAUSING | _null_               | _0_        | &#x0640; Arabic Tatweel        |\n|`U+200C`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOIN_CAUSING | _null_               | _0_        | 
&#x200D; Zero-width joiner     |\n|`U+200E`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200E; Left-to-Right marker  |\n|`U+200F`   | Other            | NON_JOINING  | _null_               | _0_        | &#x200F; Right-to-Left marker  |\n|`U+2010`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | NON_JOINING  | _null_               | _0_        | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | NON_JOINING  | _null_               | _0_        | &#x25CC; Dotted circle         |\n| | | | | | |\n:::\n\n\nThe combining grapheme joiner (<abbr>CGJ</abbr>) is primarily used to alter the\norder in which adjacent marks are positioned during the\nmark-reordering stage, in order to adhere to the needs of a\nnon-default language orthography.\n<!--- combining grapheme joiner explanation --->\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to force the usage of the\ncursive connecting form of a letter even when the context of the\nadjoining letters would not trigger the connecting form. \n\nFor example, to show the initial form of a letter in isolation (such\nas for displaying it in a table of forms), the sequence \"_Letter_,ZWJ\"\nwould be used. 
To show the medial form of a letter in isolation, the\nsequence \"ZWJ,_Letter_,ZWJ\" would be used.\n\n\n<!--- Zero-Width Non Joiner explanation --->\n\nThe right-to-left mark (<abbr>RLM</abbr>) and left-to-right mark (<abbr>LRM</abbr>) are used by\nthe Unicode bidirectionality algorithm (BiDi) to indicate the points\nin a text run at which the writing direction changes.\n\n\n<!--- How shaping is affected by the <abbr title=\"Left-To-Right\">LTR</abbr> and <abbr title=\"Right-To-Left\">RTL</abbr> markers explanation --->\n\n\nThe no-break space is primarily used to display those codepoints that\nare defined as non-spacing (such as vowel or diacritical marks and \"Hamza\") in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder.\n"
  },
  {
    "path": "character-tables/character-tables-tamil.md",
    "content": "# Tamil character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Tamil text](../opentype-shaping-tamil.md).\n\n**Contents**\n\n  - [Tamil character table](#tamil-character-table)\n  - [Tamil Supplement character table](#tamil-supplement-character-table)\n  - [Grantha marks character table](#grantha-marks-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Tamil character table ##\n\nTamil glyphs should be classified as in the following\ntable. Codepoints in the Tamil block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Tamil character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0B80`   | _unassigned_     |                   |                            |                              |\n|`U+0B81`   | _unassigned_     |                   |                            |                              |\n|`U+0B82`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0B82; Anusvara            |\n|`U+0B83`   | Letter           | MODIFYING_LETTER  | _null_                     | &#x0B83; Visarga             |\n|`U+0B84`   | _unassigned_     |                   |                            |                              |\n|`U+0B85`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B85; A                   |\n|`U+0B86`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B86; Aa                  |\n|`U+0B87`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B87; I                   |\n|`U+0B88`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B88; Ii                  |\n|`U+0B89`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B89; U                   |\n|`U+0B8A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B8A; Uu                  |\n|`U+0B8B`   | _unassigned_     |                   |                            |                              |\n|`U+0B8C`   | _unassigned_     |                   |                            |                              |\n|`U+0B8D`   | _unassigned_     |                   |                            |                              |\n|`U+0B8E`   | 
Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B8E; E                   |\n|`U+0B8F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B8F; Ee                  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0B90`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B90; Ai                  |\n|`U+0B91`   | _unassigned_     |                   |                            |                              |\n|`U+0B92`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B92; O                   |\n|`U+0B93`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B93; Oo                  |\n|`U+0B94`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0B94; Au                  |\n|`U+0B95`   | Letter           | CONSONANT         | _null_                     | &#x0B95; Ka                  |\n|`U+0B96`   | _unassigned_     |                   |                            |                              |\n|`U+0B97`   | _unassigned_     |                   |                            |                              |\n|`U+0B98`   | _unassigned_     |                   |                            |                              |\n|`U+0B99`   | Letter           | CONSONANT         | _null_                     | &#x0B99; Nga                 |\n|`U+0B9A`   | Letter           | CONSONANT         | _null_                     | &#x0B9A; Ca                  |\n|`U+0B9B`   | _unassigned_     |                   |                            |                              |\n|`U+0B9C`   | Letter           | CONSONANT         | _null_                     | &#x0B9C; Ja                  |\n|`U+0B9D`   | _unassigned_     |                   |                            |                              |\n|`U+0B9E`   | Letter           | CONSONANT         | _null_                     | &#x0B9E; Nya                 |\n|`U+0B9F`   | Letter         
  | CONSONANT         | _null_                     | &#x0B9F; Tta                 |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0BA0`   | _unassigned_     |                   |                            |                              |\n|`U+0BA1`   | _unassigned_     |                   |                            |                              |\n|`U+0BA2`   | _unassigned_     |                   |                            |                              |\n|`U+0BA3`   | Letter           | CONSONANT         | _null_                     | &#x0BA3; Nna                 |\n|`U+0BA4`   | Letter           | CONSONANT         | _null_                     | &#x0BA4; Ta                  |\n|`U+0BA5`   | _unassigned_     |                   |                            |                              |\n|`U+0BA6`   | _unassigned_     |                   |                            |                              |\n|`U+0BA7`   | _unassigned_     |                   |                            |                              |\n|`U+0BA8`   | Letter           | CONSONANT         | _null_                     | &#x0BA8; Na                  |\n|`U+0BA9`   | Letter           | CONSONANT         | _null_                     | &#x0BA9; Nnna                |\n|`U+0BAA`   | Letter           | CONSONANT         | _null_                     | &#x0BAA; Pa                  |\n|`U+0BAB`   | _unassigned_     |                   |                            |                              |\n|`U+0BAC`   | _unassigned_     |                   |                            |                              |\n|`U+0BAD`   | _unassigned_     |                   |                            |                              |\n|`U+0BAE`   | Letter           | CONSONANT         | _null_                     | &#x0BAE; Ma                  |\n|`U+0BAF`   | Letter           | CONSONANT         | _null_                     | &#x0BAF; Ya                  |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0BB0`   | Letter           | CONSONANT         | _null_                     | &#x0BB0; Ra                  |\n|`U+0BB1`   | Letter           | CONSONANT         | _null_                     | &#x0BB1; Rra                 |\n|`U+0BB2`   | Letter           | CONSONANT         | _null_                     | &#x0BB2; La                  |\n|`U+0BB3`   | Letter           | CONSONANT         | _null_                     | &#x0BB3; Lla                 |\n|`U+0BB4`   | Letter           | CONSONANT         | _null_                     | &#x0BB4; Llla                |\n|`U+0BB5`   | Letter           | CONSONANT         | _null_                     | &#x0BB5; Va                  |\n|`U+0BB6`   | Letter           | CONSONANT         | _null_                     | &#x0BB6; Sha                 |\n|`U+0BB7`   | Letter           | CONSONANT         | _null_                     | &#x0BB7; Ssa                 |\n|`U+0BB8`   | Letter           | CONSONANT         | _null_                     | &#x0BB8; Sa                  |\n|`U+0BB9`   | Letter           | CONSONANT         | _null_                     | &#x0BB9; Ha                  |\n|`U+0BBA`   | _unassigned_     |                   |                            |                              |\n|`U+0BBB`   | _unassigned_     |                   |                            |                              |\n|`U+0BBC`   | _unassigned_     |                   |                            |                              |\n|`U+0BBD`   | _unassigned_     |                   |                            |                              |\n|`U+0BBE`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0BBE; Sign Aa             |\n|`U+0BBF`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0BBF; Sign I              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0BC0`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | 
&#x0BC0; Sign Ii             |\n|`U+0BC1`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0BC1; Sign U              |\n|`U+0BC2`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0BC2; Sign Uu             |\n|`U+0BC3`   | _unassigned_     |                   |                            |                              |\n|`U+0BC4`   | _unassigned_     |                   |                            |                              |\n|`U+0BC5`   | _unassigned_     |                   |                            |                              |\n|`U+0BC6`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0BC6; Sign E              |\n|`U+0BC7`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0BC7; Sign Ee             |\n|`U+0BC8`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0BC8; Sign Ai             |\n|`U+0BC9`   | _unassigned_     |                   |                            |                              |\n|`U+0BCA`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0BCA; Sign O              |\n|`U+0BCB`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0BCB; Sign Oo             |\n|`U+0BCC`   | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_AND_RIGHT_POSITION    | &#x0BCC; Sign Au             |\n|`U+0BCD`   | Mark [Mn]        | VIRAMA            | TOP_POSITION               | &#x0BCD; Virama              |\n|`U+0BCE`   | _unassigned_     |                   |                            |                              |\n|`U+0BCF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0BD0`   | Letter           | _null_            | _null_                     | &#x0BD0; Om                  |\n|`U+0BD1`   | _unassigned_     |                   |                            |                
              |\n|`U+0BD2`   | _unassigned_     |                   |                            |                              |\n|`U+0BD3`   | _unassigned_     |                   |                            |                              |\n|`U+0BD4`   | _unassigned_     |                   |                            |                              |\n|`U+0BD5`   | _unassigned_     |                   |                            |                              |\n|`U+0BD6`   | _unassigned_     |                   |                            |                              |\n|`U+0BD7`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0BD7; Au Length Mark      |\n|`U+0BD8`   | _unassigned_     |                   |                            |                              |\n|`U+0BD9`   | _unassigned_     |                   |                            |                              |\n|`U+0BDA`   | _unassigned_     |                   |                            |                              |\n|`U+0BDB`   | _unassigned_     |                   |                            |                              |\n|`U+0BDC`   | _unassigned_     |                   |                            |                              |\n|`U+0BDD`   | _unassigned_     |                   |                            |                              |\n|`U+0BDE`   | _unassigned_     |                   |                            |                              |\n|`U+0BDF`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0BE0`   | _unassigned_     |                   |                            |                              |\n|`U+0BE1`   | _unassigned_     |                   |                            |                              |\n|`U+0BE2`   | _unassigned_     |                   |                            |                              
|\n|`U+0BE3`   | _unassigned_     |                   |                            |                              |\n|`U+0BE4`   | _unassigned_     |                   |                            |                              |\n|`U+0BE5`   | _unassigned_     |                   |                            |                              |\n|`U+0BE6`   | Number           | NUMBER            | _null_                     | &#x0BE6; Digit Zero          |\n|`U+0BE7`   | Number           | NUMBER            | _null_                     | &#x0BE7; Digit One           |\n|`U+0BE8`   | Number           | NUMBER            | _null_                     | &#x0BE8; Digit Two           |\n|`U+0BE9`   | Number           | NUMBER            | _null_                     | &#x0BE9; Digit Three         |\n|`U+0BEA`   | Number           | NUMBER            | _null_                     | &#x0BEA; Digit Four          |\n|`U+0BEB`   | Number           | NUMBER            | _null_                     | &#x0BEB; Digit Five          |\n|`U+0BEC`   | Number           | NUMBER            | _null_                     | &#x0BEC; Digit Six           |\n|`U+0BED`   | Number           | NUMBER            | _null_                     | &#x0BED; Digit Seven         |\n|`U+0BEE`   | Number           | NUMBER            | _null_                     | &#x0BEE; Digit Eight         |\n|`U+0BEF`   | Number           | NUMBER            | _null_                     | &#x0BEF; Digit Nine          |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0BF0`   | Number           | NUMBER            | _null_                     | &#x0BF0; Number Ten          |\n|`U+0BF1`   | Number           | NUMBER            | _null_                     | &#x0BF1; Number One Hundred  |\n|`U+0BF2`   | Number           | NUMBER            | _null_                     | &#x0BF2; Number One Thousand |\n|`U+0BF3`   | Symbol           | SYMBOL            | _null_                     | &#x0BF3; Day Sign            |\n|`U+0BF4`   
| Symbol           | SYMBOL            | _null_                     | &#x0BF4; Month Sign          |\n|`U+0BF5`   | Symbol           | SYMBOL            | _null_                     | &#x0BF5; Year Sign           |\n|`U+0BF6`   | Symbol           | SYMBOL            | _null_                     | &#x0BF6; Debit Sign          |\n|`U+0BF7`   | Symbol           | SYMBOL            | _null_                     | &#x0BF7; Credit Sign         |\n|`U+0BF8`   | Symbol           | SYMBOL            | _null_                     | &#x0BF8; As Above Sign       |\n|`U+0BF9`   | Symbol           | SYMBOL            | _null_                     | &#x0BF9; Tamil Rupee Sign    |\n|`U+0BFA`   | Symbol           | SYMBOL            | _null_                     | &#x0BFA; Number Sign         |\n|`U+0BFB`   | _unassigned_     |                   |                            |                              |\n|`U+0BFC`   | _unassigned_     |                   |                            |                              |\n|`U+0BFD`   | _unassigned_     |                   |                            |                              |\n|`U+0BFE`   | _unassigned_     |                   |                            |                              |\n|`U+0BFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Tamil Supplement character table ##\n\nTamil text runs may also include historical symbols and fractions from\nthe Tamil Supplement block. 
These characters should be classified as\nfollows.\n\n\n:::{table} Tamil Supplement character table\n\n| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph                         |\n|:----------|:-----------------|:--------------|:------------------------|:------------------------------|\n| `U+11FC0` | Number           | NUMBER        | _null_                  | &#x11FC0; Fraction One Three-Hundred-And-Twentieth |\n| `U+11FC1` | Number           | NUMBER        | _null_                  | &#x11FC1; Fraction One One-Hundred-And-Sixtieth |\n| `U+11FC2` | Number           | NUMBER        | _null_                  | &#x11FC2; Fraction One Eightieth |\n| `U+11FC3` | Number           | NUMBER        | _null_                  | &#x11FC3; Fraction One Sixty-Fourth |\n| `U+11FC4` | Number           | NUMBER        | _null_                  | &#x11FC4; Fraction One Fortieth |\n| `U+11FC5` | Number           | NUMBER        | _null_                  | &#x11FC5; Fraction One Thirty-Second |\n| `U+11FC6` | Number           | NUMBER        | _null_                  | &#x11FC6; Fraction Three Eightieths |\n| `U+11FC7` | Number           | NUMBER        | _null_                  | &#x11FC7; Fraction Three Sixty-Fourths |\n| `U+11FC8` | Number           | NUMBER        | _null_                  | &#x11FC8; Fraction One Twentieth |\n| `U+11FC9` | Number           | NUMBER        | _null_                  | &#x11FC9; Fraction One Sixteenth-1 |\n| `U+11FCA` | Number           | NUMBER        | _null_                  | &#x11FCA; Fraction One Sixteenth-2 |\n| `U+11FCB` | Number           | NUMBER        | _null_                  | &#x11FCB; Fraction One Tenth  |\n| `U+11FCC` | Number           | NUMBER        | _null_                  | &#x11FCC; Fraction One Eighth |\n| `U+11FCD` | Number           | NUMBER        | _null_                  | &#x11FCD; Fraction Three Twentieths |\n| `U+11FCE` | Number           | NUMBER        | _null_                  | 
&#x11FCE; Fraction Three Sixteenths |\n| `U+11FCF` | Number           | NUMBER        | _null_                  | &#x11FCF; Fraction One Fifth  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t           \n| `U+11FD0` | Number           | NUMBER        | _null_                  | &#x11FD0; Fraction One Quarter |\n| `U+11FD1` | Number           | NUMBER        | _null_                  | &#x11FD1; Fraction One Half-1 |\n| `U+11FD2` | Number           | NUMBER        | _null_                  | &#x11FD2; Fraction One Half-2 |\n| `U+11FD3` | Number           | NUMBER        | _null_                  | &#x11FD3; Fraction Three Quarters |\n| `U+11FD4` | Number           | NUMBER        | _null_                  | &#x11FD4; Fraction Downscaling Factor Kiizh |\n| `U+11FD5` | Symbol           | SYMBOL        | _null_                  | &#x11FD5; Sign Nel            |\n| `U+11FD6` | Symbol           | SYMBOL        | _null_                  | &#x11FD6; Sign Cevitu         |\n| `U+11FD7` | Symbol           | SYMBOL        | _null_                  | &#x11FD7; Sign Aazhaakku      |\n| `U+11FD8` | Symbol           | SYMBOL        | _null_                  | &#x11FD8; Sign Uzhakku        |\n| `U+11FD9` | Symbol           | SYMBOL        | _null_                  | &#x11FD9; Sign Muuvuzhakku    |\n| `U+11FDA` | Symbol           | SYMBOL        | _null_                  | &#x11FDA; Sign Kuruni         |\n| `U+11FDB` | Symbol           | SYMBOL        | _null_                  | &#x11FDB; Sign Pathakku       |\n| `U+11FDC` | Symbol           | SYMBOL        | _null_                  | &#x11FDC; Sign Mukkuruni      |\n| `U+11FDD` | Symbol           | SYMBOL        | _null_                  | &#x11FDD; Sign Kaacu          |\n| `U+11FDE` | Symbol           | SYMBOL        | _null_                  | &#x11FDE; Sign Panam          |\n| `U+11FDF` | Symbol           | SYMBOL        | _null_                  | &#x11FDF; Sign Pon            |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t      \n| `U+11FE0` | Symbol           | SYMBOL        | _null_                  | &#x11FE0; Sign Varaakan       |\n| `U+11FE1` | Symbol           | SYMBOL        | _null_                  | &#x11FE1; Sign Paaram         |\n| `U+11FE2` | Symbol           | SYMBOL        | _null_                  | &#x11FE2; Sign Kuzhi          |\n| `U+11FE3` | Symbol           | SYMBOL        | _null_                  | &#x11FE3; Sign Veli           |\n| `U+11FE4` | Symbol           | SYMBOL        | _null_                  | &#x11FE4; Wet Cultivation Sign |\n| `U+11FE5` | Symbol           | SYMBOL        | _null_                  | &#x11FE5; Dry Cultivation Sign |\n| `U+11FE6` | Symbol           | SYMBOL        | _null_                  | &#x11FE6; Land Sign           |\n| `U+11FE7` | Symbol           | SYMBOL        | _null_                  | &#x11FE7; Salt Pan Sign       |\n| `U+11FE8` | Symbol           | SYMBOL        | _null_                  | &#x11FE8; Traditional Credit Sign |\n| `U+11FE9` | Symbol           | SYMBOL        | _null_                  | &#x11FE9; Traditional Number Sign |\n| `U+11FEA` | Symbol           | SYMBOL        | _null_                  | &#x11FEA; Current Sign        |\n| `U+11FEB` | Symbol           | SYMBOL        | _null_                  | &#x11FEB; And Odd Sign        |\n| `U+11FEC` | Symbol           | SYMBOL        | _null_                  | &#x11FEC; Spent Sign          |\n| `U+11FED` | Symbol           | SYMBOL        | _null_                  | &#x11FED; Total Sign          |\n| `U+11FEE` | Symbol           | SYMBOL        | _null_                  | &#x11FEE; In Possession Sign  |\n| `U+11FEF` | Symbol           | SYMBOL        | _null_                  | &#x11FEF; Starting From Sign  |\n| | | | |\n| `U+11FF0` | Symbol           | SYMBOL        | _null_                  | &#x11FF0; Sign Muthaliya      |\n| `U+11FF1` | Symbol           | SYMBOL        | _null_                  | &#x11FF1; Sign 
Vakaiyaraa     |\n| `U+11FF2` | _unassigned_     |               |                         |                               |\n| `U+11FF3` | _unassigned_     |               |                         |                               |\n| `U+11FF4` | _unassigned_     |               |                         |                               |\n| `U+11FF5` | _unassigned_     |               |                         |                               |\n| `U+11FF6` | _unassigned_     |               |                         |                               |\n| `U+11FF7` | _unassigned_     |               |                         |                               |\n| `U+11FF8` | _unassigned_     |               |                         |                               |\n| `U+11FF9` | _unassigned_     |               |                         |                               |\n| `U+11FFA` | _unassigned_     |               |                         |                               |\n| `U+11FFB` | _unassigned_     |               |                         |                               |\n| `U+11FFC` | _unassigned_     |               |                         |                               |\n| `U+11FFD` | _unassigned_     |               |                         |                               |\n| `U+11FFE` | _unassigned_     |               |                         |                               |\n| `U+11FFF` | Punctuation      | _null_        | _null_                  | &#x11FFF; End Of Text         |\n:::\n\n\n## Grantha marks character table ##\n\nTamil text runs may also include diacritical and syllable-modifier\nmarks from the Grantha block. 
These characters should be classified as\nfollows.\n\n\n:::{table} Grantha marks character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+11301`  | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x11301; Grantha Candrabindu|\n|`U+11303`  | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x11303; Grantha Visarga    |\n|`U+1133B`  | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x1133b; Combining Bindu Below |\n|`U+1133C`  | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x1133c; Grantha Nukta      |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Tamil script may also include\ncharacters from the Vedic Extensions block. These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | 
CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta |\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            
| OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail |\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | 
&#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |                              |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Tamil text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block. 
Tamil text can also incorporate the udatta\n(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Tamil text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+00B2`   | Number           | SYLLABLE_MODIFIER | TOP                        | &#x00B2; Superscript Two       |\n|`U+00B3`   | Number           | SYLLABLE_MODIFIER | TOP                        | &#x00B3; Superscript Three     |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+2074`   | Number           | SYLLABLE_MODIFIER | TOP                        | &#x2074; Superscript Four      |\n|`U+2082`   | Number           | SYLLABLE_MODIFIER | TOP                        | &#x2082; Subscript Two       |\n|`U+2083`   | Number           | SYLLABLE_MODIFIER | TOP                        | &#x2083; Subscript Three     |\n|`U+2084`   | Number           | 
SYLLABLE_MODIFIER | TOP                        | &#x2084; Subscript Four      |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\nTamil text sometimes uses the Latin numerals 2, 3, and 4 in\nsuperscript or subscript positions to annotate Sanskrit. When used in\nthis fashion, the superscripts and subscripts are treated as\n`SYLLABLE_MODIFIER` signs for shaping purposes.\n"
  },
  {
    "path": "character-tables/character-tables-telugu.md",
    "content": "# Telugu character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Telugu text](../opentype-shaping-telugu.md).\n\n**Contents**\n\n  - [Telugu character table](#telugu-character-table)\n  - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\t  \n\n## Telugu character table ##\n\nTelugu glyphs should be classified as in the following\ntable. Codepoints in the Telugu block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Telugu character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0C00`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0C00; Combining Candrabindu Above |\n|`U+0C01`   | Mark [Mc]        | BINDU             | RIGHT_POSITION             | &#x0C01; Candrabindu         |\n|`U+0C02`   | Mark [Mc]        | BINDU             | RIGHT_POSITION             | &#x0C02; Anusvara            |\n|`U+0C03`   | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0C03; Visarga             |\n|`U+0C04`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0C04; Combining Anusvara Above |\n|`U+0C05`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C05; A                   |\n|`U+0C06`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C06; Aa                  |\n|`U+0C07`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C07; I                   |\n|`U+0C08`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C08; Ii                  |\n|`U+0C09`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C09; U                   |\n|`U+0C0A`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C0A; Uu                  |\n|`U+0C0B`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C0B; Vocalic R           |\n|`U+0C0C`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C0C; Vocalic L           |\n|`U+0C0D`   | _unassigned_     |                   |                            |                              
|\n|`U+0C0E`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C0E; E                   |\n|`U+0C0F`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C0F; Ee                  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0C10`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C10; Ai                  |\n|`U+0C11`   | _unassigned_     |                   |                            |                              |\n|`U+0C12`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C12; O                   |\n|`U+0C13`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C13; Oo                  |\n|`U+0C14`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C14; Au                  |\n|`U+0C15`   | Letter           | CONSONANT         | _null_                     | &#x0C15; Ka                  |\n|`U+0C16`   | Letter           | CONSONANT         | _null_                     | &#x0C16; Kha                 |\n|`U+0C17`   | Letter           | CONSONANT         | _null_                     | &#x0C17; Ga                  |\n|`U+0C18`   | Letter           | CONSONANT         | _null_                     | &#x0C18; Gha                 |\n|`U+0C19`   | Letter           | CONSONANT         | _null_                     | &#x0C19; Nga                 |\n|`U+0C1A`   | Letter           | CONSONANT         | _null_                     | &#x0C1A; Ca                  |\n|`U+0C1B`   | Letter           | CONSONANT         | _null_                     | &#x0C1B; Cha                 |\n|`U+0C1C`   | Letter           | CONSONANT         | _null_                     | &#x0C1C; Ja                  |\n|`U+0C1D`   | Letter           | CONSONANT         | _null_                     | &#x0C1D; Jha                 |\n|`U+0C1E`   | Letter           | CONSONANT         | _null_                     | &#x0C1E; Nya                 |\n|`U+0C1F`   
| Letter           | CONSONANT         | _null_                     | &#x0C1F; Tta                 |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0C20`   | Letter           | CONSONANT         | _null_                     | &#x0C20; Ttha                |\n|`U+0C21`   | Letter           | CONSONANT         | _null_                     | &#x0C21; Dda                 |\n|`U+0C22`   | Letter           | CONSONANT         | _null_                     | &#x0C22; Ddha                |\n|`U+0C23`   | Letter           | CONSONANT         | _null_                     | &#x0C23; Nna                 |\n|`U+0C24`   | Letter           | CONSONANT         | _null_                     | &#x0C24; Ta                  |\n|`U+0C25`   | Letter           | CONSONANT         | _null_                     | &#x0C25; Tha                 |\n|`U+0C26`   | Letter           | CONSONANT         | _null_                     | &#x0C26; Da                  |\n|`U+0C27`   | Letter           | CONSONANT         | _null_                     | &#x0C27; Dha                 |\n|`U+0C28`   | Letter           | CONSONANT         | _null_                     | &#x0C28; Na                  |\n|`U+0C29`   | _unassigned_     |                   |                            |                              |\n|`U+0C2A`   | Letter           | CONSONANT         | _null_                     | &#x0C2A; Pa                  |\n|`U+0C2B`   | Letter           | CONSONANT         | _null_                     | &#x0C2B; Pha                 |\n|`U+0C2C`   | Letter           | CONSONANT         | _null_                     | &#x0C2C; Ba                  |\n|`U+0C2D`   | Letter           | CONSONANT         | _null_                     | &#x0C2D; Bha                 |\n|`U+0C2E`   | Letter           | CONSONANT         | _null_                     | &#x0C2E; Ma                  |\n|`U+0C2F`   | Letter           | CONSONANT         | _null_                     | &#x0C2F; Ya                  |\n| | | | 
|\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0C30`   | Letter           | CONSONANT         | _null_                     | &#x0C30; Ra                  |\n|`U+0C31`   | Letter           | CONSONANT         | _null_                     | &#x0C31; Rra                 |\n|`U+0C32`   | Letter           | CONSONANT         | _null_                     | &#x0C32; La                  |\n|`U+0C33`   | Letter           | CONSONANT         | _null_                     | &#x0C33; Lla                 |\n|`U+0C34`   | Letter           | CONSONANT         | _null_                     | &#x0C34; Llla                |\n|`U+0C35`   | Letter           | CONSONANT         | _null_                     | &#x0C35; Va                  |\n|`U+0C36`   | Letter           | CONSONANT         | _null_                     | &#x0C36; Sha                 |\n|`U+0C37`   | Letter           | CONSONANT         | _null_                     | &#x0C37; Ssa                 |\n|`U+0C38`   | Letter           | CONSONANT         | _null_                     | &#x0C38; Sa                  |\n|`U+0C39`   | Letter           | CONSONANT         | _null_                     | &#x0C39; Ha                  |\n|`U+0C3A`   | _unassigned_     |                   |                            |                              |\n|`U+0C3B`   | _unassigned_     |                   |                            |                              |\n|`U+0C3C`   | Mark [Mn]        | NUKTA             | BOTTOM_POSITION            | &#x0C3C; Nukta               |\n|`U+0C3D`   | Letter           | AVAGRAHA          | _null_                     | &#x0C3D; Avagraha            |\n|`U+0C3E`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0C3E; Sign Aa             |\n|`U+0C3F`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0C3F; Sign I              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0C40`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | 
&#x0C40; Sign Ii             |\n|`U+0C41`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0C41; Sign U              |\n|`U+0C42`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0C42; Sign Uu             |\n|`U+0C43`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0C43; Sign Vocalic R      |\n|`U+0C44`   | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0C44; Sign Vocalic Rr     |\n|`U+0C45`   | _unassigned_     |                   |                            |                              |\n|`U+0C46`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0C46; Sign E              |\n|`U+0C47`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0C47; Sign Ee             |\n|`U+0C48`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_AND_BOTTOM_POSITION    | &#x0C48; Sign Ai             |\n|`U+0C49`   | _unassigned_     |                   |                            |                              |\n|`U+0C4A`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0C4A; Sign O              |\n|`U+0C4B`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0C4B; Sign Oo             |\n|`U+0C4C`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0C4C; Sign Au             |\n|`U+0C4D`   | Mark [Mn]        | VIRAMA            | TOP_POSITION               | &#x0C4D; Virama              |\n|`U+0C4E`   | _unassigned_     |                   |                            |                              |\n|`U+0C4F`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0C50`   | _unassigned_     |                   |                            |                              |\n|`U+0C51`   | _unassigned_     |                   |                            |                
              |\n|`U+0C52`   | _unassigned_     |                   |                            |                              |\n|`U+0C53`   | _unassigned_     |                   |                            |                              |\n|`U+0C54`   | _unassigned_     |                   |                            |                              |\n|`U+0C55`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0C55; Length Mark         |\n|`U+0C56`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0C56; Ai Length Mark      |\n|`U+0C57`   | _unassigned_     |                   |                            |                              |\n|`U+0C58`   | Letter           | CONSONANT         | _null_                     | &#x0C58; Tsa                 |\n|`U+0C59`   | Letter           | CONSONANT         | _null_                     | &#x0C59; Dza                 |\n|`U+0C5A`   | Letter           | CONSONANT         | _null_                     | &#x0C5A; Rrra                |\n|`U+0C5B`   | _unassigned_     |                   |                            |                              |\n|`U+0C5C`   | _unassigned_     |                   |                            |                              |\n|`U+0C5D`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x0C5D; Nakaara Pollu       |\n|`U+0C5E`   | _unassigned_     |                   |                            |                              |\n|`U+0C5F`   | _unassigned_     |                   |                            |                              |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0C60`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C60; Vocalic Rr          |\n|`U+0C61`   | Letter           | VOWEL_INDEPENDENT | _null_                     | &#x0C61; Vocalic Ll          |\n|`U+0C62`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0C62; Sign Vocalic L      
|\n|`U+0C63`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0C63; Sign Vocalic Ll     |\n|`U+0C64`   | _unassigned_     |                   |                            |                              |\n|`U+0C65`   | _unassigned_     |                   |                            |                              |\n|`U+0C66`   | Number           | NUMBER            | _null_                     | &#x0C66; Digit Zero          |\n|`U+0C67`   | Number           | NUMBER            | _null_                     | &#x0C67; Digit One           |\n|`U+0C68`   | Number           | NUMBER            | _null_                     | &#x0C68; Digit Two           |\n|`U+0C69`   | Number           | NUMBER            | _null_                     | &#x0C69; Digit Three         |\n|`U+0C6A`   | Number           | NUMBER            | _null_                     | &#x0C6A; Digit Four          |\n|`U+0C6B`   | Number           | NUMBER            | _null_                     | &#x0C6B; Digit Five          |\n|`U+0C6C`   | Number           | NUMBER            | _null_                     | &#x0C6C; Digit Six           |\n|`U+0C6D`   | Number           | NUMBER            | _null_                     | &#x0C6D; Digit Seven         |\n|`U+0C6E`   | Number           | NUMBER            | _null_                     | &#x0C6E; Digit Eight         |\n|`U+0C6F`   | Number           | NUMBER            | _null_                     | &#x0C6F; Digit Nine          |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+0C70`   | _unassigned_     |                   |                            |                              |\n|`U+0C71`   | _unassigned_     |                   |                            |                              |\n|`U+0C72`   | _unassigned_     |                   |                            |                              |\n|`U+0C73`   | _unassigned_     |                   |                            |                              |\n|`U+0C74`   
| _unassigned_     |                   |                            |                              |\n|`U+0C75`   | _unassigned_     |                   |                            |                              |\n|`U+0C76`   | _unassigned_     |                   |                            |                              |\n|`U+0C77`   | Punctuation      | _null_            | _null_                     | &#x0C77; Sign Siddham        |\n|`U+0C78`   | Number           | NUMBER            | _null_                     | &#x0C78; Fraction Zero Odd P |\n|`U+0C79`   | Number           | NUMBER            | _null_                     | &#x0C79; Fraction One Odd P  |\n|`U+0C7A`   | Number           | NUMBER            | _null_                     | &#x0C7A; Fraction Two Odd P  |\n|`U+0C7B`   | Number           | NUMBER            | _null_                     | &#x0C7B; Fraction Three Odd P|\n|`U+0C7C`   | Number           | NUMBER            | _null_                     | &#x0C7C; Fraction One Even P |\n|`U+0C7D`   | Number           | NUMBER            | _null_                     | &#x0C7D; Fraction Two Even P |\n|`U+0C7E`   | Number           | NUMBER            | _null_                     | &#x0C7E; Fraction Three Even P|\n|`U+0C7F`   | Symbol           | SYMBOL            | _null_                     | &#x0C7F; Tuumu               |\n:::\n\n\n## Vedic Extensions character table ##\n\nSanskrit runs written in the Telugu script may also include\ncharacters from the Vedic Extensions block. 
These characters should be\nclassified as follows.\n\n> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md) \n> document for additional information.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta 
|\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | SYMBOL            | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | SYMBOL            | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail 
|\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            | &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | SYMBOL            | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF1`   | Letter           | SYMBOL            | _null_                     | &#x1CF1; Sign Anusvara Ubhayato Mukha |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            
|                              |\n|`U+1CFE`   | _unassigned_     |                   |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Telugu text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block. Telugu text can also incorporate the udatta\n(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+0951`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x0951; Udatta              |\n|`U+0952`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x0952; Anudatta            |\n|`U+0964`   | Punctuation      | _null_            | _null_                     | &#x0964; Danda               |\n|`U+0965`   | Punctuation      | _null_            | _null_                     | &#x0965; Double Danda        |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Telugu text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a \"_Consonant_,Halant,_Consonant_\" sequence. The\nsequence \"_Consonant_,Halant,ZWJ,_Consonant_\" blocks the formation of\na conjunct between the two consonants. \n\nNote, however, that the \"_Consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The\nsequence \"_Consonant_,Halant,ZWNJ,_Consonant_\" should produce the\nfirst consonant in its standard form, followed by an explicit\n\"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_Consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n\n"
  },
  {
    "path": "character-tables/character-tables-thai.md",
    "content": "# Thai character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Thai text](../opentype-shaping-thai-lao.md#the-thailao-shaping-model).\n\n**Contents**\n\n  - [Thai character table](#thai-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\n\n## Thai character table ##\n\nThai glyphs should be classified as in the following\ntable. Codepoints in the Thai block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column.\n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Thai character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass | Combining class | PUA    | Glyph                         |\n|:----------|:-----------------|:------------------|:------------------------|:----------------|:-------|:------------------------------|\n|`U+0E00`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E01`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E01; Ko Kai               |\n|`U+0E02`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E02; Kho Khai             |\n|`U+0E03`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E03; Kho Khuat            |\n|`U+0E04`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E04; Kho Khwai            |\n|`U+0E05`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E05; Kho Khon             |\n|`U+0E06`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E06; Kho Rakhang          |\n|`U+0E07`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E07; Ngo Ngu              |\n|`U+0E08`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E08; Cho Chan             |\n|`U+0E09`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E09; Cho Ching            |\n|`U+0E0A`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E0A; Cho Chang            |\n|`U+0E0B`   | Letter           | 
CONSONANT         | _null_                  | _0_             | NC     | &#x0E0B; So So                |\n|`U+0E0C`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E0C; Cho Choe             |\n|`U+0E0D`   | Letter           | CONSONANT         | _null_                  | _0_             | RC     | &#x0E0D; Yo Ying              |\n|`U+0E0E`   | Letter           | CONSONANT         | _null_                  | _0_             | DC     | &#x0E0E; Do Chada             |\n|`U+0E0F`   | Letter           | CONSONANT         | _null_                  | _0_             | DC     | &#x0E0F; To Patak             |\n| | | | | | | |   \n|`U+0E10`   | Letter           | CONSONANT         | _null_                  | _0_             | RC     | &#x0E10; Tho Than             |\n|`U+0E11`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E11; Tho Nangmontho       |\n|`U+0E12`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E12; Tho Phuthao          |\n|`U+0E13`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E13; No Nen               |\n|`U+0E14`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E14; Do Dek               |\n|`U+0E15`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E15; To Tao               |\n|`U+0E16`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E16; Tho Thung            |\n|`U+0E17`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E17; Tho Thahan           |\n|`U+0E18`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E18; Tho Thong            |\n|`U+0E19`   | Letter           | CONSONANT         | _null_        
          | _0_             | NC     | &#x0E19; No Nu                |\n|`U+0E1A`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E1A; Bo Baimai            |\n|`U+0E1B`   | Letter           | CONSONANT         | _null_                  | _0_             | AC     | &#x0E1B; Po Pla               |\n|`U+0E1C`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E1C; Pho Phung            |\n|`U+0E1D`   | Letter           | CONSONANT         | _null_                  | _0_             | AC     | &#x0E1D; Fo Fa                |\n|`U+0E1E`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E1E; Pho Phan             |\n|`U+0E1F`   | Letter           | CONSONANT         | _null_                  | _0_             | AC     | &#x0E1F; Fo Fan               |\n| | | | | | | |   \n|`U+0E20`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E20; Pho Samphao          |\n|`U+0E21`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E21; Mo Ma                |\n|`U+0E22`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E22; Yo Yak               |\n|`U+0E23`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E23; Ro Rua               |\n|`U+0E24`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E24; Ru                   |\n|`U+0E25`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E25; Lo Ling              |\n|`U+0E26`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E26; Lu                   |\n|`U+0E27`   | Letter           | CONSONANT         | _null_                  | _0_             | NC  
   | &#x0E27; Wo Waen              |\n|`U+0E28`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E28; So Sala              |\n|`U+0E29`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E29; So Rusi              |\n|`U+0E2A`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E2A; So Sua               |\n|`U+0E2B`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E2B; Ho Hip               |\n|`U+0E2C`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E2C; Lo Chula             |\n|`U+0E2D`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E2D; O Ang                |\n|`U+0E2E`   | Letter           | CONSONANT         | _null_                  | _0_             | NC     | &#x0E2E; Ho Nokhuk            |\n|`U+0E2F`   | Letter           | CONSONANT         | _null_                  | _0_             | _null_ | &#x0E2F; Paiyannoi            |\n| | | | | | | |\n|`U+0E30`   | Letter           | VOWEL_DEPENDENT   | RIGHT_POSITION          | _0_             | CV     | &#x0E30; Sara A               |\n|`U+0E31`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | AV     | &#x0E31; Mai Han-akat         |\n|`U+0E32`   | Letter           | VOWEL_DEPENDENT   | RIGHT_POSITION          | _0_             | CV     | &#x0E32; Sara Aa              |\n|`U+0E33`   | Letter           | VOWEL_DEPENDENT   | RIGHT_POSITION          | _0_             | _null_ | &#x0E33; Sara Am              |\n|`U+0E34`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | AV     | &#x0E34; Sara I               |\n|`U+0E35`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | AV     | &#x0E35; Sara Ii              
|\n|`U+0E36`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | AV     | &#x0E36; Sara Ue              |\n|`U+0E37`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | AV     | &#x0E37; Sara Uee             |\n|`U+0E38`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION         | 3               | BV     | &#x0E38; Sara U               |\n|`U+0E39`   | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION         | 3               | BV     | &#x0E39; Sara Uu              |\n|`U+0E3A`   | Mark [Mn]        | PURE_KILLER       | BOTTOM_POSITION         | 9               | BV     | &#x0E3A; Phinthu              |\n|`U+0E3B`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E3C`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E3D`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E3E`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E3F`   | Symbol           | SYMBOL            | _null_                  | _0_             | _null_ | &#x0E3F; Currency symbol Baht |\n| | | | | | | |\n|`U+0E40`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | CV     | &#x0E40; Sara E               |\n|`U+0E41`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | CV     | &#x0E41; Sara Ae              |\n|`U+0E42`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | CV     | &#x0E42; Sara O               |\n|`U+0E43`   | Letter           | VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | CV     | &#x0E43; Sara Ai Maimuan      |\n|`U+0E44`   | Letter           | 
VOWEL_DEPENDENT   | VISUAL_ORDER_LEFT       | _0_             | CV     | &#x0E44; Sara Ai Maimalai     |\n|`U+0E45`   | Letter           | VOWEL_DEPENDENT   | RIGHT_POSITION          | _0_             | CV     | &#x0E45; Lakkhangyao          |\n|`U+0E46`   | Letter Modifier  | _null_            | _null_                  | _0_             | _null_ | &#x0E46; Maiyamok             |\n|`U+0E47`   | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION            | _0_             | AV     | &#x0E47; Maitaikhu            |\n|`U+0E48`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | 107             | TV     | &#x0E48; Mai Ek               |\n|`U+0E49`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | 107             | TV     | &#x0E49; Mai Tho              |\n|`U+0E4A`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | 107             | TV     | &#x0E4A; Mai Tri              |\n|`U+0E4B`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | 107             | TV     | &#x0E4B; Mai Chattawa         |\n|`U+0E4C`   | Mark [Mn]        | CONSONANT_KILLER  | TOP_POSITION            | _0_             | TV     | &#x0E4C; Thanthakhat          |\n|`U+0E4D`   | Mark [Mn]        | BINDU             | TOP_POSITION            | _0_             | AV     | &#x0E4D; Nikhahit             |\n|`U+0E4E`   | Mark [Mn]        | PURE_KILLER       | TOP_POSITION            | _0_             | AV     | &#x0E4E; Yamakkan             |\n|`U+0E4F`   | Punctuation      | _null_            | _null_                  | _0_             | _null_ | &#x0E4F; Fongman              |\n| | | | | | | |\n|`U+0E50`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E50; Digit zero           |\n|`U+0E51`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E51; Digit one            |\n|`U+0E52`   | Number           | NUMBER            | _null_           
       | _0_             | _null_ | &#x0E52; Digit two            |\n|`U+0E53`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E53; Digit three          |\n|`U+0E54`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E54; Digit four           |\n|`U+0E55`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E55; Digit five           |\n|`U+0E56`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E56; Digit six            |\n|`U+0E57`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E57; Digit seven          |\n|`U+0E58`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E58; Digit eight          |\n|`U+0E59`   | Number           | NUMBER            | _null_                  | _0_             | _null_ | &#x0E59; Digit nine           |\n|`U+0E5A`   | Punctuation      | _null_            | _null_                  | _0_             | _null_ | &#x0E5A; Angkhankhu           |\n|`U+0E5B`   | Punctuation      | _null_            | _null_                  | _0_             | _null_ | &#x0E5B; Khomut               |\n|`U+0E5C`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E5D`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E5E`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E5F`   | _unassigned_     |                   |                         |                 |        |                               |\n| | | | | | | |\n|`U+0E60`   | _unassigned_     |                   |                         |                 |        |  
                             |\n|`U+0E61`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E62`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E63`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E64`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E65`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E66`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E67`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E68`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E69`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E6A`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E6B`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E6C`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E6D`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E6E`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E6F`   | 
_unassigned_     |                   |                         |                 |        |                               |\n| | | | | | | |\n|`U+0E70`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E71`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E72`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E73`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E74`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E75`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E76`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E77`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E78`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E79`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E7A`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E7B`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E7C`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E7D`   | _unassigned_     |                   
|                         |                 |        |                               |\n|`U+0E7E`   | _unassigned_     |                   |                         |                 |        |                               |\n|`U+0E7F`   | _unassigned_     |                   |                         |                 |        |                               |\n:::\n\n\n\n## Miscellaneous character table ##\n\nIn addition to general punctuation, runs of Thai text often use the\ncombining macron below (`U+0331`), combining tilde (`U+0303`), modifier letter\napostrophe (`U+02BC`), and modifier letter minus sign (`U+02D7`), from the\nCombining Diacritical Marks and Spacing Modifier Letters blocks, particularly\nwhen used to write minority languages.\n\nThai text also typically does not insert spaces between words.\nConsequently, the Zero-Width Space (`U+200B`) character is often used to insert\ninvisible break points that may be converted to line breaks.\n\n\n:::{table} Additional punctuation character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+02BC`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION               | &#x02BC; Modifier apostrophe   |\n|`U+02D7`   | Mark [Mn]        | TONE_MARKER       | BOTTOM_POSITION            | &#x02D7; Modifier minus sign   |\n|`U+0303`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION               | &#x0303; Combining tilde       |\n|`U+0331`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION               | &#x0331; Combining macron below|\n|`U+200B`   | Separator        | PLACEHOLDER       | _null_                     | &#x200B; Zero-width space      |\n:::\n\n\nOther important characters that may be encountered when shaping runs\nof Thai text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and 
zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+2010`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2010; Hyphen                |\n|`U+2011`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2011; No-break hyphen       |\n|`U+2012`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2012; Figure dash           |\n|`U+2013`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2013; En dash               |\n|`U+2014`   | Punctuation      | PLACEHOLDER       | _null_                     | &#x2014; Em dash               |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n:::\n\n"
  },
  {
    "path": "character-tables/character-tables-tibetan.md",
    "content": "# Tibetan character tables #\n\nThis document lists the per-character shaping information needed to\n[shape Tibetan text](../opentype-shaping-tibetan.md).\n\n**Contents**\n\n  - [Tibetan character table](#tibetan-character-table)\n  - [Miscellaneous character table](#miscellaneous-character-table)\n\n\n## Tibetan character table ##\n\nTibetan glyphs should be classified as in the following\ntable. Codepoints in the Tibetan block with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column.\n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. Note that\nthis does include some valid codepoints, such as currency marks,\npunctuation, and other symbols.\n\n> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important\n> during syllable identification, but generally evoke no further\n> special behavior during the rest of the shaping process.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n:::{table} Tibetan character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                                            |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------------------------|\n| `U+0F00`  | Letter           | _null_            | _null_                     | &#x0F00; Syllable Om                             |\n| `U+0F01`  | Symbol           | SYMBOL            | _null_                     | &#x0F01; Gter Yig Mgo Truncated A                |\n| `U+0F02`  | Symbol           | SYMBOL            | _null_                     | &#x0F02; Gter Yig Mgo -Um Rnam Bcad Ma           |\n| `U+0F03`  | Symbol           | SYMBOL            | _null_                     | &#x0F03; Gter Yig Mgo -Um Gter Tsheg Ma          |\n| `U+0F04`  | Punctuation      | _null_            | _null_                     | &#x0F04; Initial Yig Mgo Mdun Ma                 |\n| `U+0F05`  | Punctuation      | _null_            | _null_                     | &#x0F05; Closing Yig Mgo Sgab Ma                 |\n| `U+0F06`  | Punctuation      | _null_            | _null_                     | &#x0F06; Caret Yig Mgo Phur Shad Ma              |\n| `U+0F07`  | Punctuation      | _null_            | _null_                     | &#x0F07; Yig Mgo Tsheg Shad Ma                   |\n| `U+0F08`  | Punctuation      | _null_            | _null_                     | &#x0F08; Sbrul Shad                              |\n| `U+0F09`  | Punctuation      | _null_            | _null_                     | &#x0F09; Bskur Yig Mgo                           |\n| `U+0F0A`  | Punctuation      | _null_            | _null_                     | &#x0F0A; Bka- Shog Yig Mgo                       |\n| `U+0F0B`  | Punctuation      | _null_            | _null_                     | &#x0F0B; 
Intersyllabic Tsheg                     |\n| `U+0F0C`  | Punctuation      | _null_            | _null_                     | &#x0F0C; Delimiter Tsheg Bstar                   |\n| `U+0F0D`  | Punctuation      | _null_            | _null_                     | &#x0F0D; Shad                                    |\n| `U+0F0E`  | Punctuation      | _null_            | _null_                     | &#x0F0E; Nyis Shad                               |\n| `U+0F0F`  | Punctuation      | _null_            | _null_                     | &#x0F0F; Tsheg Shad                              |\n| | | | | |\n| `U+0F10`  | Punctuation      | _null_            | _null_                     | &#x0F10; Nyis Tsheg Shad                         |\n| `U+0F11`  | Punctuation      | _null_            | _null_                     | &#x0F11; Rin Chen Spungs Shad                    |\n| `U+0F12`  | Punctuation      | _null_            | _null_                     | &#x0F12; Rgya Gram Shad                          |\n| `U+0F13`  | Symbol           | SYMBOL            | _null_                     | &#x0F13; Caret -Dzud Rtags Me Long Can           |\n| `U+0F14`  | Punctuation      | _null_            | _null_                     | &#x0F14; Gter Tsheg                              |\n| `U+0F15`  | Symbol           | SYMBOL            | _null_                     | &#x0F15; Logotype Sign Chad Rtags                |\n| `U+0F16`  | Symbol           | SYMBOL            | _null_                     | &#x0F16; Logotype Sign Lhag Rtags                |\n| `U+0F17`  | Symbol           | SYMBOL            | _null_                     | &#x0F17; Astrological Sign Sgra Gcan -Char Rtags |\n| `U+0F18`  | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0F18; Astrological Sign -Khyud Pa             |\n| `U+0F19`  | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0F19; Astrological Sign Sdong Tshugs          |\n| `U+0F1A`  | Symbol           | SYMBOL            | _null_         
            | &#x0F1A; Sign Rdel Dkar Gcig                     |\n| `U+0F1B`  | Symbol           | SYMBOL            | _null_                     | &#x0F1B; Sign Rdel Dkar Gnyis                    |\n| `U+0F1C`  | Symbol           | SYMBOL            | _null_                     | &#x0F1C; Sign Rdel Dkar Gsum                     |\n| `U+0F1D`  | Symbol           | SYMBOL            | _null_                     | &#x0F1D; Sign Rdel Nag Gcig                      |\n| `U+0F1E`  | Symbol           | SYMBOL            | _null_                     | &#x0F1E; Sign Rdel Nag Gnyis                     |\n| `U+0F1F`  | Symbol           | SYMBOL            | _null_                     | &#x0F1F; Sign Rdel Dkar Rdel Nag                 |\n| | | | | |\n| `U+0F20`  | Number           | NUMBER            | _null_                     | &#x0F20; Digit Zero                              |\n| `U+0F21`  | Number           | NUMBER            | _null_                     | &#x0F21; Digit One                               |\n| `U+0F22`  | Number           | NUMBER            | _null_                     | &#x0F22; Digit Two                               |\n| `U+0F23`  | Number           | NUMBER            | _null_                     | &#x0F23; Digit Three                             |\n| `U+0F24`  | Number           | NUMBER            | _null_                     | &#x0F24; Digit Four                              |\n| `U+0F25`  | Number           | NUMBER            | _null_                     | &#x0F25; Digit Five                              |\n| `U+0F26`  | Number           | NUMBER            | _null_                     | &#x0F26; Digit Six                               |\n| `U+0F27`  | Number           | NUMBER            | _null_                     | &#x0F27; Digit Seven                             |\n| `U+0F28`  | Number           | NUMBER            | _null_                     | &#x0F28; Digit Eight                             |\n| `U+0F29`  | Number           | NUMBER      
      | _null_                     | &#x0F29; Digit Nine                              |\n| `U+0F2A`  | Number           | NUMBER            | _null_                     | &#x0F2A; Digit Half One                          |\n| `U+0F2B`  | Number           | NUMBER            | _null_                     | &#x0F2B; Digit Half Two                          |\n| `U+0F2C`  | Number           | NUMBER            | _null_                     | &#x0F2C; Digit Half Three                        |\n| `U+0F2D`  | Number           | NUMBER            | _null_                     | &#x0F2D; Digit Half Four                         |\n| `U+0F2E`  | Number           | NUMBER            | _null_                     | &#x0F2E; Digit Half Five                         |\n| `U+0F2F`  | Number           | NUMBER            | _null_                     | &#x0F2F; Digit Half Six                          |\n| | | | | |\n| `U+0F30`  | Number           | NUMBER            | _null_                     | &#x0F30; Digit Half Seven                        |\n| `U+0F31`  | Number           | NUMBER            | _null_                     | &#x0F31; Digit Half Eight                        |\n| `U+0F32`  | Number           | NUMBER            | _null_                     | &#x0F32; Digit Half Nine                         |\n| `U+0F33`  | Number           | NUMBER            | _null_                     | &#x0F33; Digit Half Zero                         |\n| `U+0F34`  | Symbol           | SYMBOL            | _null_                     | &#x0F34; Bsdus Rtags                             |\n| `U+0F35`  | Mark [Mn]        | SYLLABLE_MODIFIER | BOTTOM_POSITION            | &#x0F35; Ngas Bzung Nyi Zla                      |\n| `U+0F36`  | Symbol           | SYMBOL            | _null_                     | &#x0F36; Caret -Dzud Rtags Bzhi Mig Can          |\n| `U+0F37`  | Mark [Mn]        | SYLLABLE_MODIFIER | BOTTOM_POSITION            | &#x0F37; Ngas Bzung Sgor Rtags                   |\n| `U+0F38`  | Symbol  
         | SYMBOL            | _null_                     | &#x0F38; Che Mgo                                 |\n| `U+0F39`  | Mark [Mn]        | NUKTA             | TOP_POSITION               | &#x0F39; Tsa -Phru                               |\n| `U+0F3A`  | Punctuation [Ps] | _null_            | _null_                     | &#x0F3A; Gug Rtags Gyon                          |\n| `U+0F3B`  | Punctuation [Pe] | _null_            | _null_                     | &#x0F3B; Gug Rtags Gyas                          |\n| `U+0F3C`  | Punctuation [Ps] | _null_            | _null_                     | &#x0F3C; Ang Khang Gyon                          |\n| `U+0F3D`  | Punctuation [Pe] | _null_            | _null_                     | &#x0F3D; Ang Khang Gyas                          |\n| `U+0F3E`  | Mark [Mc]        | VOWEL_DEPENDENT   | RIGHT_POSITION             | &#x0F3E; Sign Yar Tshes                          |\n| `U+0F3F`  | Mark [Mc]        | VOWEL_DEPENDENT   | LEFT_POSITION              | &#x0F3F; Sign Mar Tshes                          |\n| | | | | |\n| `U+0F40`  | Letter           | CONSONANT         | _null_                     | &#x0F40; Ka                                      |\n| `U+0F41`  | Letter           | CONSONANT         | _null_                     | &#x0F41; Kha                                     |\n| `U+0F42`  | Letter           | CONSONANT         | _null_                     | &#x0F42; Ga                                      |\n| `U+0F43`  | Letter           | CONSONANT         | _null_                     | &#x0F43; Gha                                     |\n| `U+0F44`  | Letter           | CONSONANT         | _null_                     | &#x0F44; Nga                                     |\n| `U+0F45`  | Letter           | CONSONANT         | _null_                     | &#x0F45; Ca                                      |\n| `U+0F46`  | Letter           | CONSONANT         | _null_                     | &#x0F46; Cha                                     
|\n| `U+0F47`  | Letter           | CONSONANT         | _null_                     | &#x0F47; Ja                                      |\n| `U+0F48`  | _unassigned_     |                   |                            |                                                  |\n| `U+0F49`  | Letter           | CONSONANT         | _null_                     | &#x0F49; Nya                                     |\n| `U+0F4A`  | Letter           | CONSONANT         | _null_                     | &#x0F4A; Tta                                     |\n| `U+0F4B`  | Letter           | CONSONANT         | _null_                     | &#x0F4B; Ttha                                    |\n| `U+0F4C`  | Letter           | CONSONANT         | _null_                     | &#x0F4C; Dda                                     |\n| `U+0F4D`  | Letter           | CONSONANT         | _null_                     | &#x0F4D; Ddha                                    |\n| `U+0F4E`  | Letter           | CONSONANT         | _null_                     | &#x0F4E; Nna                                     |\n| `U+0F4F`  | Letter           | CONSONANT         | _null_                     | &#x0F4F; Ta                                      |\n| | | | | |\t\t\t\t\t\t \n| `U+0F50`  | Letter           | CONSONANT         | _null_                     | &#x0F50; Tha                                     |\n| `U+0F51`  | Letter           | CONSONANT         | _null_                     | &#x0F51; Da                                      |\n| `U+0F52`  | Letter           | CONSONANT         | _null_                     | &#x0F52; Dha                                     |\n| `U+0F53`  | Letter           | CONSONANT         | _null_                     | &#x0F53; Na                                      |\n| `U+0F54`  | Letter           | CONSONANT         | _null_                     | &#x0F54; Pa                                      |\n| `U+0F55`  | Letter           | CONSONANT         | _null_                     | &#x0F55; Pha 
                                    |\n| `U+0F56`  | Letter           | CONSONANT         | _null_                     | &#x0F56; Ba                                      |\n| `U+0F57`  | Letter           | CONSONANT         | _null_                     | &#x0F57; Bha                                     |\n| `U+0F58`  | Letter           | CONSONANT         | _null_                     | &#x0F58; Ma                                      |\n| `U+0F59`  | Letter           | CONSONANT         | _null_                     | &#x0F59; Tsa                                     |\n| `U+0F5A`  | Letter           | CONSONANT         | _null_                     | &#x0F5A; Tsha                                    |\n| `U+0F5B`  | Letter           | CONSONANT         | _null_                     | &#x0F5B; Dza                                     |\n| `U+0F5C`  | Letter           | CONSONANT         | _null_                     | &#x0F5C; Dzha                                    |\n| `U+0F5D`  | Letter           | CONSONANT         | _null_                     | &#x0F5D; Wa                                      |\n| `U+0F5E`  | Letter           | CONSONANT         | _null_                     | &#x0F5E; Zha                                     |\n| `U+0F5F`  | Letter           | CONSONANT         | _null_                     | &#x0F5F; Za                                      |\n| | | | | |\t\t\t\t\t\t \n| `U+0F60`  | Letter           | CONSONANT         | _null_                     | &#x0F60; -A                                      |\n| `U+0F61`  | Letter           | CONSONANT         | _null_                     | &#x0F61; Ya                                      |\n| `U+0F62`  | Letter           | CONSONANT         | _null_                     | &#x0F62; Ra                                      |\n| `U+0F63`  | Letter           | CONSONANT         | _null_                     | &#x0F63; La                                      |\n| `U+0F64`  | Letter           | CONSONANT         | 
_null_                     | &#x0F64; Sha                                     |\n| `U+0F65`  | Letter           | CONSONANT         | _null_                     | &#x0F65; Ssa                                     |\n| `U+0F66`  | Letter           | CONSONANT         | _null_                     | &#x0F66; Sa                                      |\n| `U+0F67`  | Letter           | CONSONANT         | _null_                     | &#x0F67; Ha                                      |\n| `U+0F68`  | Letter           | CONSONANT         | _null_                     | &#x0F68; A                                       |\n| `U+0F69`  | Letter           | CONSONANT         | _null_                     | &#x0F69; Kssa                                    |\n| `U+0F6A`  | Letter           | CONSONANT         | _null_                     | &#x0F6A; Fixed-Form Ra                           |\n| `U+0F6B`  | Letter           | CONSONANT         | _null_                     | &#x0F6B; Kka                                     |\n| `U+0F6C`  | Letter           | CONSONANT         | _null_                     | &#x0F6C; Rra                                     |\n| `U+0F6D`  | _unassigned_     |                   |                            |                                                  |\n| `U+0F6E`  | _unassigned_     |                   |                            |                                                  |\n| `U+0F6F`  | _unassigned_     |                   |                            |                                                  |\n| | | | | |\n| `U+0F70`  | _unassigned_     |                   |                            |                                                  |\n| `U+0F71`  | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0F71; Sign Aa                                 |\n| `U+0F72`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0F72; Sign I                                  |\n| `U+0F73`  | Mark [Mn]       
 | VOWEL_DEPENDENT   | TOP_AND_BOTTOM_POSITION    | &#x0F73; Sign Ii                                 |\n| `U+0F74`  | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0F74; Sign U                                  |\n| `U+0F75`  | Mark [Mn]        | VOWEL_DEPENDENT   | BOTTOM_POSITION            | &#x0F75; Sign Uu                                 |\n| `U+0F76`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_AND_BOTTOM_POSITION    | &#x0F76; Sign Vocalic R                          |\n| `U+0F77`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_AND_BOTTOM_POSITION    | &#x0F77; Sign Vocalic Rr                         |\n| `U+0F78`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_AND_BOTTOM_POSITION    | &#x0F78; Sign Vocalic L                          |\n| `U+0F79`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_AND_BOTTOM_POSITION    | &#x0F79; Sign Vocalic Ll                         |\n| `U+0F7A`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0F7A; Sign E                                  |\n| `U+0F7B`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0F7B; Sign Ee                                 |\n| `U+0F7C`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0F7C; Sign O                                  |\n| `U+0F7D`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0F7D; Sign Oo                                 |\n| `U+0F7E`  | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0F7E; Sign Rjes Su Nga Ro                     |\n| `U+0F7F`  | Mark [Mc]        | VISARGA           | RIGHT_POSITION             | &#x0F7F; Sign Rnam Bcad                          |\n| | | | | |\n| `U+0F80`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_POSITION               | &#x0F80; Sign Reversed I                         |\n| `U+0F81`  | Mark [Mn]        | VOWEL_DEPENDENT   | TOP_AND_BOTTOM_POSITION    | &#x0F81; Sign Reversed Ii                        |\n| 
`U+0F82`  | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0F82; Sign Nyi Zla Naa Da                     |\n| `U+0F83`  | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0F83; Sign Sna Ldan                           |\n| `U+0F84`  | Mark [Mn]        | VIRAMA            | BOTTOM_POSITION            | &#x0F84; Halanta                                 |\n| `U+0F85`  | Punctuation      | AVAGRAHA          | _null_                     | &#x0F85; Paluta                                  |\n| `U+0F86`  | Mark [Mn]        | TONE_MARKER       | TOP_POSITION               | &#x0F86; Sign Lci Rtags                          |\n| `U+0F87`  | Mark [Mn]        | TONE_MARKER       | TOP_POSITION               | &#x0F87; Sign Yang Rtags                         |\n| `U+0F88`  | Letter           | CONSONANT_HEAD    | _null_                     | &#x0F88; Sign Lce Tsa Can                        |\n| `U+0F89`  | Letter           | CONSONANT_HEAD    | _null_                     | &#x0F89; Sign Mchu Can                           |\n| `U+0F8A`  | Letter           | CONSONANT_HEAD    | _null_                     | &#x0F8A; Sign Gru Can Rgyings                    |\n| `U+0F8B`  | Letter           | CONSONANT_HEAD    | _null_                     | &#x0F8B; Sign Gru Med Rgyings                    |\n| `U+0F8C`  | Letter           | CONSONANT_HEAD    | _null_                     | &#x0F8C; Sign Inverted Mchu Can                  |\n| `U+0F8D`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F8D; Subjoined Sign Lce Tsa Can              |\n| `U+0F8E`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F8E; Subjoined Sign Mchu Can                 |\n| `U+0F8F`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F8F; Subjoined Sign Inverted Mchu Can        |\n| | | | | |\n| `U+0F90`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F90; Subjoined Ka          
                  |\n| `U+0F91`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F91; Subjoined Kha                           |\n| `U+0F92`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F92; Subjoined Ga                            |\n| `U+0F93`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F93; Subjoined Gha                           |\n| `U+0F94`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F94; Subjoined Nga                           |\n| `U+0F95`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F95; Subjoined Ca                            |\n| `U+0F96`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F96; Subjoined Cha                           |\n| `U+0F97`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F97; Subjoined Ja                            |\n| `U+0F98`  | _unassigned_     |                   |                            |                                                  |\n| `U+0F99`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F99; Subjoined Nya                           |\n| `U+0F9A`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F9A; Subjoined Tta                           |\n| `U+0F9B`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F9B; Subjoined Ttha                          |\n| `U+0F9C`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F9C; Subjoined Dda                           |\n| `U+0F9D`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F9D; Subjoined Ddha                          |\n| `U+0F9E`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F9E; Subjoined Nna                           |\n| `U+0F9F`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0F9F; Subjoined 
Ta                            |\n| | | | | |\t\t\t\t\t\t\n| `U+0FA0`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA0; Subjoined Tha                           |\n| `U+0FA1`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA1; Subjoined Da                            |\n| `U+0FA2`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA2; Subjoined Dha                           |\n| `U+0FA3`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA3; Subjoined Na                            |\n| `U+0FA4`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA4; Subjoined Pa                            |\n| `U+0FA5`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA5; Subjoined Pha                           |\n| `U+0FA6`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA6; Subjoined Ba                            |\n| `U+0FA7`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA7; Subjoined Bha                           |\n| `U+0FA8`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA8; Subjoined Ma                            |\n| `U+0FA9`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FA9; Subjoined Tsa                           |\n| `U+0FAA`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FAA; Subjoined Tsha                          |\n| `U+0FAB`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FAB; Subjoined Dza                           |\n| `U+0FAC`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FAC; Subjoined Dzha                          |\n| `U+0FAD`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FAD; Subjoined Wa                            |\n| `U+0FAE`  | Mark [Mn]        |CONSONANT_SUBJOINED| 
BOTTOM_POSITION            | &#x0FAE; Subjoined Zha                           |\n| `U+0FAF`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FAF; Subjoined Za                            |\n| | | | | |\t\t\t\t\t\t\n| `U+0FB0`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB0; Subjoined -A                            |\n| `U+0FB1`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB1; Subjoined Ya                            |\n| `U+0FB2`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB2; Subjoined Ra                            |\n| `U+0FB3`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB3; Subjoined La                            |\n| `U+0FB4`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB4; Subjoined Sha                           |\n| `U+0FB5`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB5; Subjoined Ssa                           |\n| `U+0FB6`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB6; Subjoined Sa                            |\n| `U+0FB7`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB7; Subjoined Ha                            |\n| `U+0FB8`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB8; Subjoined A                             |\n| `U+0FB9`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FB9; Subjoined Kssa                          |\n| `U+0FBA`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FBA; Subjoined Fixed-Form Wa                 |\n| `U+0FBB`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FBB; Subjoined Fixed-Form Ya                 |\n| `U+0FBC`  | Mark [Mn]        |CONSONANT_SUBJOINED| BOTTOM_POSITION            | &#x0FBC; Subjoined Fixed-Form Ra                 |\n| `U+0FBD`  | 
_unassigned_     |                   |                            |                                                  |\n| `U+0FBE`  | Symbol           | SYMBOL            | _null_                     | &#x0FBE; Ku Ru Kha                               |\n| `U+0FBF`  | Symbol           | SYMBOL            | _null_                     | &#x0FBF; Ku Ru Kha Bzhi Mig Can                  |\n| | | | | |\n| `U+0FC0`  | Symbol           | SYMBOL            | _null_                     | &#x0FC0; Cantillation Sign Heavy Beat            |\n| `U+0FC1`  | Symbol           | SYMBOL            | _null_                     | &#x0FC1; Cantillation Sign Light Beat            |\n| `U+0FC2`  | Symbol           | SYMBOL            | _null_                     | &#x0FC2; Cantillation Sign Cang Te-U             |\n| `U+0FC3`  | Symbol           | SYMBOL            | _null_                     | &#x0FC3; Cantillation Sign Sbub -Chal            |\n| `U+0FC4`  | Symbol           | SYMBOL            | _null_                     | &#x0FC4; Symbol Dril Bu                          |\n| `U+0FC5`  | Symbol           | SYMBOL            | _null_                     | &#x0FC5; Symbol Rdo Rje                          |\n| `U+0FC6`  | Mark [Mn]        | SYLLABLE_MODIFIER | BOTTOM_POSITION            | &#x0FC6; Symbol Padma Gdan                       |\n| `U+0FC7`  | Symbol           | SYMBOL            | _null_                     | &#x0FC7; Symbol Rdo Rje Rgya Gram                |\n| `U+0FC8`  | Symbol           | SYMBOL            | _null_                     | &#x0FC8; Symbol Phur Pa                          |\n| `U+0FC9`  | Symbol           | SYMBOL            | _null_                     | &#x0FC9; Symbol Nor Bu                           |\n| `U+0FCA`  | Symbol           | SYMBOL            | _null_                     | &#x0FCA; Symbol Nor Bu Nyis -Khyil               |\n| `U+0FCB`  | Symbol           | SYMBOL            | _null_                     | &#x0FCB; Symbol Nor Bu Gsum -Khyil         
      |\n| `U+0FCC`  | Symbol           | SYMBOL            | _null_                     | &#x0FCC; Symbol Nor Bu Bzhi -Khyil               |\n| `U+0FCD`  | _unassigned_     |                   |                            |                                                  |\n| `U+0FCE`  | Symbol           | SYMBOL            | _null_                     | &#x0FCE; Sign Rdel Nag Rdel Dkar                 |\n| `U+0FCF`  | Symbol           | SYMBOL            | _null_                     | &#x0FCF; Sign Rdel Nag Gsum                      |\n| | | | | |\n| `U+0FD0`  | Punctuation      | _null_            | _null_                     | &#x0FD0; Bska- Shog Gi Mgo Rgyan                 |\n| `U+0FD1`  | Punctuation      | _null_            | _null_                     | &#x0FD1; Mnyam Yig Gi Mgo Rgyan                  |\n| `U+0FD2`  | Punctuation      | _null_            | _null_                     | &#x0FD2; Nyis Tsheg                              |\n| `U+0FD3`  | Punctuation      | _null_            | _null_                     | &#x0FD3; Initial Brda Rnying Yig Mgo Mdun        |\n| `U+0FD4`  | Punctuation      | _null_            | _null_                     | &#x0FD4; Closing Brda Rnying Yig Mgo Sgab        |\n| `U+0FD5`  | Symbol           | SYMBOL            | _null_                     | &#x0FD5; Right-Facing Svasti Sign                |\n| `U+0FD6`  | Symbol           | SYMBOL            | _null_                     | &#x0FD6; Left-Facing Svasti Sign                 |\n| `U+0FD7`  | Symbol           | SYMBOL            | _null_                     | &#x0FD7; Right-Facing Svasti Sign With Dots      |\n| `U+0FD8`  | Symbol           | SYMBOL            | _null_                     | &#x0FD8; Left-Facing Svasti Sign With Dots       |\n| `U+0FD9`  | Punctuation      | _null_            | _null_                     | &#x0FD9; Leading Mchan Rtags                     |\n| `U+0FDA`  | Punctuation      | _null_            | _null_                     | &#x0FDA; Trailing 
Mchan Rtags                    |\n| `U+0FDB`  | _unassigned_     |                   |                            |                                                  |\n| `U+0FDC`  | _unassigned_     |                   |                            |                                                  |\n| `U+0FDD`  | _unassigned_     |                   |                            |                                                  |\n| `U+0FDE`  | _unassigned_     |                   |                            |                                                  |\n| `U+0FDF`  | _unassigned_     |                   |                            |                                                  |\n| | | | | |\n:::\n\n\n## Miscellaneous character table ##\n\nOther important characters that may be encountered when shaping runs\nof Tibetan text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n\n:::{table} Miscellaneous character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                          |\n|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|\n|`U+00A0`   | Separator        | PLACEHOLDER       | _null_                     | &#x00A0; No-break space        |\n|`U+200C`   | Other            | NON_JOINER        | _null_                     | &#x200C; Zero-width non-joiner |\n|`U+200D`   | Other            | JOINER            | _null_                     | &#x200D; Zero-width joiner     |\n|`U+25CC`   | Symbol           | DOTTED_CIRCLE     | _null_                     | &#x25CC; Dotted circle         |\n|`U+2638`   | Symbol           | SYMBOL            | _null_                     | &#x2638; Wheel of Dharma       |\n:::\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation of a conjunct\nfrom a \"_consonant_,Halant,_consonant_\" sequence. The sequence\n\"_consonant_,Halant,ZWJ,_consonant_\" blocks the formation of a\nconjunct between the two consonants.\n\nNote, however, that the \"_consonant_,Halant\" subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. The sequence\n\"_consonant_,Halant,ZWNJ,_consonant_\" should produce the first\nconsonant in its standard form, followed by an explicit \"Halant\".\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n\"Reph\". 
An initial \"Ra,Halant,ZWJ\" sequence should not produce a \"Reph\",\nwhere an initial \"Ra,Halant\" sequence without the zero-width joiner\notherwise would.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display\nthose codepoints that are defined as non-spacing (marks, dependent\nvowels (matras), below-base consonant forms, and post-base consonant\nforms) in an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch \"NBSP,ZWJ,Halant,_consonant_\", \"NBSP,_mark_\", or \"NBSP,_matra_\".\n"
  },
  {
    "path": "character-tables/index.md",
    "content": "# Character tables #\n\nThe section includes per-srcipt reference tables showing the\nshaping-related properties of the codepoints used for each script,\nas well as auxiliary information about the codepoints and notes on\ncontrol characters and other special-purpose codepoints that could\nprove relevant to shapinh-engine implementers.\n\n\n  - Indic\n      - [Devanagari](character-tables-devanagari.md)\n      - [Bengali](character-tables-bengali.md)\n      - [Gujarati](character-tables-gujarati.md)\n      - [Gurmukhi](character-tables-gurmukhi.md)\n      - [Kannada](character-tables-kannada.md)\n      - [Malayalam](character-tables-malayalam.md)\n      - [Oriya](character-tables-oriya.md)\n      - [Tamil](character-tables-tamil.md)\n      - [Telugu](character-tables-telugu.md)\n      - [Sinhala](character-tables-sinhala.md)\n\t  - _Vedic Extensions tables are included in each Indic script_\n  - Arabic\n      - [Arabic](character-tables-arabic.md)\n      - [Syriac](character-tables-syriac.md)\n      - [N'Ko](character-tables-nko.md)\n      - [Mongolian](character-tables-mongolian.md)\n  - Hangul\n      - [Hangul Jamo](character-tables-hangul.md)\n  - Hebrew\n      - [Hebrew](character-tables-hebrew.md)\n  - Khmer\n      - [Khmer](character-tables-khmer.md)\n  - Lao\n      - [Lao](character-tables-lao.md)\n  - Myanmar\n      - [Myanmar](character-tables-myanmar.md)\n  - Thai\n      - [Thai](character-tables-thai.md)\n  - Tibetan\n      - [Tibetan](character-tables-tibetan.md)\n\n\n:::{note}\nTables are not provided for the default or Universal Shaping Engine\n(<abbr>USE</abbr>) shaping documents, each of which covers a\nmultitude of individual scripts, nor for the emoji shaping document,\nbecause emoji usage is not specific to any individual script.\n:::\n"
  },
  {
    "path": "conf.py",
    "content": "# Configuration file for the Sphinx documentation builder.\n#\n# For the full list of built-in configuration values, see the documentation:\n# https://www.sphinx-doc.org/en/master/usage/configuration.html\nimport sys\nfrom pathlib import Path\n\n# -- Project information -----------------------------------------------------\n# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information\n\nproject = 'OpenType<br>Shaping<br>Documents'\ncopyright = '2022, Sponsored by YesLogic'\nauthor = 'Sponsored by YesLogic'\n\nversion = \"0.9\"\nrelease = \"0.9alpha1\"\n\n# -- General configuration ---------------------------------------------------\n# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration\n\nsys.path.append(str(Path('_ext').resolve()))\n\nextensions = ['myst_parser', 'sphinx_external_toc', 'sphinx_inline_svg', 'shapingdocs_svg_color_toggles']\n\nsource_suffix = {'.md': 'markdown'}\n\ntemplates_path = ['_templates']\nexclude_patterns = ['_build', '_ext', 'test', 'Thumbs.db', '.DS_Store', 'BUILD.md', 'README.md', '**-image-generation-log.md', 'character-tables/README.md', 'images/images-index.md', 'images/README.md', 'notes/README.md'] # Eventually need to remove the links to image-generation-logs from the root README.md\n\nroot_doc = 'index' # Renamed to split GitHub README from production index\n\nnumfig = True\nnumfig_secnum_depth = 2\n\nmyst_heading_anchors = 6\n\n# attrs_inline to specify HTML element attributes like img 'title' that are getting lost on build.\nmyst_enable_extensions = ['substitution', 'smartquotes', 'colon_fence', 'attrs_inline']\n\nmyst_substitutions = {\n    'opentogglebutton': '<br><button onclick=\"toggleColor(',\n    'closetogglebutton': ')\">Substitution Toggle cluster colors</button><br>',\n    'khmer_midsyllable_mark_table_workaround': 'Mid-syllable marks that must be tagged for sorting with above-base consonants',\n}\n\nexternal_toc_path = \"_toc.yml\"\n\n# Starting 
with sphinx_external_toc 1.1.0, \"multitoc numbering\" is activated\n# by this configuration key and the standalone sphinx_multitoc_numbering\n# extension is not required\nuse_multitoc_numbering = True \n\n# -- Options for HTML output -------------------------------------------------\n# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output\n\nhtml_theme = 'alabaster'\nhtml_static_path = ['_static']\nhtml_js_files = ['toggleSvgColors.js']\nhtml_sidebars = {\n    '**': [\n        'about.html',\n        'static_nav.html',\n#        'navigation.html', # Replacing default navigation with static version above\n        'searchbox.html',\n        'sourcelink.html',\n        ]\n    }\nhtml_theme_options = {\n    'page_width': '1200px',\n    'sidebar_width': '300px',\n    'github_user': 'n8willis',\n    'github_repo': 'opentype-shaping-documents',\n    'font_family': 'Source\\ Serif\\ 4',\n    'head_font_family': 'Source\\ Serif\\ 4',\n    'caption_font_family': 'Source\\ Serif\\ 4',\n    'code_font_family': 'Source\\ Code\\ Pro',\n    'github_button': True,\n    'github_type': 'watch',\n    'github_count': True,\n    'extra_nav_links': {\n        'GitHub issues': 'https://github.com/n8willis/opentype-shaping-documents/issues',\n        'Build process': 'https://github.com/n8willis/opentype-shaping-documents/blob/master/BUILD.md', # Fix the directory path after PR merge; Add contributor-guide link\n        }\n}\n"
  },
  {
    "path": "errata.md",
    "content": "# OpenType shaping errata #\n\nThis document details errata that shaping engines may encounter, such\nas ambiguities or omissions in the existing OpenType or Unicode\nspecification documents.\n\n\n**Contents**\n\n  - [Unicode](#unicode)\n      - [<abbr>ZWJ</abbr> and <abbr>ZWNJ</abbr>](#zwj-and-zwnj)\n\t      - [Scope of <abbr>ZWJ</abbr> and <abbr>ZWNJ</abbr>](#scope-of-zwj-and-zwnj)\n\t      - [<abbr>ZWJ</abbr> in redundant ligature lookups](#zwj-in-redundant-ligature-lookups)\n      - [Emoji](#emoji)\n\t      - [Skin-tone permutations](#skin-tone-permutations)\n\t\t  - [Gender permutations](#gender-permutations)\n  - [OpenType](#opentype)\n      - [Null offsets in <abbr>GSUB</abbr> and <abbr>GPOS</abbr>](#null-offsets-in-gsub-and-gpos)\n      - [Sorting of <abbr>GSUB</abbr> and <abbr>GPOS</abbr> lookups](#sorting-of-gsub-and-gpos-lookups)\n\t  - [Per-script applicability of feature tags](#per-script-applicability-of-feature-tags)\n      - [Ordering of post-base and below-base consonants in Indic2 base-consonant determination](#ordering-of-post-base-and-below-base-consonants-in-indic2-base-consonant-determination)\n      - [Lookup behavior](#lookup-behavior)\n          - [Using MultipleSub for glyph deletion](#using-multiplesub-for-glyph-deletion)\n\t\t  - [Processing nested contextual lookups](#processing-nested-contextual-lookups)\n      - [Adjacent-mark reordering ambiguities](#adjacent-mark-reordering-ambiguities)\n      - [Merging of glyph properties](#merging-of-glyph-properties)\n  - [See also](#see-also)\n\n  \n  \n## Unicode ##\n\nThis section lists errata pertaining to the Unicode Standard.\n\n### <abbr>ZWJ</abbr> and <abbr>ZWNJ</abbr> ###\n\n#### Scope of <abbr>ZWJ</abbr> and <abbr>ZWNJ</abbr> ####\n\nUnicode provides the Zero Width Joiner (<abbr>ZWJ</abbr>) and Zero Width Non-Joiner\n(<abbr>ZWNJ</abbr>) control characters so that a text sequence can \"request a\nrendering system to have more or less of a connection between\ncharacters 
than they would otherwise have.\"\n\nThe generic examples used in the standard show how <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>\ncharacters can affect the cursive-joining behavior between two\ncharacters or the ligature-forming behavior between two\ncharacters. However, the standard does not explicitly say whether or\nnot the presence of a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should influence the shaping\nbehavior of characters that are not adjacent to the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nFor example, in the sequence <samp>\"a,b,ZWNJ,c,d\"</samp> the <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should prevent\nthe application of a ligature between <samp>\"b\"</samp> and <samp>\"c\"</samp> (if such a ligature\nlookup exists in the active font).\n\nHowever, if the active font contains a contextual ligature lookup for\n<samp>\"c,d\"</samp> when preceded by <samp>\"b\"</samp>, it is not clear whether or not the <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>\nin the same <samp>\"a,b,ZWNJ,c,d\"</samp> sequence should inhibit the application of\nthe ligature between <samp>\"c\"</samp> and <samp>\"d\"</samp>.\n\n\n#### <abbr>ZWJ</abbr> in redundant ligature lookups ####\n\nAn \"Implementation Notes\" section in chapter 23.2 of the Unicode\nStandard says that font vendors should add <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequences to ligature\nlookups. For example, if the sequence <samp>\"f,i\"</samp> triggers the <samp>\"fi\"</samp>\nligature, then the font should also include a lookup that triggers the\n<samp>\"fi\"</samp> ligature for <samp>\"f,ZWJ,i\"</samp>. 
\n\nHowever, the text of chapter 23.2 prior to the \"Implementation Notes\"\nsays that <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> \"are not to be used in all cases where\nligatures or cursive connections are desired; instead, they are meant\nonly for over-riding the normal behavior of the text.\" That logic\nmakes the suggested <samp>\"f,ZWJ,i\"</samp> ligature lookup superfluous, because it\nduplicates the effects of the existing <samp>\"f,i\"</samp> ligature lookup.\n\nUsing <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> within lookup patterns in the manner suggested by the\n\"Implementation Notes\" is not common practice. \n\n### Emoji ###\n\n#### Skin-tone permutations ####\n\nIt is unclear whether <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person group emoji sequences are\nexpected to include combinations where some emoji in the sequence are\nfollowed by a Fitzpatrick skin-tone modifier but other emoji in the\nsequence are not followed by a Fitzpatrick skin-tone modifier.\n\nFor example, it is unclear whether the sequence\n<samp>\"Man,ZWJ,Handshake,Man,SkinTone-2\"</samp> constitutes a valid\n<abbr title=\"Zero-Width Joiner\">ZWJ</abbr> \"Couple holding hands\" sequence.\n\n\n#### Gender permutations ####\n\nIt is unclear whether <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person group emoji sequences are\nexpected to include combinations where some emoji in the sequence\nhave an explicit gender but other emoji in the sequence do not.\n\nFor example, it is unclear whether the sequence\n<samp>\"Man,ZWJ,Handshake,Person\"</samp> constitutes a valid\n<abbr title=\"Zero-Width Joiner\">ZWJ</abbr> \"Couple holding hands\" sequence.\n\nIt is also unclear whether the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person family sequence must\nhave explicit gender-ordering for the adult humans depicted.\n\nFor example, it is unclear whether the 
sequence\n<samp>\"Man,ZWJ,Woman,ZWJ,Girl\"</samp> should be rendered identically to the\nsequence <samp>\"Woman,ZWJ,Man,ZWJ,Girl\"</samp>.\n\n\n## OpenType ##\n\nThis section lists errata pertaining to the OpenType specification.\n\n### Null offsets in <abbr>GSUB</abbr> and <abbr>GPOS</abbr> ###\n\nThe headers of the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> tables include fields that contain\nthe offsets at which other structures within the font binary are\nfound. For example, the value of the `featureVariationsOffset` field\nindicates the byte offset at which the featureVariations structure is\nlocated.\n\nThe OpenType specification notes that `featureVariationsOffset` can be\n`NULL`, but the specification does not indicate whether any other\noffset values can also be `NULL` (nor, conversely, does it indicate\nwhether `NULL` should be considered invalid).\n\nIn practice, other fields -- such as `scriptListOffset`,\n`featureListOffset`, and `lookupListOffset` -- may have `NULL` values.\nIn such situations, `NULL` is usually interpreted as meaning that the\nstructure nominally pointed to by the offset is empty.\n\nFurthermore, font-validation functions may overwrite a `NULL` into an\noffset field if the original value encountered was invalid.\n\n\n### Sorting of <abbr>GSUB</abbr> and <abbr>GPOS</abbr> lookups ###\n\nThe OpenType specification requires that lookups in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nmust be sorted into numeric order before they are applied.\n\nLookups in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table, however, are not expected to be sorted\nfirst, because <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups are applied in a specified order.\n\n### Per-script applicability of feature tags ###\n\nSome OpenType feature tags are defined only to apply to text runs in\nspecific scripts. 
Other feature tags are defined to apply to text in\nany script.\n\nHowever, the definitions of some feature tags list a limited number of\nexample scripts to which the feature should apply, but do not specify\nevery supported script.\n\nFor example, the `pstf` (post-base forms) tag is\n[described](https://docs.microsoft.com/en-us/typography/opentype/spec/features_pt#tag-pstf)\nas required for \"scripts of south and southeast Asia that have\npost-base forms for consonants eg: Gurmukhi, Malayalam, Khmer.\"\n\n\n### Ordering of post-base and below-base consonants in Indic2 base-consonant determination ###\n\nThe Microsoft script-development specification for all Indic2-model\nscripts\n[states](https://docs.microsoft.com/en-us/typography/script-development/bengali#reorder-characters)\nparenthetically that \"post-base forms have to follow below-base forms\". \n\nIf this statement is taken to be a rule, it would affect the\nbase-consonant search algorithm.\n\nFor example, in the Bengali sequence <samp>\"Ka,Halant,Ba,Halant,Ya\"</samp>\n(`U+0995`,`U+09CD`,`U+09AC`,`U+09CD`,`U+09AF`), <samp>\"Ka\"</samp> would be\nidentified as the syllable base, with <samp>\"Ba\"</samp> designated a below-base\nform and <samp>\"Ya\"</samp> designated a post-base form. 
However, in the similar\nsequence <samp>\"Ka,Halant,Ya,Halant,Ba\"</samp>\n(`U+0995`,`U+09CD`,`U+09AF`,`U+09CD`,`U+09AC`), <samp>\"Ya\"</samp> would be\nidentified as the base consonant.\n\nReal-world Bengali texts provide counterexamples that contradict the\nassumption that \"post-base forms follow below-base forms\" is a\nrequirement.\n\nIn other scripts, such as Telugu, the \"post-base forms have to follow\nbelow-base forms\" statement is, perhaps, statistically likely, but is\ncertainly not an orthographic rule.\n\nConsequently, it is unclear if the statement should be enforced as a\nrule or if it should be regarded as a suggestion, and it is unclear to\nwhat degree that answer varies between the Indic2-model scripts.\n\n\n### Lookup behavior ###\n\n#### Using MultipleSub for glyph deletion ####\n\nThe <abbr title=\"Glyph Substitution table\">GSUB</abbr> specification says that a `MultipleSubst` substitution cannot\nbe used to delete a glyph: it always substitutes at least one\nreplacement glyph. However, some implementations allow the\nreplacement-glyph array to be zero-length. \n\n#### Processing nested contextual lookups ####\n\nThe <abbr title=\"Glyph Substitution table\">GSUB</abbr> specification allows contextual substitutions to invoke other\ncontextual substitutions. 
It is unclear how implementations ought to\nhandle certain cases of these nested lookups.\n\nFor example:\n```\ncontext: 'a'\nsubst index 0:\n  context: 'ab'\n  subst index 1: 'b' → 'ab'\n```\n\nThis nested set of substitutions could cause an infinite loop on\ncertain input strings, if it is interpreted in a naive manner:\n```\n'[]ab' // begin at start of glyph sequence\n'[a]b' // context matches\n'[ab]' // nested context matches at index 0\n'[aab]' // subst applies at index 1\n'[a]ab' // return to parent context, uh oh!\n'a[]ab' // move on to next glyph\n'a[a]b' // context matches, infinite loop!\n```\n\nIn short, if a nested contextual substitution can insert glyphs ahead\nof its parent contextual substitution's context, then it creates a\n\"stack\" that allows Turing-complete computation.\n\n\n\n### Adjacent-mark reordering ambiguities ###\n\nThe Microsoft script-development specifications\n[say](https://docs.microsoft.com/en-us/typography/script-development/devanagari#reorder-characters)\nthat marks should be reordered \"to canonical order\" (step 3 in the\nlinked Devanagari document) in the reordering phase. 
However, the same\nstep also describes this step as \"Adjacent nukta and halant or nukta\nand Vedic sign are always repositioned if necessary, so that the nukta\nis first.\"\n\nTogether, it is somewhat ambiguous as to whether only <samp>\"Halant,Nukta\"</samp>\nand <samp>\"_Vedic_sign_,Nukta\"</samp> sequences should be reordered by moving the\n<samp>\"Nukta\"</samp> to the beginning, or all sequences of marks require reordering\ninto Unicode canonical combining class order, with <samp>\"Nukta\"</samp> moving to\nthe initial position as a special case.\n\n\n### Merging of glyph properties ###\n\nWhen the application of a shaping operation merges two or more\nadjacent glyphs (for example, when two adjacent glyphs are substituted\nwith a single ligature glyph), the OpenType specification does not\ndictate how shaping engines should combine (for example, merge,\nreplace, or drop) the properties of the input glyphs to determine the\nproperties of the output glyph.\n\nThis may result in ambiguities when a sequence of glyphs has several\nsubstitutions applied in series.\n\nFor example, when shaping Indic scripts, glyphs may be tagged for the\npossible application of multiple features, such as `half` and `rkrf`,\nwhich are applied serially.\n\nHarfBuzz and Uniscribe both take the approach of retaining the\nproperties of the first input glyph in a sequence, propagating those\nproperties to the merged output glyph.\n\n\n## See also ##\n\nShaping engines may also want to offer explicit compatibility with\nMicrosoft Uniscribe, for the purpose of ensuring that users' existing\ndocuments do not break. Therefore, implementors may wish to consult\nthe [Uniscribe compatibility notes](notes/uniscribe-bug-compatibility.md).\n\nThese compatibilty notes record test-driven observations about\nUniscribe's behavior, and they include any behavior that is a known\nbug or a known deviation from specifications. 
Consequently, the issues\nraised by offering Uniscribe compatibility cannot be considered errata\nin the sense described above.\n"
  },
  {
    "path": "images/arabic/arabic-png-image-generation-log.md",
    "content": "# Commands used to generate the <abbr>PNG</abbr> images in [opentype-shaping-arabic.md](/opentype-shaping-arabic.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 4.1 `locl`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-locl-before.png --features=-locl --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=06f4\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-locl-after.png --features=+locl --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=06f4\n\nmontage arabic-locl-before.png right-arrow.png arabic-locl-after.png -geometry +0+0 -background transparent arabic-locl.png\n\n\n## 4.2 `isol`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-isol-before.png --features=-isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0647\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-isol-after.png --features=+isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0647\n\nmontage arabic-isol-before.png right-arrow.png arabic-isol-after.png -geometry +0+0 -background transparent arabic-isol.png\n\n\n## 4.3 `fina`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-fina-before.png --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=25cc,0628\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-fina-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=25cc,0628\n\nmontage arabic-fina-before.png right-arrow.png arabic-fina-after.png -geometry +0+0 -background transparent arabic-fina.png\n\n\n## 4.6 
`medi`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-medi-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=25cc,062e,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-medi-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=25cc,062e,25cc\n\nmontage arabic-medi-before.png right-arrow.png arabic-medi-after.png -geometry +0+0 -background transparent arabic-medi.png\n\n\n## 4.8 `init`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=063a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=063a,25cc\n\nmontage arabic-init-before.png right-arrow.png arabic-init-after.png -geometry +0+0 -background transparent arabic-init.png\n\n\n## 4.9 `rlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-rlig-before.png --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0623\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-rlig-after.png --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0623\n\nmontage arabic-rlig-before.png right-arrow.png arabic-rlig-after.png -geometry +0+0 -background transparent arabic-rlig.png\n\n\n## 4.10 `rclt`\n\n> None found.\n\n\n## 4.11 `calt`\n\n> Note: Noto Nastaliq Urdu implements this as a `rlig` lookup for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-calt-before.png --features=-liga,-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf 
--unicodes=062d,0645\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-calt-after.png --features=+liga,+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=062d,0645\n\nmontage arabic-calt-before.png right-arrow.png arabic-calt-after.png -geometry +0+0 -background transparent arabic-calt.png\n\n\n## 5.1 `liga`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-liga-before.png --features=-liga,-fina,-medi,-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoKufiArabic-Regular.ttf --unicodes=0631,064a,0627,0644\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-liga-after.png --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoKufiArabic-Regular.ttf --unicodes=0631,064a,0627,0644\n\nmontage arabic-liga-before.png right-arrow.png arabic-liga-after.png -geometry +0+0 -background transparent arabic-liga.png\n\n\n## 5.3 `cswh`\n\n> None found.\n\n\n## 5.4 `mset`\n\n> None found. 
Could be emulated with `mark`, however.\n\n\n## 7.1 `curs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-curs-before.png --features=-curs --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0642,0633,0645\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-curs-after.png --features=+curs --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0642,0633,0645\n\nmontage arabic-curs-before.png right-arrow.png arabic-curs-after.png -geometry +0+0 -background transparent arabic-curs.png\n\n\n## 7.3 `mark`\n\nhb-view --font-size=110 --margin=2,32,2,32 --output-file=arabic-mark-before.png --features=-mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0643,0653\n\nhb-view --font-size=110 --margin=2,32,2,16 --output-file=arabic-mark-after.png --features=+mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0643,0653\n\nmontage arabic-mark-before.png right-arrow.png arabic-mark-after.png -geometry +0+0 -background transparent arabic-mark.png\n"
  },
  {
    "path": "images/arabic/arabic-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-arabic.md](../../opentype-shaping-arabic.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-Regular.ttf --unicodes=2192\n\n## 2 `ccmp`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-ccmp-before.svg --features=-ccmp --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=067e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-ccmp-after.svg --features=+ccmp --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=067e\n\nsvg_stack.py --direction=h arabic-ccmp-before.svg right-arrow.svg arabic-ccmp-after.svg > arabic-ccmp.svg\n\n\n\n## 4.1 `locl`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-locl-before.svg --features=-locl --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=06f4\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-locl-after.svg --features=+locl --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=06f4\n\nsvg_stack.py --direction=h arabic-locl-before.svg right-arrow.svg arabic-locl-after.svg > arabic-locl.svg\n\n\n## 4.2 `isol`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-isol-before.svg --features=-isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0647\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-isol-after.svg --features=+isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0647\n\nsvg_stack.py --direction=h arabic-isol-before.svg right-arrow.svg arabic-isol-after.svg > arabic-isol.svg\n\n\n## 4.3 `fina`\n\nhb-view --font-size=110 
--margin=2,16,2,16 --output-file=arabic-fina-before.svg --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0626,200d,0628\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-fina-after.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0626,0628\n\nsvg_stack.py --direction=h arabic-fina-before.svg right-arrow.svg arabic-fina-after.svg > arabic-fina.svg\n\n\n## 4.6 `medi`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-medi-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0626,200d,062e,200d,0637\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-medi-after.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0626,062e,0637\n\nsvg_stack.py --direction=h arabic-medi-before.svg right-arrow.svg arabic-medi-after.svg > arabic-medi.svg\n\n\n## 4.8 `init`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=063a,200d,0626\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-init-after.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=063a,0626\n\nsvg_stack.py --direction=h arabic-init-before.svg right-arrow.svg arabic-init-after.svg > arabic-init.svg\n\n\n## 4.9 `rlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-rlig-before.svg --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0623\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-rlig-after.svg --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0623\n\nsvg_stack.py --direction=h 
arabic-rlig-before.svg right-arrow.svg arabic-rlig-after.svg > arabic-rlig.svg\n\n\n## 4.10 `rclt`\n\n> None found.\n\n\n## 4.11 `calt`\n\n> Note: Noto Nastaliq Urdu implements this as a `rlig` lookup for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-calt-before.svg --features=-liga,-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=062d,0645\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-calt-after.svg --features=+liga,+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=062d,0645\n\nsvg_stack.py --direction=h arabic-calt-before.svg right-arrow.svg arabic-calt-after.svg > arabic-calt.svg\n\n\n## 5.1 `liga`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-liga-before.svg --features=-liga,-fina,-medi,-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0627,0644,0644,0651,0670,06c1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-liga-after.svg --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0627,0644,0644,0651,0670,06c1\n\nsvg_stack.py --direction=h arabic-liga-before.svg right-arrow.svg arabic-liga-after.svg > arabic-liga.svg\n\n\n## 5.2 `dlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-dlig-before.svg --features=-dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0644,0647\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-dlig-after.svg --features=+dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0644,0647\n\nsvg_stack.py --direction=h arabic-dlig-before.svg right-arrow.svg arabic-dlig-after.svg > arabic-dlig.svg\n\n\n## 5.3 `cswh`\n\n> None found.\n\n\n## 5.4 `mset`\n\n> None found. 
Could be emulated with `mark`, however.\n\n\n## 7.1 `curs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-curs-before.svg --features=-curs --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0642,0633,0645\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-curs-after.svg --features=+curs --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0642,0633,0645\n\nsvg_stack.py --direction=h arabic-curs-before.svg right-arrow.svg arabic-curs-after.svg > arabic-curs.svg\n\n\n## 7.2 `dist` (not yet added to document)\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-dist-before.svg --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0628,062f,066e,062d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0628,062f,066e,062d\n\nsvg_stack.py --direction=h arabic-dist-before.svg right-arrow.svg arabic-dist-after.svg > arabic-dist.svg\n\n\n## 7.2 `kern`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-kern-before.svg --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0635,062f,0627\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-kern-after.svg --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0635,062f,0627\n\nsvg_stack.py --direction=h arabic-kern-before.svg right-arrow.svg arabic-kern-after.svg > arabic-kern.svg\n\n\n\n\n## 7.3 `mark`\n\nhb-view --font-size=110 --margin=2,32,2,32 --output-file=arabic-mark-before.svg --features=-mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0643,0653\n\nhb-view --font-size=110 
--margin=2,32,2,16 --output-file=arabic-mark-after.svg --features=+mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0643,0653\n\nsvg_stack.py --direction=h arabic-mark-before.svg right-arrow.svg arabic-mark-after.svg > arabic-mark.svg\n"
  },
  {
    "path": "images/bengali/bengali-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-bengali.md](../../opentype-shaping-bengali.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-init` in examples where the `init`\n> feature itself is not being explained.\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-decompose-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-decompose-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09c7,200c,25cc,09d7\n\nmontage bengali-matra-decompose-before.png right-arrow.png bengali-matra-decompose-after.png -geometry +0+0 -background transparent bengali-matra-decompose.png\n\n\n## 2.7 Post-base consonants\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-yaphala-before.png --features=-init,-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09af\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-yaphala-after.png --features=-init,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09af\n\nmontage bengali-yaphala-before.png right-arrow.png bengali-yaphala-after.png -geometry +0+0 -background transparent bengali-yaphala.png\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-nukt-before.png --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a1,25cc,09bc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-nukt-after.png 
--features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a1,09bc\n\nmontage bengali-nukt-before.png right-arrow.png bengali-nukt-after.png -geometry +0+0 -background transparent bengali-nukt.png\n\n\n## 3.3 `akhn`\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-kssa-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,09b7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-kssa-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,09b7\n\nmontage bengali-akhn-kssa-before.png right-arrow.png bengali-akhn-kssa-after.png -geometry +0+0 -background transparent bengali-akhn-kssa.png\n\n### JNya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-jnya-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099c,25cc,09cd,099e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-jnya-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099c,09cd,099e\n\nmontage bengali-akhn-jnya-before.png right-arrow.png bengali-akhn-jnya-after.png -geometry +0+0 -background transparent bengali-akhn-jnya.png\n\n\n## 3.4 `rphf`\n\n### Bengali\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-before.png --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,25cc,09cd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc\n\nmontage bengali-rphf-before.png right-arrow.png 
bengali-rphf-after.png -geometry +0+0 -background transparent bengali-rphf.png\n\n\n### Assamese\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-as-before.png --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,25cc,09cd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-as-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,25cc\n\nmontage bengali-rphf-as-before.png right-arrow.png bengali-rphf-as-after.png -geometry +0+0 -background transparent bengali-rphf-as.png\n\n## 3.7 `blwf`\n\n### Raphala\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-raphala-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09b0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-raphala-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09b0\n\nmontage bengali-raphala-before.png right-arrow.png bengali-raphala-after.png -geometry +0+0 -background transparent bengali-raphala.png\n\n\n### Baphala\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-baphala-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09ac\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-baphala-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09ac\n\nmontage bengali-baphala-before.png right-arrow.png bengali-baphala-after.png -geometry +0+0 -background transparent bengali-baphala.png\n\n## 3.9 `half`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-half-ka-before.png --features=-init,-half 
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,0998\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-half-ka-after.png --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,0998\n\nmontage bengali-half-ka-before.png right-arrow.png bengali-half-ka-after.png -geometry +0+0 -background transparent bengali-half-ka.png\n\n## 3.10 `pstf`\n\n> Same as 2.7\n\n## 3.11 `vatu`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-vatu-before.png --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,09b0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-vatu-after.png --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,09b0\n\nmontage bengali-vatu-before.png right-arrow.png bengali-vatu-after.png -geometry +0+0 -background transparent bengali-vatu.png\n\n\n## 3.12 `cjct`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-cjct-before.png --features=-init,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09aa,25cc,09cd,09a4,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-cjct-after.png --features=-init,+cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09aa,09cd,09a4,25cc\n\nmontage bengali-cjct-before.png right-arrow.png bengali-cjct-after.png -geometry +0+0 -background transparent bengali-cjct.png\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09c8,09b8,09cd,09ae,099a\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=bengali-matra-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b8,09cd,09ae,099a,09c8\n\nmontage bengali-matra-position-before.png right-arrow.png bengali-matra-position-after.png -geometry +0+0 -background transparent bengali-matra-position.png\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-reph-position-before.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,25cc,09a1,09cd,09a1,09cd,0996,09c1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-reph-position-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,09a1,09cd,09a1,09cd,0996,09c1\n\nmontage bengali-reph-position-before.png right-arrow.png bengali-reph-position-after.png -geometry +0+0 -background transparent bengali-reph-position.png\n\n## 5 `init`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0999,09c7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0999,09c7\n\nmontage bengali-init-before.png right-arrow.png bengali-init-after.png -geometry +0+0 -background transparent bengali-init.png\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-pres-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099f,09bf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-pres-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf 
--unicodes=099f,09bf\n\nmontage bengali-pres-before.png right-arrow.png bengali-pres-after.png -geometry +0+0 -background transparent bengali-pres.png\n\n\n## 5 `abvs`\n\n> Note that Noto Bengali implements this feature in a `pres` lookup for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,25,2,16 --output-file=bengali-abvs-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc,09c0,0981\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvs-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc,09c0,0981\n\nmontage bengali-abvs-before.png right-arrow.png bengali-abvs-after.png -geometry +0+0 -background transparent bengali-abvs.png\n\n\n## 5 `blws`\n\n> Note that Noto Bengali implements this feature in a `pres` lookup for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blws-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099d,09cd,09ac\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blws-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099d,09cd,09ac\n\nmontage bengali-blws-before.png right-arrow.png bengali-blws-after.png -geometry +0+0 -background transparent bengali-blws.png\n\n\n## 5 `psts`\n\n> Note that Noto Bengali implements this feature in a `pres` lookup for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-psts-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a0,09c0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-psts-after.png --features=-init,+pres --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a0,09c0\n\nmontage bengali-psts-before.png right-arrow.png bengali-psts-after.png -geometry +0+0 -background transparent bengali-psts.png\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-haln-before.png --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099b,09bc,09cd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-haln-after.png --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099b,09bc,09cd\n\nmontage bengali-haln-before.png right-arrow.png bengali-haln-after.png -geometry +0+0 -background transparent bengali-haln.png\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvm-before.png --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0994,0981\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvm-after.png --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0994,0981\n\nmontage bengali-abvm-before.png right-arrow.png bengali-abvm-after.png -geometry +0+0 -background transparent bengali-abvm.png\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blwm-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09ad,09c2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blwm-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09ad,09c2\n\nmontage bengali-blwm-before.png right-arrow.png bengali-blwm-after.png -geometry +0+0 -background transparent bengali-blwm.png\n\n\n\n\n\n"
  },
  {
    "path": "images/bengali/bengali-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-bengali.md](../../opentype-shaping-bengali.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-Regular.ttf --unicodes=2192\n\ncluster_styles = None\n\n> Note: always use `--features=-init` in examples where the `init`\n> feature itself is not being explained.\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-decompose-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-decompose-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09c7,200c,25cc,09d7\n\nsvg_stack.py --direction=h bengali-matra-decompose-before.svg right-arrow.svg bengali-matra-decompose-after.svg > bengali-matra-decompose.svg\n\ncluster_styles = [c0,dc,c0,arrow,c0,dc,dc,c1]\n\n\n## 2.7 Post-base consonants\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-yaphala-before.svg --features=-init,-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09af\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-yaphala-after.svg --features=-init,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09af\n\nsvg_stack.py --direction=h bengali-yaphala-before.svg right-arrow.svg bengali-yaphala-after.svg > bengali-yaphala.svg\n\ncluster_styles = [dc,c0,c1,arrow,dc,c1]\n\n#### Duplicates for other subsections\n\ncp bengali-yaphala.svg bengali-yaphala-1.svg\n\ncluster_styles = [dc,c0,c1,arrow,dc,c1]\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-nukt-before.svg 
--features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a1,25cc,09bc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-nukt-after.svg --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a1,09bc\n\nsvg_stack.py --direction=h bengali-nukt-before.svg right-arrow.svg bengali-nukt-after.svg > bengali-nukt.svg\n\ncluster_styles = [c0,dc,c1,arrow,c0]\n\n\n## 3.3 `akhn`\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-kssa-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,09b7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-kssa-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,09b7\n\nsvg_stack.py --direction=h bengali-akhn-kssa-before.svg right-arrow.svg bengali-akhn-kssa-after.svg > bengali-akhn-kssa.svg\n\ncluster_styles = [c0,dc,c1,c2,arrow,c0]\n\n\n### JNya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-jnya-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099c,25cc,09cd,099e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-jnya-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099c,09cd,099e\n\nsvg_stack.py --direction=h bengali-akhn-jnya-before.svg right-arrow.svg bengali-akhn-jnya-after.svg > bengali-akhn-jnya.svg\n\ncluster_styles = [c0,dc,c1,c2,arrow,c0]\n\n\n## 3.4 `rphf`\n\n### Bengali\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-before.svg --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf 
--unicodes=09b0,25cc,09cd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc\n\nsvg_stack.py --direction=h bengali-rphf-before.svg right-arrow.svg bengali-rphf-after.svg > bengali-rphf.svg\n\ncluster_styles = [c0,dc,c1,arrow,dc,c0]\n\n\n### Assamese\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-as-before.svg --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,25cc,09cd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-as-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,25cc\n\nsvg_stack.py --direction=h bengali-rphf-as-before.svg right-arrow.svg bengali-rphf-as-after.svg > bengali-rphf-as.svg\n\ncluster_styles = [c0,dc,c1,arrow,dc,c0]\n\n\n## 3.7 `blwf`\n\n### Raphala\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-raphala-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09b0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-raphala-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09b0\n\nsvg_stack.py --direction=h bengali-raphala-before.svg right-arrow.svg bengali-raphala-after.svg > bengali-raphala.svg\n\ncluster_styles = [dc,c0,c1,arrow,dc,c0]\n\n#### Duplicates for other subsections\n\ncp bengali-raphala.svg bengali-raphala-1.svg\n\ncluster_styles = [dc,c0,c1,arrow,dc,c0]\n\ncp bengali-raphala.svg bengali-raphala-2.svg\n\ncluster_styles = [dc,c0,c1,arrow,dc,c0]\n\n### Baphala\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-baphala-before.svg --features=-init,-blwf 
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09ac\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-baphala-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09ac\n\nsvg_stack.py --direction=h bengali-baphala-before.svg right-arrow.svg bengali-baphala-after.svg > bengali-baphala.svg\n\ncluster_styles = [dc,c0,c1,arrow,dc,c0]\n\n\n## 3.9 `half`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-half-ka-before.svg --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,0998\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-half-ka-after.svg --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,0998\n\nsvg_stack.py --direction=h bengali-half-ka-before.svg right-arrow.svg bengali-half-ka-after.svg > bengali-half-ka.svg\n\ncluster_styles = [c0,dc,c1,c2,arrow,c0,c2]\n\n\n## 3.10 `pstf`\n\n> Same as 2.7\n\n## 3.11 `vatu`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-vatu-before.svg --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,09b0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-vatu-after.svg --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,09b0\n\nsvg_stack.py --direction=h bengali-vatu-before.svg right-arrow.svg bengali-vatu-after.svg > bengali-vatu.svg\n\ncluster_styles = [c0,dc,c1,c2,arrow,c0]\n\n\n## 3.12 `cjct`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-cjct-before.svg --features=-init,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf 
--unicodes=09aa,25cc,09cd,09a4,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-cjct-after.svg --features=-init,+cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09aa,09cd,09a4,25cc\n\nsvg_stack.py --direction=h bengali-cjct-before.svg right-arrow.svg bengali-cjct-after.svg > bengali-cjct.svg\n\ncluster_styles = [c0,dc,c1,c2,dc,arrow,c0,dc]\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09c8,09b8,09cd,09ae,099a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-position-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b8,09cd,09ae,099a,09c8\n\nsvg_stack.py --direction=h bengali-matra-position-before.svg right-arrow.svg bengali-matra-position-after.svg > bengali-matra-position.svg\n\ncluster_styles = [c0,dc,c1,c2,arrow,c1,c0,c2]\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-reph-position-before.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,25cc,09a1,09cd,09a1,09cd,0996,09c1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-reph-position-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,09a1,09cd,09a1,09cd,0996,09c1\n\nsvg_stack.py --direction=h bengali-reph-position-before.svg right-arrow.svg bengali-reph-position-after.svg > bengali-reph-position.svg\n\ncluster_styles = [c0,c1,dc,c2,c3,c4,arrow,c0,c1,c2,c3]\n\n\n## 5 `init`\n\n> ?? 
Maybe there's a headline-using second character that would be\n> better here....\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0999,09c7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-init-after.svg --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0999,09c7\n\nsvg_stack.py --direction=h bengali-init-before.svg right-arrow.svg bengali-init-after.svg > bengali-init.svg\n\ncluster_styles = [c0,c1,arrow,c0,c1]\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-pres-before.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099f,09bf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-pres-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099f,09bf\n\nsvg_stack.py --direction=h bengali-pres-before.svg right-arrow.svg bengali-pres-after.svg > bengali-pres.svg\n\ncluster_styles = [c0,c1,arrow,c0]\n\n\n## 5 `abvs`\n\n> Note that Noto Bengali implements this feature in a pres lookup for\n> unknown reasons.\n\n> No more!\n\nhb-view --font-size=110 --margin=2,25,2,16 --output-file=bengali-abvs-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc,09c0,0981\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvs-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc,09c0,0981\n\nsvg_stack.py --direction=h bengali-abvs-before.svg right-arrow.svg bengali-abvs-after.svg > bengali-abvs.svg\n\ncluster_styles = [c0,c1,dc,c2,c3,arrow,c0,c1,dc,c2,c3]\n\n\n## 5 `blws`\n\n> Note that 
Noto Bengali implements this feature in a pres lookup for\n> unknown reasons.\n\n> This now seems to require disabling -cjct and -blws, but -pres is no\n> longer involved.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blws-before.svg --features=-init,-blws,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099d,09cd,09ac\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blws-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099d,09cd,09ac\n\nsvg_stack.py --direction=h bengali-blws-before.svg right-arrow.svg bengali-blws-after.svg > bengali-blws.svg\n\ncluster_styles = [c0,c1,arrow,c0]\n\n\n## 5 `psts`\n\n> Note that Noto Bengali implements this feature in a pres lookup for\n> unknown reasons.\n\n> No more!\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-psts-before.svg --features=-init,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a0,09c0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-psts-after.svg --features=-init,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a0,09c0\n\nsvg_stack.py --direction=h bengali-psts-before.svg right-arrow.svg bengali-psts-after.svg > bengali-psts.svg\n\ncluster_styles = [c0,c1,arrow,c0,c1]\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-haln-before.svg --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099b,09bc,09cd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-haln-after.svg --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099b,09bc,09cd\n\nsvg_stack.py --direction=h bengali-haln-before.svg right-arrow.svg bengali-haln-after.svg > 
bengali-haln.svg\n\ncluster_styles = [c0,c1,c2,arrow,c0,c1,c2]\n\n\n## 6 `abvm`\n\n> ????\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvm-before.svg --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0994,0981\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvm-after.svg --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0994,0981\n\nsvg_stack.py --direction=h bengali-abvm-before.svg right-arrow.svg bengali-abvm-after.svg > bengali-abvm.svg\n\ncluster_styles = [c0,c1,arrow,c0,c1]\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blwm-after.svg --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09ad,09c2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blwm-before.svg --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09ad,09c2\n\nsvg_stack.py --direction=h bengali-blwm-before.svg right-arrow.svg bengali-blwm-after.svg > bengali-blwm.svg\n\ncluster_styles = [c0,c1,arrow,c0,c1]\n\n"
  },
  {
    "path": "images/devanagari/devanagari-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-devanagari.md](../../opentype-shaping-devanagari.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-init` in examples where the `init`\n> feature itself is not being explained.\n\n\n## 3.1 `locl`\n\n> Note: Noto Devanagari has a 'MAR' locl feature. \n\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-nukt-before.png --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,25cc,093c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-nukt-after.png --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c\n\nmontage devanagari-nukt-before.png right-arrow.png devanagari-nukt-after.png -geometry +0+0 -background transparent devanagari-nukt.png\n\n\n\n## 3.3 `akhn`\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-kssa-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,25cc,094d,0937\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-kssa-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,094d,0937\n\nmontage devanagari-akhn-kssa-before.png right-arrow.png devanagari-akhn-kssa-after.png -geometry +0+0 -background transparent devanagari-akhn-kssa.png\n\n### JNya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-jnya-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091c,25cc,094d,091e\n\nhb-view 
--font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-jnya-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091c,094d,091e\n\nmontage devanagari-akhn-jnya-before.png right-arrow.png devanagari-akhn-jnya-after.png -geometry +0+0 -background transparent devanagari-akhn-jnya.png\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rphf-before.png --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,25cc,094d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rphf-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc\n\nmontage devanagari-rphf-before.png right-arrow.png devanagari-rphf-after.png -geometry +0+0 -background transparent devanagari-rphf.png\n\n\n## 3.5 `rkrf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rkrf-before.png --features=-init,-rkrf,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091d,25cc,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rkrf-after.png --features=-init,+rkrf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091d,094d,0930\n\nmontage devanagari-rkrf-before.png right-arrow.png devanagari-rkrf-after.png -geometry +0+0 -background transparent devanagari-rkrf.png\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwf-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwf-after.png --features=-init,+blwf --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,094d,0930\n\nmontage devanagari-blwf-before.png right-arrow.png devanagari-blwf-after.png -geometry +0+0 -background transparent devanagari-blwf.png\n\n\n## 3.9 `half`\n\n### Half form\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-half-before.png --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0932,094d,0930,25cc,094d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-half-after.png --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0932,094d,0930,094d\n\nmontage devanagari-half-before.png right-arrow.png devanagari-half-after.png -geometry +0+0 -background transparent devanagari-half.png\n\n### Eyelash Ra\n\n> Note that Noto Devanagari eyelash-Ra substitution does not appear to\n> work when using `U+25cc` dotted circle as the \"base consonant\"\n> substitute. Hence, a real consonant glyph is used instead. 
But it is\n> important that \"Ra\" _not_ be used as the \"base consonant\", as this\n> triggers \"Rakaar\".\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-eyelash-ra-before.png --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0931,094d,0932\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-eyelash-ra-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0931,094d,0932\n\nmontage devanagari-eyelash-ra-before.png right-arrow.png devanagari-eyelash-ra-after.png -geometry +0+0 -background transparent devanagari-eyelash-ra.png\n\n\n## 3.11 `vatu`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-vatu-before.png --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0936,25cc,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-vatu-after.png --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0936,094d,0930\n\nmontage devanagari-vatu-before.png right-arrow.png devanagari-vatu-after.png -geometry +0+0 -background transparent devanagari-vatu.png\n\n\n## 3.12 `cjct`\n\n> Note: Noto Serif Devanagari implements this as `pres` for unknown\n> reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-cjct-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0922,25cc,094d,0922\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-cjct-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0922,094d,0922\n\nmontage devanagari-cjct-before.png right-arrow.png devanagari-cjct-after.png -geometry +0+0 -background transparent 
devanagari-cjct.png\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-matra-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=093f,091e,094d,200c,091e,094d,0939,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-matra-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091e,094d,200c,091e,094d,0939,094d,0930,093f\n\nmontage devanagari-matra-position-before.png right-arrow.png devanagari-matra-position-after.png -geometry +0+0 -background transparent devanagari-matra-position.png\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-reph-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,092f,094d,0932,094d,092e,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-reph-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,092f,094d,0932,094d,092e,094d,0930\n\nmontage devanagari-reph-position-before.png right-arrow.png devanagari-reph-position-after.png -geometry +0+0 -background transparent devanagari-reph-position.png\n\n\n## 5 `init`\n\n> Note: Noto Devanagari and Murty don't implement `init`.\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-pres-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0916,093f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-pres-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0916,093f\n\nmontage devanagari-pres-before.png 
right-arrow.png devanagari-pres-after.png -geometry +0+0 -background transparent devanagari-pres.png\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvs-before.png --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,0949\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvs-after.png --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,0949\n\nmontage devanagari-abvs-before.png right-arrow.png devanagari-abvs-after.png -geometry +0+0 -background transparent devanagari-abvs.png\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blws-before.png --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0939,0944\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blws-after.png --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0939,0944\n\nmontage devanagari-blws-before.png right-arrow.png devanagari-blws-after.png -geometry +0+0 -background transparent devanagari-blws.png\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-psts-before.png --features=-init,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c,0940\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-psts-after.png --features=-init,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c,0940\n\nmontage devanagari-psts-before.png right-arrow.png devanagari-psts-after.png -geometry +0+0 -background transparent devanagari-psts.png\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-haln-before.png 
--features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,095c,094d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-haln-after.png --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,095c,094d\n\nmontage devanagari-haln-before.png right-arrow.png devanagari-haln-after.png -geometry +0+0 -background transparent devanagari-haln.png\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvm-before.png --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,0948\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvm-after.png --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,0948\n\nmontage devanagari-abvm-before.png right-arrow.png devanagari-abvm-after.png -geometry +0+0 -background transparent devanagari-abvm.png\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwm-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,0943\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwm-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,0943\n\nmontage devanagari-blwm-before.png right-arrow.png devanagari-blwm-after.png -geometry +0+0 -background transparent devanagari-blwm.png\n"
  },
  {
    "path": "images/devanagari/devanagari-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-devanagari.md](../../opentype-shaping-devanagari.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-init` in examples where the `init`\n> feature itself is not being explained.\n\n\n## 3.1 `locl`\n\n> Note: Noto Devanagari has 'NEP' and 'MAR' locl features. \n\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-nukt-before.svg --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,25cc,093c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-nukt-after.svg --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c\n\nsvg_stack --direction=h devanagari-nukt-before.svg right-arrow.svg devanagari-nukt-after.svg > devanagari-nukt.svg\n\n\n\n## 3.3 `akhn`\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-kssa-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,25cc,094d,0937\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-kssa-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,094d,0937\n\nsvg_stack --direction=h devanagari-akhn-kssa-before.svg right-arrow.svg devanagari-akhn-kssa-after.svg > devanagari-akhn-kssa.svg\n\n### JNya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-jnya-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091c,25cc,094d,091e\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=devanagari-akhn-jnya-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091c,094d,091e\n\nsvg_stack --direction=h devanagari-akhn-jnya-before.svg right-arrow.svg devanagari-akhn-jnya-after.svg > devanagari-akhn-jnya.svg\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rphf-before.svg --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,25cc,094d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rphf-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc\n\nsvg_stack --direction=h devanagari-rphf-before.svg right-arrow.svg devanagari-rphf-after.svg > devanagari-rphf.svg\n\n\n## 3.5 `rkrf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rkrf-before.svg --features=-init,-rkrf,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091d,25cc,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rkrf-after.svg --features=-init,+rkrf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091d,094d,0930\n\nsvg_stack --direction=h devanagari-rkrf-before.svg right-arrow.svg devanagari-rkrf-after.svg > devanagari-rkrf.svg\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwf-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwf-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,094d,0930\n\nsvg_stack --direction=h devanagari-blwf-before.svg 
right-arrow.svg devanagari-blwf-after.svg > devanagari-blwf.svg\n\n\n## 3.9 `half`\n\n### Half form\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-half-before.svg --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0932,094d,0930,25cc,094d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-half-after.svg --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0932,094d,0930,094d\n\nsvg_stack --direction=h devanagari-half-before.svg right-arrow.svg devanagari-half-after.svg > devanagari-half.svg\n\n### Eyelash Ra\n\n> Note that Noto Devanagari eyelash-Ra substitution does not appear to\n> work when using `U+25cc` dotted circle as the \"base consonant\"\n> substitute. Hence, a real consonant glyph is used instead. But it is\n> important that \"Ra\" _not_ be used as the \"base consonant\", as this\n> triggers \"Rakaar\".\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-eyelash-ra-before.svg --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0931,094d,0932\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-eyelash-ra-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0931,094d,0932\n\nsvg_stack --direction=h devanagari-eyelash-ra-before.svg right-arrow.svg devanagari-eyelash-ra-after.svg > devanagari-eyelash-ra.svg\n\n\n## 3.11 `vatu`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-vatu-before.svg --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0936,25cc,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-vatu-after.svg --features=-init,+vatu --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0936,094d,0930\n\nsvg_stack --direction=h devanagari-vatu-before.svg right-arrow.svg devanagari-vatu-after.svg > devanagari-vatu.svg\n\n\n## 3.12 `cjct`\n\n> Note: Noto Serif Devanagari implements this as `pres` for unknown\n> reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-cjct-before.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0922,25cc,094d,0922\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-cjct-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0922,094d,0922\n\nsvg_stack --direction=h devanagari-cjct-before.svg right-arrow.svg devanagari-cjct-after.svg > devanagari-cjct.svg\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-matra-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=093f,091e,094d,200c,091e,094d,0939,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-matra-position-after.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091e,094d,200c,091e,094d,0939,094d,0930,093f\n\nsvg_stack --direction=h devanagari-matra-position-before.svg right-arrow.svg devanagari-matra-position-after.svg > devanagari-matra-position.svg\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-reph-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,092f,094d,0932,094d,092e,094d,0930\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-reph-position-after.svg --features=-init --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,092f,094d,0932,094d,092e,094d,0930\n\nsvg_stack --direction=h devanagari-reph-position-before.svg right-arrow.svg devanagari-reph-position-after.svg > devanagari-reph-position.svg\n\n\n## 5 `init`\n\n> Note: Noto Devanagari and Murty don't implement `init`.\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-pres-before.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0916,093f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-pres-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0916,093f\n\nsvg_stack --direction=h devanagari-pres-before.svg right-arrow.svg devanagari-pres-after.svg > devanagari-pres.svg\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvs-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,0949\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvs-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,0949\n\nsvg_stack --direction=h devanagari-abvs-before.svg right-arrow.svg devanagari-abvs-after.svg > devanagari-abvs.svg\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blws-before.svg --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0939,0944\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blws-after.svg --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0939,0944\n\nsvg_stack --direction=h devanagari-blws-before.svg 
right-arrow.svg devanagari-blws-after.svg > devanagari-blws.svg\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-psts-before.svg --features=-init,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c,0940\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-psts-after.svg --features=-init,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c,0940\n\nsvg_stack --direction=h devanagari-psts-before.svg right-arrow.svg devanagari-psts-after.svg > devanagari-psts.svg\n\n\n## 5 `haln`\n\n# look at 0926,093c,094d in serif???\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-haln-before.svg --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansDevanagari-Regular.ttf --unicodes=25cc,095d,094d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-haln-after.svg --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansDevanagari-Regular.ttf --unicodes=25cc,095d,094d\n\nsvg_stack --direction=h devanagari-haln-before.svg right-arrow.svg devanagari-haln-after.svg > devanagari-haln.svg\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvm-before.svg --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,0948\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvm-after.svg --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,0948\n\nsvg_stack --direction=h devanagari-abvm-before.svg right-arrow.svg devanagari-abvm-after.svg > devanagari-abvm.svg\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwm-before.svg --features=-init,-blwm --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,0943\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwm-after.svg --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,0943\n\nsvg_stack --direction=h devanagari-blwm-before.svg right-arrow.svg devanagari-blwm-after.svg > devanagari-blwm.svg\n"
  },
  {
    "path": "images/emoji/emoji-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-emoji.md](../../opentype-shaping-emoji.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n## Invisibles general\n\n> Two options for VS15 and VS16 are included.\n>\n> Adobe Source Emoji includes non-color glyphs for both codepoints that\n> offer a degree of visual communication to prevent confusion in the \n> sequence illustrations, but they are not immediately associated with\n> the codepoint itself, like Gentium Plus's are.\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=text-pres-selector.png --font-funcs=ot --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=fe0e\n\nhb-view --font-size=110 --margin=2,16,2,16 --features=\"ss06\" --shapers=ot --preserve-default-ignorables --output-file=vs15.png --background=FFFFFF00 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=fe0e\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=emoji-pres-selector.png --font-funcs=ot --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=fe0f\n\nhb-view --font-size=110 --margin=2,16,2,16 --features=\"ss06\" --shapers=ot --preserve-default-ignorables --output-file=vs16.png --background=FFFFFF00 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=fe0f\n\n\n> Gentium Plus's ZWJ and ZWNJ are visually preferrable:\nhb-view --font-size=110 --margin=2,16,2,16 --features=\"ss06\" --shapers=ot --preserve-default-ignorables --output-file=zwj.png --background=FFFFFF00 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=200d\n\nhb-view --font-size=110 --margin=2,16,2,16 --features=\"ss06\" --shapers=ot --preserve-default-ignorables --output-file=zwnj.png --background=FFFFFF00 
/usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=200c\n\n\n## Human beings general\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-boy.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f466\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-girl.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f467\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-man.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-woman.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-generalperson.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9d1\n\n\n## Presentation sequences\n\n> Codepoints used are based on defaults in https://www.unicode.org/emoji/charts-14.0/text-style.html\n\n> Invisibles:\nmontage vs15.png text-pres-selector.png -geometry +0+0 -background transparent -tile 2x1 text-presentation.png\n\nmontage vs16.png emoji-pres-selector.png -geometry +0+0 -background transparent -tile 2x1 emoji-presentation.png\n\n\n> Default text-presentation: U+26A0, warning arrow:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=default-text-before.png --background=FFFFFF00 ./Noto\\ Emoji\\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=26a0\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=default-text-after.png --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=26a0\n\nmontage default-text-before.png emoji-pres-selector.png right-arrow.png default-text-after.png -geometry +0+0 -background transparent -tile 4x1 emoji-pres-sequence.png\n\n\n> Default emoji-presentation: U+231B, hourglass:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=default-emoji-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=231b\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=default-emoji-after.png --background=FFFFFF00 ./Noto\\ Emoji\\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=231b\n\nmontage default-emoji-before.png text-pres-selector.png right-arrow.png default-emoji-after.png -geometry +0+0 -background transparent -tile 4x1 text-pres-sequence.png\n\n\n\n## Modifier sequences\n\n> Adobe Source Emoji for invisibles:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-2.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3fb\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-3.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3fc\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-4.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3fd\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-5.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3fe\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-6.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3ff\n\n\n> NotoColorEmoji squares for fallback skin-tone-squares:\nhb-view --font-size=110 --margin=2,16,2,16 
--preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-2.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3fb\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-3.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3fc\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-4.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3fd\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-5.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3fe\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-6.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3ff\n\n\n> Symbola for text-mode fallback squares:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-2-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3fb\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-3-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3fc\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-4-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3fd\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-5-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3fe\n\nhb-view --font-size=110 
--margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-6-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3ff\n\n\n> Fitzpatrick scale:\nmontage skintone-2.png skintone-fallback-2.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-2.png\n\nmontage skintone-3.png skintone-fallback-3.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-3.png\n\nmontage skintone-4.png skintone-fallback-4.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-4.png\n\nmontage skintone-5.png skintone-fallback-5.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-5.png\n\nmontage skintone-6.png skintone-fallback-6.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-6.png\n\n\n> Fitzpatrick scale text fallback:\n> (currently unused)\nmontage skintone-2.png skintone-2-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-2-text-fallback.png\n\nmontage skintone-3.png skintone-3-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-3-text-fallback.png\n\nmontage skintone-4.png skintone-4-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-4-text-fallback.png\n\nmontage skintone-5.png skintone-5-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-5-text-fallback.png\n\nmontage skintone-6.png skintone-6-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-6-text-fallback.png\n\n\n> Sequence:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=modifier-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f44b\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=modifier-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f44b,1f3fd\n\nmontage 
modifier-before.png skintone-4.png right-arrow.png modifier-after.png -geometry +0+0 -background transparent -tile 4x1 modifier-sequence.png\n\nmontage modifier-before.png skintone-4.png right-arrow.png modifier-before.png skintone-fallback-4.png -geometry +0+0 -background transparent -tile 5x1 modifier-sequence-fallback.png\n\n\n## Regional Indicator flag sequences\n\n> UN chosen for maximum achievable internationality\n\n> Sequence\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=regional-flag-un-before.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f1fa,1f1f3\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=regional-flag-un-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f1fa,1f1f3\n\nmontage regional-flag-un-before.png right-arrow.png regional-flag-un-after.png -geometry +0+0 -background transparent regional-indicator-flag-sequence-un.png\n\n> Fallbacks\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=regional-flag-un-fallback.png --background=FFFFFF00 ./Noto\\ Emoji\\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=1f1fa,1f1f3\n\nmontage regional-flag-un-before.png right-arrow.png regional-flag-un-fallback.png -geometry +0+0 -background transparent regional-indicator-flag-sequence-un-fallback.png\n\n\n\n## Tag flag sequences\n\n> From https://unicode.org/reports/tr51/#valid-emoji-tag-sequences\n>\n> Wales chosen to be most distinctive example KNOWN to be widely implemented\n\n\n> Tag pseudo-glyphs\n\n> LastResort. 
This crops in, non-precisely, on the \"tag\" symbol itself:\nhb-view --font-size=110 --margin=-42,-28,-44,-28 --preserve-default-ignorables --output-file=tag-isolate.png --background=FFFFFF00 ./unicode/Last-Resort/LastResort-Regular.ttf --unicodes=e007f\n\n> Margin-adjusted versions:\nhb-view --font-size=110 --margin=-20,-28,-48,-28 --preserve-default-ignorables --output-file=tag-isolate-high.png --background=FFFFFF00 ./unicode/Last-Resort/LastResort-Regular.ttf --unicodes=e007f\n\nhb-view --font-size=110 --margin=-44,-28,-24,-28 --preserve-default-ignorables --output-file=tag-isolate-low.png --background=FFFFFF00 ./unicode/Last-Resort/LastResort-Regular.ttf --unicodes=e007f\n\n\n> DejaVu empty dotted square, U+2B1A:\nhb-view --font-size=110 --margin=-16,2,2,2 --preserve-default-ignorables --output-file=dotted-square.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=2b1a\n\n\n> Letters and \"end\" components:\nhb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=g-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0067\n\nhb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=b-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0062\n\nhb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=w-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0077\n\nhb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=l-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=006c\n\nhb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=s-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0073\n\nhb-view --font-size=32 --margin=2,2,16,2 --preserve-default-ignorables --output-file=end-isolate.png 
--background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0045,004e,0044\n\n\n> Composite tags:\n\ncomposite -gravity north tag-isolate-high.png dotted-square.png blank-tag-high.png\n\ncomposite -gravity south tag-isolate-low.png dotted-square.png blank-tag-low.png\n\ncomposite -gravity north g-isolate.png blank-tag-low.png tag-g.png\n\ncomposite -gravity north b-isolate.png blank-tag-low.png tag-b.png\n\ncomposite -gravity north w-isolate.png blank-tag-low.png tag-w.png\n\ncomposite -gravity north l-isolate.png blank-tag-low.png tag-l.png\n\ncomposite -gravity north s-isolate.png blank-tag-low.png tag-s.png\n\ncomposite -gravity south end-isolate.png blank-tag-high.png tag-end.png\n\n\n> Completed sequence:\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=tag-flag-black.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3f4\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=tag-flag-wales-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3f4,e0067,e0062,e0077,e006c,e0073,e007f\n\nmontage tag-flag-black.png tag-g.png tag-b.png tag-w.png tag-l.png tag-s.png tag-end.png -geometry +0+0 -background transparent -tile 7x1 tag-flag-wales-before.png\n\nmontage tag-flag-wales-before.png right-arrow.png tag-flag-wales-after.png -geometry +0+0 -background transparent tag-flag-sequence-wales.png\n\n\n\n## Keycap sequences\n\n> Noto Sans Symbols has a visual CEK:\nhb-view --font-size=110 --margin=2,64,2,64 --preserve-default-ignorables --output-file=keycap-cek.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSymbols-Regular.ttf --unicodes=20e3\n\n\n> Sequence:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=keycap-before.png --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerif-Regular.ttf --unicodes=0034\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=keycap-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=0034,20e3\n\n> Something off here; the -gravity and -geometry switches are *not* working as expected....\nmontage keycap-before.png keycap-cek.png right-arrow.png keycap-after.png -geometry +0+0 -background transparent -tile 4x1 keycap-sequence.png\n\n\n\n## ZWJ sequences\n\n### ZWJ hair sequences\n\n> Hairstyles:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hair-red.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9b0\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hair-curly.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9b1\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hair-bald.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9b2\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hair-white.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9b3\n\n\n> Sequence:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=woman-white-hair.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F469,200d,1f9b3\n\nmontage fallback-woman.png zwj.png hair-white.png right-arrow.png woman-white-hair.png -geometry +0+0 -background transparent -tile 5x1 hairstyle-sequence.png\n\n\n\n### ZWJ gendered person sequences\n\n> Gender signs, input (text-presentation style):\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables 
--font-funcs=ot --output-file=gendersign-female.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=2640\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendersign-male.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=2642\n\n> Gender signs, output fallback (emoji-presentation style):\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendersign-female-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=2640\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendersign-male-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=2642\n\n\n> Sequence:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendered-person-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F3c4\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendered-person-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F3c4,200d,2640,fe0f\n\nmontage gendered-person-before.png zwj.png gendersign-female.png emoji-pres-selector.png right-arrow.png gendered-person-after.png -geometry +0+0 -background transparent -tile 6x1 gendered-person-sequence.png\n\n\n### ZWJ multi-person group sequences\n\n>\n> Couple with heart:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=heart.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=2764\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-man-heart-man-after.png --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,2764,fe0f,200d,1f468\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-man-skintone-2-heart-man-skintone-2-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,1f3fb,200d,2764,fe0f,200d,1f468,1f3fb\n\n> Sequence:\nmontage fallback-man.png zwj.png heart.png emoji-pres-selector.png zwj.png fallback-man.png right-arrow.png multi-person-man-heart-man-after.png -geometry +0+0 -background transparent -tile 8x1 multi-person-heart-sequence.png\n\nmontage fallback-man.png skintone-2.png zwj.png heart.png emoji-pres-selector.png zwj.png fallback-man.png skintone-2.png right-arrow.png multi-person-man-skintone-2-heart-man-skintone-2-after.png -geometry +0+0 -background transparent -tile 10x1 multi-person-heart-skintone-sequence.png\n\n\n>\n> Couple kiss:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=kiss-mark.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f48b\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-kiss-sequence-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,200d,2764,fe0f,200d,1f48b,200d,1f468\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-kiss-sequence-skintone-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,1f3ff,200d,2764,fe0f,200d,1f48b,200d,1f468,1f3fd\n\n> Sequence:\nmontage fallback-woman.png zwj.png heart.png emoji-pres-selector.png zwj.png kiss-mark.png zwj.png fallback-man.png right-arrow.png multi-person-kiss-sequence-after.png -geometry +0+0 -background transparent -tile 10x1 multi-person-kiss-sequence.png\n\nmontage 
fallback-woman.png skintone-6.png zwj.png heart.png emoji-pres-selector.png zwj.png kiss-mark.png zwj.png fallback-man.png skintone-4.png right-arrow.png multi-person-kiss-sequence-skintone-after.png -geometry +0+0 -background transparent -tile 12x1 multi-person-kiss-skintone-sequence.png\n\n\n>\n> Couple holding hands:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=handshake.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f91d\n\n> Noto Color Emoji does not support this sequence, perhaps because it expects\n> fallback to `U+1F46D` to occur, e.g. at the keyboard/UI level....\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-holding-hands-sequence-woman-woman-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f46d\n\n> This modifier sequence IS supported in Noto Color Emoji\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-holding-hands-sequence-woman-woman-skintone-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,1f3fc,200d,1f91d,200d,1f469,1f3fe\n\n> Sequence:\nmontage fallback-woman.png zwj.png handshake.png zwj.png fallback-woman.png right-arrow.png multi-person-holding-hands-sequence-woman-woman-after.png -geometry +0+0 -background transparent -tile 7x1 multi-person-holding-hands-sequence.png\n\nmontage fallback-woman.png skintone-3.png zwj.png handshake.png zwj.png fallback-woman.png skintone-5.png right-arrow.png multi-person-holding-hands-sequence-woman-woman-skintone-after.png -geometry +0+0 -background transparent -tile 9x1 multi-person-holding-hands-skintone-sequence.png\n\n\n>\n> Family:\n>\n> Noto Color Emoji supports \"Man,ZWJ,Woman,ZWJ,_child_\" but not \"Woman,ZWJ,Man,ZWJ,_child_\".\n>\n> Noto Color Emoji supports 
\"Woman,ZWJ,Woman,ZWJ,Girl,ZWJ,Boy\" but not \"Woman,ZWJ,Woman,ZWJ,Boy,ZWJ,Girl\".\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-family-man-boy-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,1f466\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-family-man-girl-girl-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,1f467,200d,1f467\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-family-man-woman-girl-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,1f469,200d,1f467\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-family-woman-woman-girl-boy-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,200d,1f469,200d,1f467,200d,1f466\n\n> Sequence:\nmontage fallback-man.png zwj.png fallback-boy.png right-arrow.png multi-person-family-man-boy-after.png -geometry +0+0 -background transparent -tile 5x1 multi-person-family-man-boy-sequence.png\n\nmontage fallback-man.png zwj.png fallback-girl.png zwj.png fallback-girl.png right-arrow.png multi-person-family-man-girl-girl-after.png -geometry +0+0 -background transparent -tile 7x1 multi-person-family-man-girl-girl-sequence.png\n\nmontage fallback-man.png zwj.png fallback-woman.png zwj.png fallback-girl.png right-arrow.png multi-person-family-man-woman-girl-after.png -geometry +0+0 -background transparent -tile 7x1 multi-person-family-man-woman-girl-sequence.png\n\nmontage fallback-woman.png zwj.png fallback-woman.png zwj.png fallback-girl.png zwj.png fallback-boy.png right-arrow.png multi-person-family-woman-woman-girl-boy-after.png -geometry 
+0+0 -background transparent -tile 9x1 multi-person-family-woman-woman-girl-boy-sequence.png\n\n>\n> Noto Color Emoji does not seem to support skin-tone modifiers for family sequences,\n> at least in the current release on my system.\n>\n>\n\n\n>\n> Shaking hands:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hand-left.png --background=FFFFFF00 ./blobmoji/Blobmoji.ttf --unicodes=1faf1\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hand-right.png --background=FFFFFF00 ./blobmoji/Blobmoji.ttf --unicodes=1faf2\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-shaking-hands-after.png --background=FFFFFF00 ./blobmoji/Blobmoji.ttf --unicodes=1faf1,1f3fd,200d,1faf2,1f3ff\n\n> Sequence:\nmontage hand-left.png skintone-4.png zwj.png hand-right.png skintone-6.png right-arrow.png multi-person-shaking-hands-after.png -geometry +0+0 -background transparent -tile 7x1 multi-person-shaking-hands-sequence.png\n\n\n\n### ZWJ role sequences\n\n> Firefighter (emoji-pres-by-default):\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=firetruck.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F692\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=role-firefighter-man-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,1F692\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=role-firefighter-man-skintone-6-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,1f3ff,200d,1F692\n\nmontage fallback-man.png zwj.png firetruck.png right-arrow.png role-firefighter-man-after.png -geometry +0+0 -background transparent -tile 
5x1 role-sequence-firefighter.png\n\nmontage fallback-man.png skintone-6.png zwj.png firetruck.png right-arrow.png role-firefighter-man-skintone-6-after.png -geometry +0+0 -background transparent -tile 6x1 role-sequence-firefighter-skintone-6.png\n\n\n\n> Pilot (non-emoji-pres-by-default)\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=airplane.png --background=FFFFFF00 ./Noto\\ Emoji\\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=2708,fe0e\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=role-pilot-woman-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,200d,2708,fe0f\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=role-pilot-woman-skintone-2-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,1f3fb,200d,2708,fe0f\n\nmontage fallback-woman.png zwj.png airplane.png emoji-pres-selector.png right-arrow.png role-pilot-woman-after.png -geometry +0+0 -background transparent -tile 6x1 role-sequence-pilot.png\n\nmontage fallback-woman.png skintone-2.png zwj.png airplane.png emoji-pres-selector.png right-arrow.png role-pilot-woman-skintone-2-after.png -geometry +0+0 -background transparent -tile 7x1 role-sequence-pilot-skintone-2.png\n\n\n\n### ZWJ color sequences\n\n> Noto Color Emoji, \"black cat\" seems to be the only widely-implemented sequence:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=color-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F408\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=color-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F408,200d,2b1b\n\nhb-view 
--font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=color-black.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=2b1b\n\nmontage color-before.png zwj.png color-black.png right-arrow.png color-after.png -geometry +0+0 -background transparent -tile 5x1 color-sequence.png\n\n\n### ZWJ directionality sequences\n\n> These are not widely implemented, and possibly not implemented at all in open-source fonts.\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=running.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3c3\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=direction-rightward.png --background=FFFFFF00 ./Noto\\ Emoji\\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=27a1\n\nconvert -flop running.png running-rightward.png\n\nmontage running.png zwj.png direction-rightward.png emoji-pres-selector.png right-arrow.png running-rightward.png -geometry +0+0 -background transparent -tile 6x1 zwj-directionality-sequence.png\n\n\n\n### ZWJ additional sequences\n\n> 13 on the named list. 
Heart-on-fire is clearly not just an overlay, and not a flag:\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fire.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f525\n\nhb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=heart-on-fire-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=2764,fe0f,200d,1f525\n\nmontage heart.png emoji-pres-selector.png zwj.png fire.png right-arrow.png heart-on-fire-after.png -geometry +0+0 -background transparent -tile 6x1 zwj-sequence-heart-on-fire.png\n\n\n## Other sequences and ligatures\n\n> For non-standard ligatures (banana split?)\n\n> Adobe SourceEmoji has a \"hand + zwj + heart -> i-love-you-hand\" ligature:\n\n\n> Other possibilities:\n> https://blog.emojipedia.org/emoji-flags-explained/ (discusses flag options for vendors that don't require Unicode pre-/formal-approval)\n> Noto monochromatic emoji details: https://blog.emojipedia.org/exploring-googles-new-black-and-blobby-emoji-font/\n> Are Noto's \"Emoji Kitchen\" emoji (beginning with magic wand) just ligatures, or are they all \"stickers\"? https://blog.emojipedia.org/emoji-kitchen-beta-magics-back-the-blobs/\n"
  },
  {
    "path": "images/example-fonts.txt",
    "content": "#############################################################\n#                                                           #\n# All uncommented lines in this file are URLs which should  #\n# be downloadable via wget, cURL, or scripts.               #\n#                                                           #\n# SHA checksums are for the exact file downloaded at the    #\n# URL (including archive files), not for the .ttf/.otf/.ttc #\n# within.                                                   #\n#                                                           #\n# If any CDN links change or URLs break, please open an     #\n# issue in this project's bug tracker, not the font's.      #\n#                                                           #\n#############################################################\n\n##########################\n# General\n#\n\n# GentiumPlus-R.ttf\n# sha256: 5244209b44a5111736379686119cd54042dce18e308a351c366999ac563ca6bb\nhttps://software.sil.org/downloads/r/gentium/GentiumPlus-6.101.zip\n\n\n##########################\n# Arabic-like\n#\n\n# Arabic\n\n# NotoNaskhArabic-Regular.ttf\n# sha256: c1f3654dd9142073b00289700ce0aa5218c1aa4d5be38a3c5f7f2649bee12c1f\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoNaskhArabic/unhinted/ttf/NotoNaskhArabic-Regular.ttf\n\n# NotoNastaliqUrdu-Regular.ttf\n# sha256: beee3156a724adf64178c2dbac86f5394b6a4bb67aea75d4354c817c9e6c27da\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoNastaliqUrdu/hinted/ttf/NotoNastaliqUrdu-Regular.ttf\n\n# NotoKufiArabic-Regular.ttf\n# sha256: efb00432829e570a12a5afd43dd4667950ee7cf4dbc9f4c421b2f27b89dee301\nhttps://fonts.google.com/download?family=Noto%20Kufi%20Arabic\n\n\n# Mongolian\n\n# NotoSansMongolian-Regular.ttf\n# sha256: 31a750c5b7e335ebb3d841b2f97baf045178db23fe199eb3040cb3496470daa7\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansMongolian/unhinted/ttf/NotoSansMongolian-Regular.ttf\n\n# 
N'Ko\n\n# NotoSansNKo-Regular.ttf\n# sha256: 4e9de46bfa60bf800fe4608c5ec434602c496b924b6e5333dc14515b0883dd86\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansNKo/full/ttf/NotoSansNKo-Regular.ttf\n\n# Syriac\n\n# NotoSansSyriacEastern-Regular.ttf\n# sha256: 3310d8ec31633d516b09a676b8fa6c784e2d263aeea52051ca40c60347e2e9cd\nhttps://noto-website-2.storage.googleapis.com/pkgs/NotoSansSyriacEastern-unhinted.zip\n\n# NotoSansSyriacWestern-Regular.ttf\n# sha256: e8c6c6ae30ae6fb414e59112ebd08ab8341733b39b756a918e3c324249cdf5b5\nhttps://noto-website-2.storage.googleapis.com/pkgs/NotoSansSyriacWestern-unhinted.zip\n\n# NotoSansSyriacEstrangela-Regular.ttf\n# sha256: 0cb823a1d55ca97bda55ae1f31b03ef762b1faa3447700142f83c9dc8f7828e4\nhttps://noto-website-2.storage.googleapis.com/pkgs/NotoSansSyriacEstrangela-unhinted.zip\n\n\n##########################\n# Indic\n#\n\n# Bengali\n\n# NotoSerifBengali-Regular.ttf\n# sha256: 6f046ad71ff7f3ba154e7087a02a1f71fff782e85ab00fdd08cd28ad3e06e24b\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifBengali/full/ttf/NotoSerifBengali-Regular.ttf\n\n# Devanagari\n\n# NotoSerifDevanagari-Regular.ttf\n# sha256: 55d2e062fac9208412a15e79a5bd3753af650a4ed47bb2eea4b105591fba4a8f\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifDevanagari/full/ttf/NotoSerifDevanagari-Regular.ttf\n\n# Gujarati\n\n# NotoSerifGujarati-Regular.ttf\n# sha256: 041827ca7d1b58393587d5c974db1adf9409293ac13f23cef1a02b8700a4c6c3\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifGujarati/full/ttf/NotoSerifGujarati-Regular.ttf\n\n# Gurmukhi\n\n# NotoSansGurmukhi-Regular.ttf\n# sha256: 255f404f61622ef03385a2851c2423de3a676d978115b2e39ffd5570e0022c32\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifGurmukhi/full/ttf/NotoSerifGurmukhi-Regular.ttf\n\n# Kannada\n\n# NotoSerifKannada-Regular.ttf\n# sha256: 
c9e7d3168a0134f68456607521b66e3208586d07cc144618791f3492a0e88bfb\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifKannada/full/ttf/NotoSerifKannada-Regular.ttf\n\n# Malayalam\n\n# NotoSerifMalayalam-Regular.ttf\n# sha256: 7755029171b99ef7d5e7027c6f9148bcca3f3b95a41fd73cfc79d664a935cf30\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifMalayalam/full/ttf/NotoSerifMalayalam-Regular.ttf\n\n# NotoSansMalayalam-Regular.ttf \n# sha256: d18b5c10d85bba3d3d89775484bcfa731112f501ac070793ddbda5a36992520f\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansMalayalam/full/ttf/NotoSansMalayalam-Regular.ttf\n\n# Rachana-Regular.ttf\n# sha256: a0d5c8417b58f98fe387758bb5c1a0e75225bb77759640bd7131c6c78dce16f1\nhttps://gitlab.com/rit-fonts/RIT-Rachana/-/jobs/artifacts/1.2/download?job=build-tag\n\n# Oriya\n\n# NotoSansOriya-Regular.ttf\n# sha256: e87f6a6c611c53dabb708a7ddfafa527b3ca7d0ccc5d2e0e659f46af19cb320f\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansOriya/full/ttf/NotoSansOriya-Regular.ttf\n\n# Sinhala\n\n# NotoSerifSinhala-Regular.ttf \n# sha256: cafa8544ad87a1116d296193cfbf8be39b7927e67fd4af36b7e73105cf2ac85e\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifSinhala/full/ttf/NotoSerifSinhala-Regular.ttf\n\n# Tamil\n\n# NotoSerifTamil-Regular.ttf\n# sha256: 00611c2dc5e6a09a5cd85eb87e135ae86aaafa5ac63543adf5eb2050784582a0\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifTamil/full/ttf/NotoSerifTamil-Regular.ttf\n\n# NotoSansTamil-Regular.ttf\n# sha256: 0afbc221964b6048c6d771c525be474d21b288a621dce0fafedd695cc5c98e4e\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansTamil/full/ttf/NotoSansTamil-Regular.ttf\n\n# Telugu\n\n# NotoSerifTelugu-Regular.ttf\n# sha256: 
15cec4e867d25105c484301af0b4be577231e04e13364abdff844a4a5c9711dc\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifTelugu/full/ttf/NotoSerifTelugu-Regular.ttf\n\n# NotoSansTelugu-Regular.ttf\n# sha256: e0595bcf47b907b2afb77a34ae64c3e8351f56452c66983660172c6b9ea15576\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansTelugu/full/ttf/NotoSansTelugu-Regular.ttf\n\n\n##########################\n# Brahmi-derived\n#\n\n# Khmer\n\n# NotoSerifKhmer-Regular.ttf\n# sha256: 1068ef26dadf6bd322f6bcef1990015ca3c998064d20842d67aaefc4b62d9cdb\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifKhmer/full/ttf/NotoSerifKhmer-Regular.ttf\n\n# Myanmar\n\n# NotoSansMyanmar-Regular.ttf\n# sha256: e6d59055e5e7a8cc57ad8e04150136f4fc48bc3fca6f307c5e78f40f7e560a6d\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansMyanmar/full/ttf/NotoSansMyanmar-Regular.ttf\n\n# PadaukBook-Regular.ttf\n# sha256: b47b2639489d7cec5ad38d025f181b061767e4e161a41f19528e910f79fd03a1\nhttps://software.sil.org/downloads/r/padauk/padauk-3.003.zip\n\n# Thai and Lao\n\n# NotoSerifThai-Regular.ttf\n# sha256: 428afb46af2c025ed2b9fe39bda2fffce9475fa5d2e7ae7911771633014b91b0\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifThai/full/ttf/NotoSerifThai-Regular.ttf\n\n# NotoSerifLao-Regular.ttf\n# sha256: ff5ab4f3270c448b99b113a8ac6275bc6e4a4ca922ed2271df9a95132b5c2db6\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifLao/full/ttf/NotoSerifLao-Regular.ttf\n\n# Tibetan\n\n# NotoSerifTibetan-Regular.ttf -- RENAMED FROM NotoSansTibetan-Regular.ttf\n# sha256: 2ac2555a88b5bcbbacc490003e96dd4d00d064daeee4a9465d68cf301f9886b3\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifTibetan/full/ttf/NotoSerifTibetan-Regular.ttf\n\n\n##########################\n# Hangul\n#\n\n# NotoSansCJK-Regular.ttc\n# sha256: 
b76b0433203017ca80401b2ee0dd69350349871c4b19d504c34dbdd80541690a\nhttps://github.com/googlefonts/noto-cjk/archive/refs/tags/NotoSansV2.001.zip\n\n\n##########################\n# Hebrew\n#\n\n# NotoSansHebrew-Regular.ttf\n# sha256: 6d925ace0a6ccce47b64e4a8d26869f423774d66dd6b9f67cf98441075e69582\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansHebrew/full/ttf/NotoSansHebrew-Regular.ttf\n\n\n\n##########################\n# Emoji\n#\n\n# NotoColorEmoji.ttf\n# sha256: ad8eb6a1c403c73f0ff48ce288f2101de3814d0c6483f398023630e98331eeb2\nhttps://fonts.google.com/download?family=Noto%20Color%20Emoji\n\n# NotoEmoji-Regular.ttf\n# sha256: 415dc6290378574135b64c808dc640c1df7531973290c4970c51fdeb849cb0c5\nhttps://github.com/googlefonts/noto-emoji/raw/v2020-04-08-unicode12_1/fonts/NotoEmoji-Regular.ttf\n\n# Symbola_hint.ttf\n# sha256: 856de5857be48b8e31fc078fb93a9f3bd706b2552ba57b42863622f837cd3f35\nhttps://fontlibrary.org/assets/downloads/symbola/cf81aeb303c13ce765877d31571dc5c7/symbola.zip\n\n# LastResort-Regular.ttf\n# sha256: da83a62294e74d963a10de4c3750ccf089273e3b7fc6744daef9844163ade078\nhttps://github.com/unicode-org/last-resort-font/releases/download/15.000/LastResort-Regular.ttf\n\n# DejaVuSans.ttf\n# sha256: 6aaad3365c30c4f8d2504e569527e588d33eeae66dd7045bcfeef7413820db2a\nhttp://sourceforge.net/projects/dejavu/files/dejavu/2.37/dejavu-fonts-ttf-2.37.tar.bz2\n\n# NotoSansSymbols-Regular.ttf\n# sha256: 0088617baec0e8ac47e022cc1f38695f772301c9ef6d1f24a785abbef1e05d79\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansSymbols/full/ttf/NotoSansSymbols-Regular.ttf\n\n# NotoSerif-Regular.ttf\n# sha256: 504b8ec55d003cade88fb0a7bb93254ad81fd1cb29f4818d260300dbaef5d37b\nhttps://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerif/hinted/ttf/NotoSerif-Regular.ttf\n\n# Blobmoji.ttf\n# sha256: 
c3db3bb85e84ea7a2674399e281e004dc181ff9038b13c03bf01d3dd8197cfc8\nhttps://github.com/C1710/blobmoji/releases/download/v14.0.1/Blobmoji.ttf\n\n# SourceEmoji-BnW.otf \n# sha256: e648822387b8860a74a478df07e1bdc6e56ee57e74e3f4b07272a8a9485c7186\nhttps://github.com/adobe-fonts/source-emoji/releases/download/1.017/SourceEmoji-BnW.otf\n\n"
  },
  {
    "path": "images/gujarati/gujarati-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-gujarati.md](../../opentype-shaping-gujarati.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-decompose-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ac9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-decompose-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ac5,25cc,0abe\n\nmontage gujarati-matra-decompose-before.png right-arrow.png gujarati-matra-decompose-after.png -geometry +0+0 -background transparent gujarati-matra-decompose.png\n\n\n## 2.7 Post-base consonants\n\n> None\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-nukt-before.png --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a97,25cc,0abc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-nukt-after.png --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a97,0abc\n\nmontage gujarati-nukt-before.png right-arrow.png gujarati-nukt-after.png -geometry +0+0 -background transparent gujarati-nukt.png\n\n## 3.3 `akhn`\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-kssa-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a95,25cc,0acd,0ab7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-kssa-after.png --features=-init,+akhn --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a95,0acd,0ab7\n\nmontage gujarati-akhn-kssa-before.png right-arrow.png gujarati-akhn-kssa-after.png -geometry +0+0 -background transparent gujarati-akhn-kssa.png\n\n### JNya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-jnya-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9c,25cc,0acd,0a9e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-jnya-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9c,0acd,0a9e\n\nmontage gujarati-akhn-jnya-before.png right-arrow.png gujarati-akhn-jnya-after.png -geometry +0+0 -background transparent gujarati-akhn-jnya.png\n\n\n## 3.4 `rphf`\n\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rphf-before.png --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,25cc,0acd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rphf-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,25cc\n\nmontage gujarati-rphf-before.png right-arrow.png gujarati-rphf-after.png -geometry +0+0 -background transparent gujarati-rphf.png\n\n\n## 3.5 `rkrf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rkrf-before.png --features=-init,-rkrf,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa6,25cc,0acd,0ab0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rkrf-after.png --features=-init,+rkrf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa6,0acd,0ab0\n\nmontage gujarati-rkrf-before.png right-arrow.png gujarati-rkrf-after.png -geometry +0+0 -background 
transparent gujarati-rkrf.png\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwf-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=25cc,0acd,0ab0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwf-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=25cc,0acd,0ab0\n\nmontage gujarati-blwf-before.png right-arrow.png gujarati-blwf-after.png -geometry +0+0 -background transparent gujarati-blwf.png\n\n\n## 3.9 `half`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-half-before.png --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aad,0acd,0ab0,25cc,0acd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-half-after.png --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aad,0acd,0ab0,0acd,25cc\n\nmontage gujarati-half-before.png right-arrow.png gujarati-half-after.png -geometry +0+0 -background transparent gujarati-half.png\n\n\n## 3.11 `vatu`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-vatu-before.png --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa4,25cc,0acd,0ab0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-vatu-after.png --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa4,0acd,0ab0\n\nmontage gujarati-vatu-before.png right-arrow.png gujarati-vatu-after.png -geometry +0+0 -background transparent gujarati-vatu.png\n\n\n## 3.12 `cjct`\n\n> Note that Noto Serif Gujarati implements this in `pres` for unknown\n> reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=gujarati-cjct-before.png --features=-init,-pres,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa6,25cc,0acd,0aae\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-cjct-after.png --features=-init,+cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa6,0acd,0aae\n\nmontage gujarati-cjct-before.png right-arrow.png gujarati-cjct-after.png -geometry +0+0 -background transparent gujarati-cjct.png\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0abf,0a9f,0acd,0a9d,0acd,0ab9,0acd,0aa4\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-position-after.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9f,0acd,0a9d,0acd,0ab9,0acd,0aa4,0abf\n\nmontage gujarati-matra-position-before.png right-arrow.png gujarati-matra-position-after.png -geometry +0+0 -background transparent gujarati-matra-position.png\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-reph-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,25cc,0aab,0acd,0aa8,0acd,0a9a,0ac2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-reph-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0aab,0acd,0aa8,0acd,0a9a,0ac2\n\nmontage gujarati-reph-position-before.png right-arrow.png gujarati-reph-position-after.png -geometry +0+0 -background transparent gujarati-reph-position.png\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=gujarati-pres-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9e,0acd,0a9a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-pres-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9e,0acd,0a9a,25cc\n\nmontage gujarati-pres-before.png right-arrow.png gujarati-pres-after.png -geometry +0+0 -background transparent gujarati-pres.png\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvs-before.png --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0aa3,0abf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvs-after.png --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0aa3,0abf\n\nmontage gujarati-abvs-before.png right-arrow.png gujarati-abvs-after.png -geometry +0+0 -background transparent gujarati-abvs.png\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blws-before.png --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa3,0ac1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blws-after.png --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa3,0ac1\n\nmontage gujarati-blws-before.png right-arrow.png gujarati-blws-after.png -geometry +0+0 -background transparent gujarati-blws.png\n\n\n## 5 `psts`\n\n> Note: Noto Serif Gujarati implements this as an `abvs` lookup for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-psts-before.png --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf 
--unicodes=0a9c,0acd,0ab0,0abe\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-psts-after.png --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9c,0acd,0ab0,0abe\n\nmontage gujarati-psts-before.png right-arrow.png gujarati-psts-after.png -geometry +0+0 -background transparent gujarati-psts.png\n\n\n## 5 `haln`\n\n> Note: Noto Serif Gujarati implements this as a `blwm` lookup.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-haln-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a95,0acd,0a95,0abc,0acd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-haln-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a95,0acd,0a95,0abc,0acd\n\nmontage gujarati-haln-before.png right-arrow.png gujarati-haln-after.png -geometry +0+0 -background transparent gujarati-haln.png\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvm-before.png --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0ab9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvm-after.png --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0ab9\n\nmontage gujarati-abvm-before.png right-arrow.png gujarati-abvm-after.png -geometry +0+0 -background transparent gujarati-abvm.png\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwm-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9f,0acd,0aa0,0ac4\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwm-after.png --features=-init,+blwm 
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9f,0acd,0aa0,0ac4\n\nmontage gujarati-blwm-before.png right-arrow.png gujarati-blwm-after.png -geometry +0+0 -background transparent gujarati-blwm.png\n\n\n\n"
  },
  {
    "path": "images/gujarati/gujarati-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-gujarati.md](../../opentype-shaping-gujarati.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-decompose-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ac9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-decompose-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ac5,25cc,0abe\n\nsvg_stack.py --direction=h gujarati-matra-decompose-before.svg right-arrow.svg gujarati-matra-decompose-after.svg > gujarati-matra-decompose.svg\n\n\n## 2.7 Post-base consonants\n\n> None\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-nukt-before.svg --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a97,25cc,0abc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-nukt-after.svg --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a97,0abc\n\nsvg_stack.py --direction=h gujarati-nukt-before.svg right-arrow.svg gujarati-nukt-after.svg > gujarati-nukt.svg\n\n## 3.3 `akhn`\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-kssa-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a95,25cc,0acd,0ab7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-kssa-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf 
--unicodes=0a95,0acd,0ab7\n\nsvg_stack.py --direction=h gujarati-akhn-kssa-before.svg right-arrow.svg gujarati-akhn-kssa-after.svg > gujarati-akhn-kssa.svg\n\n### JNya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-jnya-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9c,25cc,0acd,0a9e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-jnya-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9c,0acd,0a9e\n\nsvg_stack.py --direction=h gujarati-akhn-jnya-before.svg right-arrow.svg gujarati-akhn-jnya-after.svg > gujarati-akhn-jnya.svg\n\n\n## 3.4 `rphf`\n\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rphf-before.svg --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,25cc,0acd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rphf-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,25cc\n\nsvg_stack.py --direction=h gujarati-rphf-before.svg right-arrow.svg gujarati-rphf-after.svg > gujarati-rphf.svg\n\n\n## 3.5 `rkrf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rkrf-before.svg --features=-init,-rkrf,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa6,25cc,0acd,0ab0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rkrf-after.svg --features=-init,+rkrf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa6,0acd,0ab0\n\nsvg_stack.py --direction=h gujarati-rkrf-before.svg right-arrow.svg gujarati-rkrf-after.svg > gujarati-rkrf.svg\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwf-before.svg 
--features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=25cc,0acd,0ab0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwf-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=25cc,0acd,0ab0\n\nsvg_stack.py --direction=h gujarati-blwf-before.svg right-arrow.svg gujarati-blwf-after.svg > gujarati-blwf.svg\n\n\n## 3.9 `half`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-half-before.svg --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aad,0acd,0ab0,25cc,0acd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-half-after.svg --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aad,0acd,0ab0,0acd,25cc\n\nsvg_stack.py --direction=h gujarati-half-before.svg right-arrow.svg gujarati-half-after.svg > gujarati-half.svg\n\n\n## 3.11 `vatu`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-vatu-before.svg --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa4,25cc,0acd,0ab0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-vatu-after.svg --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa4,0acd,0ab0\n\nsvg_stack.py --direction=h gujarati-vatu-before.svg right-arrow.svg gujarati-vatu-after.svg > gujarati-vatu.svg\n\n\n## 3.12 `cjct`\n\n> Note that Noto Serif Gujarati implements this in `pres` for unknown\n> reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-cjct-before.svg --features=-init,-pres,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa6,25cc,0acd,0aae\n\nhb-view --font-size=110 
--margin=2,16,2,16 --output-file=gujarati-cjct-after.svg --features=-init,+cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa6,0acd,0aae\n\nsvg_stack.py --direction=h gujarati-cjct-before.svg right-arrow.svg gujarati-cjct-after.svg > gujarati-cjct.svg\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=25cc,0abf,0a9b,0acd,0aad,0acd,0aa6,0acd,0aae\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-position-after.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9b,0acd,0aad,0acd,0aa6,0acd,0aae,0abf\n\nsvg_stack.py --direction=h gujarati-matra-position-before.svg right-arrow.svg gujarati-matra-position-after.svg > gujarati-matra-position.svg\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-reph-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,25cc,0aab,0acd,0aa8,0acd,0a9a,0ac2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-reph-position-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0aab,0acd,0aa8,0acd,0a9a,0ac2\n\nsvg_stack.py --direction=h gujarati-reph-position-before.svg right-arrow.svg gujarati-reph-position-after.svg > gujarati-reph-position.svg\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-pres-before.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9e,0acd,0a9a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-pres-after.svg --features=-init,+pres 
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9e,0acd,0a9a,25cc\n\nsvg_stack.py --direction=h gujarati-pres-before.svg right-arrow.svg gujarati-pres-after.svg > gujarati-pres.svg\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvs-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0aa3,0abf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvs-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0aa3,0abf\n\nsvg_stack.py --direction=h gujarati-abvs-before.svg right-arrow.svg gujarati-abvs-after.svg > gujarati-abvs.svg\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blws-before.svg --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aab,0ac1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blws-after.svg --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aab,0ac1\n\nsvg_stack.py --direction=h gujarati-blws-before.svg right-arrow.svg gujarati-blws-after.svg > gujarati-blws.svg\n\n\n## 5 `psts`\n\n> Note: Noto Serif Gujarati implements this as an `abvs` lookup for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-psts-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9c,0acd,0ab0,0abe\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-psts-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9c,0acd,0ab0,0abe\n\nsvg_stack.py --direction=h gujarati-psts-before.svg right-arrow.svg 
gujarati-psts-after.svg > gujarati-psts.svg\n\n\n## 5 `haln`\n\n> Note: Noto Serif Gujarati implements this as a `blwm` lookup in\n> addition to `haln`.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-haln-after.svg --features=-init,+haln,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa0,0acd\n\nhb-view --font-size=110 --margin=2,24,2,16 --output-file=gujarati-haln-before.svg --features=-init,-haln,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa0,0acd\n\nsvg_stack.py --direction=h gujarati-haln-before.svg right-arrow.svg gujarati-haln-after.svg > gujarati-haln.svg\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvm-before.svg --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0ab9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvm-after.svg --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0ab9\n\nsvg_stack.py --direction=h gujarati-abvm-before.svg right-arrow.svg gujarati-abvm-after.svg > gujarati-abvm.svg\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,48,16,16 --output-file=gujarati-blwm-before.svg --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9b,0acd,0ab0,0ae3\n\nhb-view --font-size=110 --margin=2,16,16,16 --output-file=gujarati-blwm-after.svg --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9b,0acd,0ab0,0ae3\n\nsvg_stack.py --direction=h gujarati-blwm-before.svg right-arrow.svg gujarati-blwm-after.svg > gujarati-blwm.svg\n\n## 6 `dist`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-dist-before.svg --features=-init,-dist --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab9,0acd,0aa3,0aa6,0acd,0ab5\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-dist-after.svg --features=-init,+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab9,0acd,0aa3,0aa6,0acd,0ab5\n\nsvg_stack.py --direction=h gujarati-dist-before.svg right-arrow.svg gujarati-dist-after.svg > gujarati-dist.svg\n\n"
  },
  {
    "path": "images/gurmukhi/gurmukhi-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-gurmukhi.md](../../opentype-shaping-gurmukhi.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-init` in examples where the `init`\n> feature itself is not being explained.\n\n> Note: There is, at present, no Noto Serif for Gurmukhi; therefore\n> (unlike the other Indic scripts) these examples use Noto Sans. Serif\n> would be preferrable if it appears in the future, though, due to the\n> increased stroke contrast.\n\n## 2.7 Post-base consonants\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-pstf-before.png --features=-init,-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a2f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-pstf-after.png --features=-init,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a2f\n\nmontage gurmukhi-pstf-before.png right-arrow.png gurmukhi-pstf-after.png -geometry +0+0 -background transparent gurmukhi-pstf.png\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-nukt-before.png --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a38,25cc,0a3c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-nukt-after.png --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a38,0a3c\n\nmontage gurmukhi-nukt-before.png right-arrow.png gurmukhi-nukt-after.png -geometry +0+0 -background transparent gurmukhi-nukt.png\n\n\n## 3.3 `akhn`\n\n> Note: Noto Sans Gurmukhi has no `akhn` feature implemented.\n\n\n## 3.4 `rphf`\n\n> Note: Noto Sans Gurmukhi has no `rphf` 
feature implemented.\n\n\n## 3.7 `blwf`\n\n### Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ra-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a30\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ra-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a30\n\nmontage gurmukhi-blwf-ra-before.png right-arrow.png gurmukhi-blwf-ra-after.png -geometry +0+0 -background transparent gurmukhi-blwf-ra.png\n\n### Va\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-va-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a35\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-va-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a35\n\nmontage gurmukhi-blwf-va-before.png right-arrow.png gurmukhi-blwf-va-after.png -geometry +0+0 -background transparent gurmukhi-blwf-va.png\n\n### Ha\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ha-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a39\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ha-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a39\n\nmontage gurmukhi-blwf-ha-before.png right-arrow.png gurmukhi-blwf-ha-after.png -geometry +0+0 -background transparent gurmukhi-blwf-ha.png\n\n\n## 3.9 `half`\n\n> Note: Gurmukhi fonts seem to stick to explicit halant-forms.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-half-before.png 
--features=-init,-half,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a2d,0a4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-half-after.png --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a2d,0a4d\n\nmontage gurmukhi-half-before.png right-arrow.png gurmukhi-half-after.png -geometry +0+0 -background transparent gurmukhi-half.png\n\n\n## 3.10 `pstf`\n\n> Same as 2.7\n\n\n## 3.11 `vatu`\n\n> Note: Noto Gurmukhi has no `vatu` feature.\n\n\n## 3.12 `cjct`\n\n> Note: Noto Gurmukhi has no `cjct` feature.\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-matra-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a25,0a3f,0a4d,0a32,0a4d,0a35,0a4d,0a1a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-matra-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a25,0a4d,0a32,0a4d,0a35,0a4d,0a1a,0a3f\n\nmontage gurmukhi-matra-position-before.png right-arrow.png gurmukhi-matra-position-after.png -geometry +0+0 -background transparent gurmukhi-matra-position.png\n\n\n## 4.3 Reph position\n\n> Note: Noto Gurmukhi has no `rphf` feature and no Reph\n> glyph. 
Therefore no illustration of Reph positioning is possible.\n\n\n## 5 `init`\n\n> Note: Noto Gurmukhi has no `init` feature, and it is unclear from\n> the Microsoft specification whether `init` is defined for Gurmukhi.\n\n\n## 5 `pres`\n\n> Note: Noto Gurmukhi has no `pres` feature, even though it would be\n> possible to implement one for the i-matra (`U+0A3F`).\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvs-before.png --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a13,0a71\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvs-after.png --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a13,0a71\n\nmontage gurmukhi-abvs-before.png right-arrow.png gurmukhi-abvs-after.png -geometry +0+0 -background transparent gurmukhi-abvs.png\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blws-before.png --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhiUI-Regular.ttf --unicodes=0a15,25cc,0a4d,0a30\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blws-after.png --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhiUI-Regular.ttf --unicodes=0a15,0a4d,0a30\n\nmontage gurmukhi-blws-before.png right-arrow.png gurmukhi-blws-after.png -geometry +0+0 -background transparent gurmukhi-blws.png\n\n\n## 5 `psts`\n\n> Note: Noto Sans Gurmukhi has no `psts` feature.\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-haln-before.png --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a32,0a3c,0a4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-haln-after.png --features=-init,+haln --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a32,0a3c,0a4d\n\nmontage gurmukhi-haln-before.png right-arrow.png gurmukhi-haln-after.png -geometry +0+0 -background transparent gurmukhi-haln.png\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvm-before.png --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a20,0a48\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvm-after.png --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a20,0a48\n\nmontage gurmukhi-abvm-before.png right-arrow.png gurmukhi-abvm-after.png -geometry +0+0 -background transparent gurmukhi-abvm.png\n\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwm-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a06,0a42\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwm-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a06,0a42\n\nmontage gurmukhi-blwm-before.png right-arrow.png gurmukhi-blwm-after.png -geometry +0+0 -background transparent gurmukhi-blwm.png\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/gurmukhi/gurmukhi-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-gurmukhi.md](../../opentype-shaping-gurmukhi.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-init` in examples where the `init`\n> feature itself is not being explained.\n\n> Note: There is, at present, no Noto Serif for Gurmukhi; therefore\n> (unlike the other Indic scripts) these examples use Noto Sans. Serif\n> would be preferrable if it appears in the future, though, due to the\n> increased stroke contrast.\n\n## 2.7 Post-base consonants\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-pstf-before.svg --features=-init,-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a2f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-pstf-after.svg --features=-init,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a2f\n\nsvg_stack.py --direction=h gurmukhi-pstf-before.svg right-arrow.svg gurmukhi-pstf-after.svg > gurmukhi-pstf.svg\n\n#### Duplicates for other subsections\n\ncp gurmukhi-pstf.svg gurmukhi-pstf-1.svg\n\ncluster_styles = [\n\n\n## 3.2 `nukt`\n\n> Noto Serif replaces \"Dda,Nukta\" with \"Rra\". 
That sequence is chosen\n> because it means the change in glyphs is easily seen, but perhaps it\n> would be better to use an example that has an \"-after\" form that is\n> more clearly nukta-bearing?\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-nukt-before.svg --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a21,25cc,0a3c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-nukt-after.svg --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a21,0a3c\n\nsvg_stack.py --direction=h gurmukhi-nukt-before.svg right-arrow.svg gurmukhi-nukt-after.svg > gurmukhi-nukt.svg\n\n\n## 3.3 `akhn`\n\n> Note: Noto Sans/Serif Gurmukhi have no `akhn` feature implemented.\n\n\n## 3.4 `rphf`\n\n> Note: Noto Sans/Serif Gurmukhi have no `rphf` feature implemented.\n\n\n## 3.7 `blwf`\n\n### Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ra-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a30\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ra-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a30\n\nsvg_stack.py --direction=h gurmukhi-blwf-ra-before.svg right-arrow.svg gurmukhi-blwf-ra-after.svg > gurmukhi-blwf-ra.svg\n\n### Va\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-va-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a35\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-va-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a35\n\nsvg_stack.py --direction=h 
gurmukhi-blwf-va-before.svg right-arrow.svg gurmukhi-blwf-va-after.svg > gurmukhi-blwf-va.svg\n\n### Ha\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ha-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a39\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ha-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a39\n\nsvg_stack.py --direction=h gurmukhi-blwf-ha-before.svg right-arrow.svg gurmukhi-blwf-ha-after.svg > gurmukhi-blwf-ha.svg\n\n\n## 3.9 `half`\n\n> Note: Gurmukhi fonts seem to stick to explicit halant-forms.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-half-before.svg --features=-init,-half,-mark,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a23,0a4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-half-after.svg --features=-init,+half,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a23,0a4d\n\nsvg_stack.py --direction=h gurmukhi-half-before.svg right-arrow.svg gurmukhi-half-after.svg > gurmukhi-half.svg\n\n\n## 3.10 `pstf`\n\n> Same as 2.7\n\n\n## 3.11 `vatu`\n\n> Note: Noto Gurmukhi has no `vatu` feature.\n\n\n## 3.12 `cjct`\n\n> Note: Noto Gurmukhi has no `cjct` feature.\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,24,16 --output-file=gurmukhi-matra-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a25,0a3f,0a4d,0a32,0a4d,0a35,0a4d,0a1a\n\nhb-view --font-size=110 --margin=2,16,24,16 --output-file=gurmukhi-matra-position-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf 
--unicodes=0a25,0a4d,0a32,0a4d,0a35,0a4d,0a1a,0a3f\n\nsvg_stack.py --direction=h gurmukhi-matra-position-before.svg right-arrow.svg gurmukhi-matra-position-after.svg > gurmukhi-matra-position.svg\n\n\n## 4.3 Reph position\n\n> Note: Noto Gurmukhi has no `rphf` feature and no Reph\n> glyph. Therefore no illustration of Reph positioning is possible.\n\n\n## 5 `init`\n\n> Note: Noto Gurmukhi has no `init` feature, and it is unclear from\n> the Microsoft specification whether `init` is defined for Gurmukhi.\n\n\n## 5 `pres`\n\n> Note: Noto Gurmukhi has no `pres` feature, even though it would be\n> possible to implement one for the i-matra (`U+0A3F`).\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,32,2,16 --output-file=gurmukhi-abvs-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a13,0a71\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvs-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a13,0a71\n\nsvg_stack.py --direction=h gurmukhi-abvs-before.svg right-arrow.svg gurmukhi-abvs-after.svg > gurmukhi-abvs.svg\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blws-before.svg --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a15,25cc,0a4d,0a30\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blws-after.svg --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a15,0a4d,0a30\n\nsvg_stack.py --direction=h gurmukhi-blws-before.svg right-arrow.svg gurmukhi-blws-after.svg > gurmukhi-blws.svg\n\n\n## 5 `psts`\n\n> Note: Noto Sans Gurmukhi has no `psts` feature.\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-haln-before.svg --features=-init,-haln,-half,-blwm 
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a5c,0a4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-haln-after.svg --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a5c,0a4d\n\nsvg_stack.py --direction=h gurmukhi-haln-before.svg right-arrow.svg gurmukhi-haln-after.svg > gurmukhi-haln.svg\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvm-before.svg --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a20,0a48\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvm-after.svg --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a20,0a48\n\nsvg_stack.py --direction=h gurmukhi-abvm-before.svg right-arrow.svg gurmukhi-abvm-after.svg > gurmukhi-abvm.svg\n\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwm-before.svg --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a17,0a51\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwm-after.svg --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a17,0a51\n\nsvg_stack.py --direction=h gurmukhi-blwm-before.svg right-arrow.svg gurmukhi-blwm-after.svg > gurmukhi-blwm.svg\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/hangul/hangul-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-hangul.md](../../opentype-shaping-hangul.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-ljmo,-vjmo,-tjmo` in examples where\n> the jamo features are not being explained. This may not be\n> neccessary in all fonts.\n\n\n## LV example\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-lv-syllable.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110e,1166\n\n\n## LVT example\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-lvt-syllable.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110e,1166,11ae\n\n\n## 3. Compose the syllable\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-compose-before.png --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1108,200b,1171,200b,11b8\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-compose-after.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1108,1171,11b8\n\nmontage hangul-compose-before.png right-arrow.png hangul-compose-after.png -geometry +0+0 -background transparent hangul-compose.png\n\n\n## 4. 
Decompose the syllable\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-decompose-before.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1106,1172,11af\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-decompose-after.png --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1106,200d,1172,200d,11af\n\nmontage hangul-decompose-before.png right-arrow.png hangul-decompose-after.png -geometry +0+0 -background transparent hangul-decompose.png\n\n\n## 5.2 `ljmo`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-ljmo-before.png --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-ljmo-after.png --features=+ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9\n\nmontage hangul-ljmo-before.png right-arrow.png hangul-ljmo-after.png -geometry +0+0 -background transparent hangul-ljmo.png\n\n\n## 5.3 `vjmo`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-vjmo-before.png --features=+ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-vjmo-after.png --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9\n\nmontage hangul-vjmo-before.png right-arrow.png hangul-vjmo-after.png -geometry +0+0 -background transparent hangul-vjmo.png\n\n\n## 5.4 `tjmo`\n\nhb-view 
--font-size=110 --margin=2,16,2,16 --output-file=hangul-tjmo-before.png --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tjmo-after.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9\n\nmontage hangul-tjmo-before.png right-arrow.png hangul-tjmo-after.png -geometry +0+0 -background transparent hangul-tjmo.png\n\n\n## 6. Tone marks\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tone-before.png --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1111,116b,11a8,200c,302f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tone-after.png --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1111,116b,11a8,302f\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/hangul/hangul-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-hangul.md](../../opentype-shaping-hangul.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-ljmo,-vjmo,-tjmo` in examples where\n> the jamo features are not being explained. This may not be\n> neccessary in all fonts.\n\n\n## LV example\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-lv-syllable.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110e,1166\n\n\n## LVT example\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-lvt-syllable.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110e,1166,11ae\n\n\n## 3. Compose the syllable\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-compose-before.svg --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1108,200b,1171,200b,11b8\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-compose-after.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1108,1171,11b8\n\nsvg_stack --direction=h hangul-compose-before.svg right-arrow.svg hangul-compose-after.svg > hangul-compose.svg\n\n\n## 4. 
Decompose the syllable\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-decompose-before.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1106,1172,11af\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-decompose-after.svg --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=3141,200b,3160,200b,3139\n\nsvg_stack --direction=h hangul-decompose-before.svg right-arrow.svg hangul-decompose-after.svg > hangul-decompose.svg\n\n\n## 5.2 `ljmo`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-ljmo-before.svg --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,11bb\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-ljmo-after.svg --features=+ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,11bb\n\nsvg_stack --direction=h hangul-ljmo-before.svg right-arrow.svg hangul-ljmo-after.svg > hangul-ljmo.svg\n\n\n## 5.3 `vjmo`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-vjmo-before.svg --features=+ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,11bb\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-vjmo-after.svg --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,1169,200b,11bb\n\nsvg_stack --direction=h hangul-vjmo-before.svg right-arrow.svg hangul-vjmo-after.svg > hangul-vjmo.svg\n\n\n## 5.4 `tjmo`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tjmo-before.svg 
--features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,1169,200b,11bb\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tjmo-after.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,1169,11bb\n\nsvg_stack --direction=h hangul-tjmo-before.svg right-arrow.svg hangul-tjmo-after.svg > hangul-tjmo.svg\n\n\n## 6. Tone marks\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tone-before.svg --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1111,116b,11a8,200c,302f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tone-after.svg --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1111,116b,11a8,302f\n\nsvg_stack.py --direction=h hangul-tone-before.svg right-arrow.svg hangul-tone-after.svg > hangul-tone.svg\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/hebrew/hebrew-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-hebrew.md](../../opentype-shaping-hebrew.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 1.1 `ccmp`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-ccmp-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05b3,05bd\n\nhb-view --font-size=110 --margin=2,24,2,24 --output-file=hebrew-ccmp-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05b3,05bd\n\nmontage hebrew-ccmp-before.png right-arrow.png hebrew-ccmp-after.png -geometry +0+0 -background transparent hebrew-ccmp.png\n\n\n## 2 Alphabetic Presentation Forms\n\n> Note: Noto Sans Hebrew implements these compositions in a `ccmp` lookup.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-apf-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05e7,05bc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-apf-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05e7,05bc\n\nmontage hebrew-apf-before.png right-arrow.png hebrew-apf-after.png -geometry +0+0 -background transparent hebrew-apf.png\n\n\n## 4.1 `liga`\n\nhb-view --font-size=110 --margin=2,16,2,24 --output-file=hebrew-liga-before.png --features=-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=fb4f,05b1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-liga-after.png --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=fb4f,05b1\n\nmontage hebrew-liga-before.png right-arrow.png hebrew-liga-after.png 
-geometry +0+0 -background transparent hebrew-liga.png\n\n\n## 4.2 `dlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-dlig-before.png --features=-dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05d0,05dc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-dlig-after.png --features=+dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05d0,05dc\n\nmontage hebrew-dlig-before.png right-arrow.png hebrew-dlig-after.png -geometry +0+0 -background transparent hebrew-dlig.png\n\n\n## 5.1 `kern`\n\n> Note: Noto Sans Hebrew has `kern` lookups, but so far I have not\n> been able to identify an easily visible example pair.\n\n\n## 5.2 `mark`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05e7,059a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05e7,059a\n\nmontage hebrew-mark-before.png right-arrow.png hebrew-mark-after.png -geometry +0+0 -background transparent hebrew-mark.png\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/hebrew/hebrew-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-hebrew.md](../../opentype-shaping-hebrew.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 1.1 `ccmp`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-ccmp-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d5,05c1\n\nhb-view --font-size=110 --margin=2,24,2,24 --output-file=hebrew-ccmp-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d5,05c1\n\nsvg_stack.py --direction=h hebrew-ccmp-before.svg right-arrow.svg hebrew-ccmp-after.svg > hebrew-ccmp.svg\n\n\n## 2 Alphabetic Presentation Forms\n\n> Note: Noto Sans Hebrew implements these compositions in a `ccmp`\n> lookup. Noto Serif Hebrew does not.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-apf-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d9,05b4\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-apf-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=fb1d\n\nsvg_stack.py --direction=h hebrew-apf-before.svg right-arrow.svg hebrew-apf-after.svg > hebrew-apf.svg\n\n\n## 4.1 `liga`\n\nhb-view --font-size=110 --margin=2,16,2,24 --output-file=hebrew-liga-before.svg --features=-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=25cc,05b1,200d,05bd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-liga-after.svg --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=25cc,05b1,200d,05bd\n\nsvg_stack.py --direction=h hebrew-liga-before.svg right-arrow.svg 
hebrew-liga-after.svg > hebrew-liga.svg\n\n\n## `calt`\n\n> The `calt` feature clearly gets applied; it's unclear why it was\n> left off of the list in the initial release of this document.\n\nhb-view --font-size=110 --margin=2,16,2,24 --output-file=hebrew-calt-before.svg --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05dc,05b4,05b8,05dd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-calt-after.svg --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05dc,05b4,05b8,05dd\n\nsvg_stack.py --direction=h hebrew-calt-before.svg right-arrow.svg hebrew-calt-after.svg > hebrew-calt.svg\n\n\n## 4.2 `dlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-dlig-before.svg --features=-dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d0,05dc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-dlig-after.svg --features=+dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d0,05dc\n\nsvg_stack.py --direction=h hebrew-dlig-before.svg right-arrow.svg hebrew-dlig-after.svg > hebrew-dlig.svg\n\n\n## 5.1 `kern`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-kern-before.svg --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d2,05e2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-kern-after.svg --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d2,05e2\n\nsvg_stack.py --direction=h hebrew-kern-before.svg right-arrow.svg hebrew-kern-after.svg > hebrew-kern.svg\n\n\n## 5.2 `mark`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf 
--unicodes=05d3,05b1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d3,05b1\n\nsvg_stack.py --direction=h hebrew-mark-before.svg right-arrow.svg hebrew-mark-after.svg > hebrew-mark.svg\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/images-index.md",
    "content": "\n# Images #\n\nThis section includes a separate subdirectory for each script,\ncontaining the images included in the relevant script-shaping document.\n\nAlso included in each directory is a log file containing the exact\ncommands used to generate the images.\n\n<abbr title=\"Portable Network Graphics\">PNG</abbr> glyph images are generated using the `hb-view` utility from\nHarfbuzz and the `montage` utility from ImageMagick. The commands were\nrun on a Linux-based system but, apart from minor differences in the\nfile-path to the font file specified, should be completely\nreproducible on other operating systems.\n\n<abbr title=\"Scalable Vector Graphics\">SVG</abbr> glyph images are generated using the `hb-view` utility from\nHarfbuzz and the [`svg_stack`](https://github.com/astraw/svg_stack/)\nPython utility. The commands were run on a Linux-based system but,\napart from minor differences in the file-path to the font file\nspecified, should be completely reproducible on other operating\nsystems.\n\nLong-term, the <abbr title=\"Portable Network Graphics\">PNG</abbr> images will be replaced by <abbr title=\"Scalable Vector Graphics\">SVG</abbr> images &mdash;\nalthough, at present, there are still some images that are generated\nin <abbr title=\"Portable Network Graphics\">PNG</abbr> form (because kinks remain to be worked out in the <abbr title=\"Scalable Vector Graphics\">SVG</abbr>-image\nalignment process and the corresponding CSS styling).\n\nThe font files used must be publicly and freely available, open-source\nfonts. 
By default, the Noto fonts from Google are the starting point.\n\nA list of the fonts used to generate the latest version of the images\nis provided in the [example-fonts.txt](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/example-fonts.txt) file, with\nURLs and <abbr title=\"Secure Hash Algorithm\">SHA</abbr> checksums for each file.\n\nThe image file names follow a simple, but important, pattern:\n\n    _script_-_featureillustrated_.png\n\t\nIntermediary images copy the pattern but append _-before_ or _-after_\nwhen depicting the before-or-after state of an applied OpenType\nfeature, or some other suffix as appropriate.\n\nIf you are suggesting an update to an image, please utilize the same\ncommands and general syntax. If you are suggesting adding a new image,\nplease also follow the file-name pattern. Patches to the image-generation log for\neach script are appreciated, in order to keep the log up-to-date.\n\n  - Indic\n      - [Devanagari](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/devanagari/devanagari-svg-image-generation-log.md)\n      - [Bengali](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/bengali/bengali-svg-image-generation-log.md)\n      - [Gujarati](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/gujarati/gujarati-svg-image-generation-log.md)\n      - [Gurmukhi](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/gurmukhi/gurmukhi-svg-image-generation-log.md)\n      - [Kannada](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/kannada/kannada-svg-image-generation-log.md)\n      - [Malayalam](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/malayalam/malayalam-svg-image-generation-log.md)\n      - [Oriya](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/oriya/oriya-svg-image-generation-log.md)\n      - 
[Tamil](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/tamil/tamil-svg-image-generation-log.md)\n      - [Telugu](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/telugu/telugu-svg-image-generation-log.md)\n      - [Sinhala](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/sinhala/sinhala-svg-image-generation-log.md)\n  - Brahmi-derived\n\t  - [Khmer](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/khmer/khmer-svg-image-generation-log.md)\n\t  - [Lao](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/thai-lao/thai-lao-svg-image-generation-log.md)\n\t  - [Myanmar](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/myanmar/myanmar-svg-image-generation-log.md)\n\t  - [Thai](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/thai-lao/thai-lao-svg-image-generation-log.md)\n\t  - [Tibetan](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/tibetan/tibetan-svg-image-generation-log.md)\n  - Arabic\n      - [Arabic](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/arabic/arabic-svg-image-generation-log.md)\n      - [Syriac](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/syriac/syriac-svg-image-generation-log.md)\n      - [N'Ko](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/nko/nko-svg-image-generation-log.md)\n      - [Mongolian](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/mongolian/mongolian-svg-image-generation-log.md)\n  - Hangul\n      - [Hangul](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/hangul/hangul-svg-image-generation-log.md)\n  - Hebrew\n      - [Hebrew](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/hebrew/hebrew-svg-image-generation-log.md)\n  - Emoji\n      - 
[Emoji](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/emoji/emoji-png-image-generation-log.md)\n"
  },
  {
    "path": "images/kannada/kannada-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-kannada.md](../../opentype-shaping-kannada.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-matra-decomposition-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cc8\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-matra-decomposition-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cc6,25cc,0cd6\n\nmontage kannada-matra-decomposition-before.png right-arrow.png kannada-matra-decomposition-after.png -geometry +0+0 -background transparent kannada-matra-decomposition.png\n\n\n## 3.2 `nukt`\n\n> Note: Noto Serif Kannada implements this in `blwm` for unknown\n> reasons.\n\nhb-view --font-size=110 --margin=2,32,2,16 --output-file=kannada-nukt-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cab,25cc,0cbc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-nukt-after.png  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cab,0cbc\n\nmontage kannada-nukt-before.png right-arrow.png kannada-nukt-after.png -geometry +0+0 -background transparent kannada-nukt.png\n\n\n## 3.3 `akhn`\n\n> Note: Noto Serif Kannada implements this in both `akhn` and in\n> `blwf` for unknown reasons.\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-kssa-before.png --features=-akhn,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0ccd,0cb7\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=kannada-akhn-kssa-after.png --features=+akhn, --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0ccd,0cb7\n\nmontage kannada-akhn-kssa-before.png right-arrow.png kannada-akhn-kssa-after.png -geometry +0+0 -background transparent kannada-akhn-kssa.png\n\n\n### JNya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-jnya-before.png --features=-akhn,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c9c,0ccd,0c9e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-jnya-after.png --features=+akhn, --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c9c,0ccd,0c9e\n\nmontage kannada-akhn-jnya-before.png right-arrow.png kannada-akhn-jnya-after.png -geometry +0+0 -background transparent kannada-akhn-jnya.png\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-rphf-before.png --features=-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb0,0ccd,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-rphf-after.png --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb0,0ccd,25cc\n\nmontage kannada-rphf-before.png right-arrow.png kannada-rphf-after.png -geometry +0+0 -background transparent kannada-rphf.png\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwf-before.png --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=25cc,0ccd,0ca1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwf-after.png --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=25cc,0ccd,0ca1\n\nmontage kannada-blwf-before.png right-arrow.png kannada-blwf-after.png -geometry +0+0 
-background transparent kannada-blwf.png\n\n\n## 4.3 Reph positioning\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-reph-position-before.png  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb0,0ccd,25cc,0cad,0ccd,0cb3,0cc2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-reph-position-after.png  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb0,0ccd,0cad,0ccd,0cb3,0cc2\n\nmontage kannada-reph-position-before.png right-arrow.png kannada-reph-position-after.png -geometry +0+0 -background transparent kannada-reph-position.png\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-pres-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb5,25cc,0cc1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-pres-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb5,0cc1\n\nmontage kannada-pres-before.png right-arrow.png kannada-pres-after.png -geometry +0+0 -background transparent kannada-pres.png\n\n\n## 5 `abvs`\n\n> Note: Noto Serif Kannada has some abvs-like substitutions in `pres`\n> lookup 14 (via single-sub lookup 23), but I have not yet figured out\n> whether they are,\n> linguistically speaking, actually above-base features. 
Thus, they are\n> included here, but might not be used in the shaping document.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-abvs-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0ca3,0ccc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-abvs-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0ca3,0ccc\n\nmontage kannada-abvs-before.png right-arrow.png kannada-abvs-after.png -geometry +0+0 -background transparent kannada-abvs.png\n\n\n## 5 `blws`\n\n> Note: Noto Serif Kannada has some blws-like substitutions in\n> `pres` lookup 12 (via contextual chaining lookups 7 and 8 (via\n> single-sub lookups 19 and 20)), but I have not yet figured out\n> whether they are, linguistically speaking, actually below-base\n> features. Thus, they are included here, but might not be used in the\n> shaping document.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blws-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0ccd,0cb7,0cc1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blws-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0ccd,0cb7,0cc1\n\nmontage kannada-blws-before.png right-arrow.png kannada-blws-after.png -geometry +0+0 -background transparent kannada-blws.png\n\n## 5 `psts`\n\n> Note: Noto Serif Kannada has some psts-like lookups in `pres` lookup 12 (via single-sub lookups 21 and 22 (via contextual chaining lookups\n> 9 and 10)), but I have not yet figured out whether they are,\n> linguistically speaking, actually post-base features. 
Thus, they are\n> included here, but might not be used in the shaping document.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-psts-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0cbe\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-psts-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0cbe\n\nmontage kannada-psts-before.png right-arrow.png kannada-psts-after.png -geometry +0+0 -background transparent kannada-psts.png\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-haln-before.png --features=-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c98,0ccd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-haln-after.png --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c98,0ccd\n\nmontage kannada-haln-before.png right-arrow.png kannada-haln-after.png -geometry +0+0 -background transparent kannada-haln.png\n\n\n## 6 `abvm`\n\n> Note: Noto Serif Kannada does not include an `abvm` feature.\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwm-before.png --features=-blwm,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cab,0cc1,0cbc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cab,0cc1,0cbc\n\nmontage kannada-blwm-before.png right-arrow.png kannada-blwm-after.png -geometry +0+0 -background transparent kannada-blwm.png\n"
  },
  {
    "path": "images/kannada/kannada-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-kannada.md](../../opentype-shaping-kannada.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-matra-decomposition-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cc8\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-matra-decomposition-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cc6,25cc,0cd6\n\nsvg_stack.py --direction=h kannada-matra-decomposition-before.svg right-arrow.svg kannada-matra-decomposition-after.svg > kannada-matra-decomposition.svg\n\n\n## 3.2 `nukt`\n\n> Note: Noto Serif Kannada implements this in `blwm` for unknown\n> reasons.\n\nhb-view --font-size=110 --margin=2,32,2,16 --output-file=kannada-nukt-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cab,25cc,0cbc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-nukt-after.svg  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cab,0cbc\n\nsvg_stack.py --direction=h kannada-nukt-before.svg right-arrow.svg kannada-nukt-after.svg > kannada-nukt.svg\n\n\n## 3.3 `akhn`\n\n> Note: Noto Serif Kannada implements this in both `akhn` and in\n> `blwf` for unknown reasons.\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-kssa-before.svg --features=-akhn,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c95,0ccd,0cb7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-kssa-after.svg --features=+akhn, 
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c95,0ccd,0cb7\n\nsvg_stack.py --direction=h kannada-akhn-kssa-before.svg right-arrow.svg kannada-akhn-kssa-after.svg > kannada-akhn-kssa.svg\n\n\n### JNya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-jnya-before.svg --features=-akhn,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c9c,0ccd,0c9e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-jnya-after.svg --features=+akhn, --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c9c,0ccd,0c9e\n\nsvg_stack.py --direction=h kannada-akhn-jnya-before.svg right-arrow.svg kannada-akhn-jnya-after.svg > kannada-akhn-jnya.svg\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-rphf-before.svg --features=-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb0,0ccd,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-rphf-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb0,0ccd,25cc\n\nsvg_stack.py --direction=h kannada-rphf-before.svg right-arrow.svg kannada-rphf-after.svg > kannada-rphf.svg\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=25cc,0ccd,0ca1\n\nhb-view --font-size=110 --margin=2,24,2,16 --output-file=kannada-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=25cc,0ccd,0ca1\n\nsvg_stack.py --direction=h kannada-blwf-before.svg right-arrow.svg kannada-blwf-after.svg > kannada-blwf.svg\n\n\n## 4.3 Reph positioning\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=kannada-reph-position-before.svg  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb0,0ccd,25cc,0cad,0ccd,0cb3,0cc2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-reph-position-after.svg  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb0,0ccd,0cad,0ccd,0cb3,0cc2\n\nsvg_stack.py --direction=h kannada-reph-position-before.svg right-arrow.svg kannada-reph-position-after.svg > kannada-reph-position.svg\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-pres-before.svg --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb5,25cc,0cc1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-pres-after.svg --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb5,0cc1\n\nsvg_stack.py --direction=h kannada-pres-before.svg right-arrow.svg kannada-pres-after.svg > kannada-pres.svg\n\n\n## 5 `abvs`\n\n> Note: Noto Serif Kannada has some abvs-like substitutions in\n> `psts` lookups for unknown reasons. This is one.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-abvs-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0ca3,0ccd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-abvs-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0ca3,0ccd\n\nsvg_stack.py --direction=h kannada-abvs-before.svg right-arrow.svg kannada-abvs-after.svg > kannada-abvs.svg\n\n\n## 5 `blws`\n\n> Note: Noto Serif Kannada has some blws-like substitutions in\n> `psts` lookups for unknown reasons. 
This is one.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c95,0ccd,0ca4,0ccd,0caf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c95,0ccd,0ca4,0ccd,0caf\n\nsvg_stack.py --direction=h kannada-blws-before.svg right-arrow.svg kannada-blws-after.svg > kannada-blws.svg\n\n## 5 `psts`\n\n> Note: Noto Serif Kannada has moved many `pres` lookups into `psts`\n> in the most recent release.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-psts-before.svg --features=-pres,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=25cc,0ca4,0cbf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-psts-after.svg --features=+pres,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=25cc,0ca4,0cbf\n\nsvg_stack.py --direction=h kannada-psts-before.svg right-arrow.svg kannada-psts-after.svg > kannada-psts.svg\n\n\n## 5 `haln`\n\n> Note: Noto Serif Kannada does not include a `haln` feature. 
Similar\n> behavior is found in `psts`.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-haln-before.svg --features=-haln,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c98,0ccd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-haln-after.svg --features=+haln,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c98,0ccd\n\nsvg_stack.py --direction=h kannada-haln-before.svg right-arrow.svg kannada-haln-after.svg > kannada-haln.svg\n\n\n## 6 `dist`\n\nhb-view --font-size=110 --margin=2,32,2,16 --output-file=kannada-dist-before.svg --features=-dist,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c93,0cf3\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c93,0cf3\n\nsvg_stack.py --direction=h kannada-dist-before.svg right-arrow.svg kannada-dist-after.svg > kannada-dist.svg\n\n\n## 6 `abvm`\n\n> Note: Noto Serif Kannada does not include an `abvm` feature.\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,32,2,16 --output-file=kannada-blwm-before.svg --features=-blwm,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cab,0cc1,0cbc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cab,0cc1,0cbc\n\nsvg_stack.py --direction=h kannada-blwm-before.svg right-arrow.svg kannada-blwm-after.svg > kannada-blwm.svg\n"
  },
  {
    "path": "images/khmer/khmer-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-khmer.md](../../opentype-shaping-khmer.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Terminology\n\n### Coeng forms\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-coeng-kha-before.png --features=-blwf,-pstf --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17d2,1783\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-coeng-kha-after.png --features=+blwf,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17d2,1783\n\nmontage khmer-coeng-kha-before.png right-arrow.png khmer-coeng-kha-after.png -geometry +0+0 -background transparent khmer-coeng-kha.png\n\n\n## The khmr shaping model\n\n### Robat\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-robat.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17cc\n\n\n## 2.2 Matra decomposition\n\n> Note: Noto Serif Khmer has decompositions for the\n> non-canonical-in-Unicode multi-part matras implemented in `psts`\n> features, but I have not figured out how to activate them in isolation.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-matra-decomposition-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17c4\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-matra-decomposition-after.png --features=+ccmp,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17c1,25cc,17b6\n\nmontage khmer-matra-decomposition-before.png right-arrow.png khmer-matra-decomposition-after.png -geometry +0+0 -background transparent 
khmer-matra-decomposition.png\n\n\n## 2.4 Pre-base-reordering Ro\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pref-before.png --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1786,25cc,17d2,179a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pref-after.png --features=+pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1786,17d2,179a\n\nmontage khmer-pref-before.png right-arrow.png khmer-pref-after.png -geometry +0+0 -background transparent khmer-pref.png\n\n\n## 3.1 `locl`\n\nNo examples found in Noto Khmer.\n\n\n## 3.2 `ccmp`\n\nNo examples found in Noto Khmer.\n\n\n## 3.3 `pref`\n\nSame as pre-base-reordering Ro.\n\n\n## 3.4 `blwf`\n\n> Note: altered bottom-margin number to fit.\n\nhb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-blwf-before.png --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178c,25cc,17d2,17af\n\nhb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-blwf-after.png --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178c,17d2,17af\n\nmontage khmer-blwf-before.png right-arrow.png khmer-blwf-after.png -geometry +0+0 -background transparent khmer-blwf.png\n\n\n## 3.5 `abvf`\n\n> Note: Noto Serif Khmer implements this as a `abvs` lookup.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvf-before.png --features=-abvf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17ca\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvf-after.png --features=+abvf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17ca\n\nmontage khmer-abvf-before.png right-arrow.png khmer-abvf-after.png -geometry +0+0 -background transparent khmer-abvf.png\n\n## 3.6 
`pstf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pstf-before.png --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17d2,179f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pstf-after.png --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17d2,179f\n\nmontage khmer-pstf-before.png right-arrow.png khmer-pstf-after.png -geometry +0+0 -background transparent khmer-pstf.png\n\n## 3.7 `cfar`\n\nNo examples found in Noto Khmer.\n\n\n## 4 `pres`\n\n> Note: Adjusted bottom margin.\n\nhb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-pres-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17d2,179a,17d2,1781\n\nhb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-pres-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17d2,179a,17d2,1781\n\nmontage khmer-pres-before.png right-arrow.png khmer-pres-after.png -geometry +0+0 -background transparent khmer-pres.png\n\n\n## 4 `blws`\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blws-before.png --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17d2,1780\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blws-after.png --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17d2,1780\n\nmontage khmer-blws-before.png right-arrow.png khmer-blws-after.png -geometry +0+0 -background transparent khmer-blws.png\n\n\n## 4 `abvs`\n\nhb-view --font-size=110 --margin=32,16,2,16 --output-file=khmer-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1796,17b7,17cd\n\nhb-view 
--font-size=110 --margin=32,16,2,16 --output-file=khmer-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1796,17b7,17cd\n\nmontage khmer-abvs-before.png right-arrow.png khmer-abvs-after.png -geometry +0+0 -background transparent khmer-abvs.png\n\n\n## 4 `psts`\n\n> Note: Adjusted bottom margin.\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-psts-before.png --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1787,17d2,1785,17d2,1788\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-psts-after.png --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1787,17d2,1785,17d2,1788\n\n\n## 4 `clig`\n\n> Note: Noto Serif Khmer implements this twice, in clig and liga.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-clig-before.png --features=-clig,-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17b6\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-clig-after.png --features=+clig,+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17b6\n\nmontage khmer-clig-before.png right-arrow.png khmer-clig-after.png -geometry +0+0 -background transparent khmer-clig.png\n\n\n## 4 `liga`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-liga-before.png --features=-clig,-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17d2,1788,17c5\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-liga-after.png --features=+clig,+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17d2,1788,17c5\n\nmontage khmer-liga-before.png right-arrow.png khmer-liga-after.png -geometry +0+0 -background transparent 
khmer-liga.png\n\n\n## 5 `dist`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-dist-before.png --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179e,17d2,1798,179a,17bc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-dist-after.png --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179e,17d2,1798,179a,17bc\n\nmontage khmer-dist-before.png right-arrow.png khmer-dist-after.png -geometry +0+0 -background transparent khmer-dist.png\n\n\n## 5 `kern`\n\nNo examples found in Noto Khmer....\n\n\n## 5 `blwm`\n\n> Note: adjusted bottom margin.\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blwm-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17bc\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17bc\n\nmontage khmer-blwm-before.png right-arrow.png khmer-blwm-after.png -geometry +0+0 -background transparent khmer-blwm.png\n\n\n## 5 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvm-before.png --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178e,17b7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvm-after.png --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178e,17b7\n\nmontage khmer-abvm-before.png right-arrow.png khmer-abvm-after.png -geometry +0+0 -background transparent khmer-abvm.png\n\n\n## 5 `mkmk`\n\nNo examples found in Noto Khmer.\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/khmer/khmer-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-khmer.md](../../opentype-shaping-khmer.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Terminology\n\n### Coeng forms\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-coeng-kha-before.svg --features=-blwf,-pstf --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17d2,1783\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-coeng-kha-after.svg --features=+blwf,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17d2,1783\n\nsvg_stack.py --direction=h khmer-coeng-kha-before.svg right-arrow.svg khmer-coeng-kha-after.svg > khmer-coeng-kha.svg\n\n\n## The khmr shaping model\n\n### Robat\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-robat.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17cc\n\n\n## 2.2 Matra decomposition\n\n> Note: Noto Serif Khmer has decompositions for the\n> non-canonical-in-Unicode multi-part matras implemented in `psts`\n> features, but I have not figured out how to activate them in isolation.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-matra-decomposition-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17c4\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-matra-decomposition-after.svg --features=+ccmp,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17c1,25cc,17b6\n\nsvg_stack.py --direction=h khmer-matra-decomposition-before.svg right-arrow.svg khmer-matra-decomposition-after.svg > khmer-matra-decomposition.svg\n\n\n## 2.4 
Pre-base-reordering Ro\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pref-before.svg --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1786,25cc,17d2,179a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pref-after.svg --features=+pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1786,17d2,179a\n\nsvg_stack.py --direction=h khmer-pref-before.svg right-arrow.svg khmer-pref-after.svg > khmer-pref.svg\n\n\n#### Duplicates for other subsections\n\ncp khmer-pref.svg khmer-pref-1.svg\n\ncluster_styles = [\n\n\n\n## 3.1 `locl`\n\nNo examples found in Noto Khmer.\n\n\n## 3.2 `ccmp`\n\nNo examples found in Noto Khmer.\n\n\n## 3.3 `pref`\n\nSame as pre-base-reordering Ro.\n\n\n## 3.4 `blwf`\n\n> Note: altered bottom-margin number to fit.\n\nhb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178c,25cc,17d2,17af\n\nhb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178c,17d2,17af\n\nsvg_stack.py --direction=h khmer-blwf-before.svg right-arrow.svg khmer-blwf-after.svg > khmer-blwf.svg\n\n\n## 3.5 `abvf`\n\n> Note: Noto Serif Khmer implements this as a `abvs` lookup.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvf-before.svg --features=-abvf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17ca\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvf-after.svg --features=+abvf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17ca\n\nsvg_stack.py --direction=h khmer-abvf-before.svg right-arrow.svg khmer-abvf-after.svg > 
khmer-abvf.svg\n\n## 3.6 `pstf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pstf-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17d2,179f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pstf-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17d2,179f\n\nsvg_stack.py --direction=h khmer-pstf-before.svg right-arrow.svg khmer-pstf-after.svg > khmer-pstf.svg\n\n## 3.7 `cfar`\n\nNo examples found in Noto Khmer.\n\n\n## 4 `pres`\n\n> Note: Adjusted bottom margin.\n\nhb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-pres-before.svg --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17d2,179a,17d2,1781\n\nhb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-pres-after.svg --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17d2,179a,17d2,1781\n\nsvg_stack.py --direction=h khmer-pres-before.svg right-arrow.svg khmer-pres-after.svg > khmer-pres.svg\n\n\n## 4 `blws`\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17d2,1780\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17d2,1780\n\nsvg_stack.py --direction=h khmer-blws-before.svg right-arrow.svg khmer-blws-after.svg > khmer-blws.svg\n\n\n## 4 `abvs`\n\nhb-view --font-size=110 --margin=32,16,2,16 --output-file=khmer-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1796,17b7,17cd\n\nhb-view --font-size=110 --margin=32,16,2,16 
--output-file=khmer-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1796,17b7,17cd\n\nsvg_stack.py --direction=h khmer-abvs-before.svg right-arrow.svg khmer-abvs-after.svg > khmer-abvs.svg\n\n\n## 4 `psts`\n\n> Note: Adjusted bottom margin.\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-psts-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1787,17d2,1785,17d2,1788\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-psts-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1787,17d2,1785,17d2,1788\n\n\n## 4 `clig`\n\n> Note: Noto Serif Khmer implements this twice, in clig and liga.\n>\n> Note: It is no longer possible to deactivate clig in HarfBuzz. See\n> the issue at https://github.com/harfbuzz/harfbuzz/issues/1310 for\n> more information.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-clig-before.svg --features=-clig,-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17b6\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-clig-after.svg --features=+clig,+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17b6\n\nsvg_stack.py --direction=h khmer-clig-before.svg right-arrow.svg khmer-clig-after.svg > khmer-clig.svg\n\n\n## 4 `liga`\n\n> Note: because Noto Serif Khmer duplicates all of its liga\n> substitutions in clig, which cannot be disabled in HarfBuzz (see the\n> preceding section about clig), it is not possible to disable the\n> liga substitutions either.\n\nhb-view --font-size=110 --margin=2,16,8,24 --output-file=khmer-liga-before.svg --features=-clig,-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17d2,1788,17c5\n\nhb-view 
--font-size=110 --margin=2,16,8,24 --output-file=khmer-liga-after.svg --features=+clig,+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17d2,1788,17c5\n\nsvg_stack.py --direction=h khmer-liga-before.svg right-arrow.svg khmer-liga-after.svg > khmer-liga.svg\n\n\n## 5 `dist`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-dist-before.svg --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179e,17d2,1798,179a,17bc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179e,17d2,1798,179a,17bc\n\nsvg_stack.py --direction=h khmer-dist-before.svg right-arrow.svg khmer-dist-after.svg > khmer-dist.svg\n\n\n## 5 `kern`\n\nNo examples found in Noto Khmer....\n\n\n## 5 `blwm`\n\n> Note: adjusted bottom margin.\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17bc\n\nhb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17bc\n\nsvg_stack.py --direction=h khmer-blwm-before.svg right-arrow.svg khmer-blwm-after.svg > khmer-blwm.svg\n\n\n## 5 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvm-before.svg --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178e,17b7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvm-after.svg --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178e,17b7\n\nsvg_stack.py --direction=h khmer-abvm-before.svg right-arrow.svg khmer-abvm-after.svg > 
khmer-abvm.svg\n\n\n## 5 `mkmk`\n\nNo examples found in Noto Khmer.\n"
  },
  {
    "path": "images/malayalam/malayalam-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-malayalam.md](../../opentype-shaping-malayalam.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-decompose-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-decompose-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d46,25cc,0d57\n\nmontage malayalam-matra-decompose-before.png right-arrow.png malayalam-matra-decompose-after.png -geometry +0+0 -background transparent malayalam-matra-decompose.png\n\n\n## 2.7 post-base consonants\n\n### Ya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ya-before.png --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d2f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ya-after.png --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d2f\n\nmontage malayalam-pstf-ya-before.png right-arrow.png malayalam-pstf-ya-after.png -geometry +0+0 -background transparent malayalam-pstf-ya.png\n\n### Va\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-va-before.png --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d35\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-va-after.png --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d35\n\nmontage 
malayalam-pstf-va-before.png right-arrow.png malayalam-pstf-va-after.png -geometry +0+0 -background transparent malayalam-pstf-va.png\n\n\n## 3.2 `nukt`\n\n> Note: Noto Serif Malayalam uses `U+0323` \"Combining dot below\" in\n> its mark-placement lookups, not a Nukta (which does not exist in the\n> Malayalam Unicode block).\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-nukt-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d18,25cc,0323\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-nukt-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d18,0323\n\nmontage malayalam-nukt-before.png right-arrow.png malayalam-nukt-after.png -geometry +0+0 -background transparent malayalam-nukt.png\n\n\n## 3.3 `akhn`\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-kssa-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,0d37\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-kssa-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,0d37\n\nmontage malayalam-akhn-kssa-before.png right-arrow.png malayalam-akhn-kssa-after.png -geometry +0+0 -background transparent malayalam-akhn-kssa.png\n\n### NnTta\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-nntta-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d23,0d4d,0d1f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-nntta-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d23,0d4d,0d1f\n\nmontage malayalam-akhn-nntta-before.png 
right-arrow.png malayalam-akhn-nntta-after.png -geometry +0+0 -background transparent malayalam-akhn-nntta.png\n\n> Note: The \"Chillu R\" is shown here because it may be implemented as\n> an akhand form and that makes Malayalam distinct from several other\n> Indic scripts.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-chillu-r-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d30,0d4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-chillu-r-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d7c\n\nmontage malayalam-akhn-chillu-r-before.png right-arrow.png malayalam-akhn-chillu-r-after.png -geometry +0+0 -background transparent malayalam-akhn-chillu-r.png\n\n\n## 3.4 `rphf`\n\n> Note: Malayalam modern orthography does not use Reph. The dot-reph\n> substitution here is shown with an accompanying note to that effect,\n> and is accompanied by the Chillu-R image.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-dot-reph-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d30,0d4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-dot-reph-after.png --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e\n\nmontage malayalam-dot-reph-before.png right-arrow.png malayalam-dot-reph-after.png -geometry +0+0 -background transparent malayalam-dot-reph.png\n\n\n## 3.6 `pref`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ra-before.png --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d30\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ra-after.png --features=+pref --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d30\n\nmontage malayalam-pstf-ra-before.png right-arrow.png malayalam-pstf-ra-after.png -geometry +0+0 -background transparent malayalam-pstf-ra.png\n\n\n## 3.7 `blwf`\n\n> Note: Noto Serif Malayalam includes a `blwf`-form \"La\" but does not\n> include a feature that accesses it. It is included in several `akhn`\n> ligatures, though. Instead, use SMC Rachana font.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwf-before.png --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/malayalam/Rachana-Regular.ttf --unicodes=0d4d,0d32\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwf-after.png --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/malayalam/Rachana-Regular.ttf --unicodes=0d4d,0d32\n\nmontage malayalam-blwf-before.png right-arrow.png malayalam-blwf-after.png -geometry +0+0 -background transparent malayalam-blwf.png\n\n## 3.9 `half`\n\n> Note: Added a note to the shaping text about using `half` for Chillu\n> lookups.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-half-before.png --features=+half --background=FFFFFF00 --preserve-default-ignorables /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,2005,200d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-half-after.png --features=+half --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,200d\n\nmontage malayalam-half-before.png right-arrow.png malayalam-half-after.png -geometry +0+0 -background transparent malayalam-half.png\n\n\n## 3.10 `pstf`\n\n> Note: Uses the same images as 2.7\n\n## 3.12 `cjct`\n\n> Note: Noto Serif Malayalam implements this as an `akhn` feature.\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=malayalam-cjct-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d38,0d4d,0d31,0d4d,0d31\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-cjct-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d38,0d4d,0d31,0d4d,0d31\n\nmontage malayalam-cjct-before.png right-arrow.png malayalam-cjct-after.png -geometry +0+0 -background transparent malayalam-cjct.png\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-position-before.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d47,0d2c,0d4d,0d1e,0d4d,0d1c,0d3e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-position-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d2c,0d4d,0d1e,0d4d,0d1c,0d4b\n\nmontage malayalam-matra-position-before.png right-arrow.png malayalam-matra-position-after.png -geometry +0+0 -background transparent malayalam-matra-position.png\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-repha-position-before.png --features=+akhn,-abvm,-mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e,200d,0d23,0d4d,200d,0d21,0d41\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-repha-position-after.png --features=+akhn,+abvm,+mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e,0d23,0d4d,200d,0d21,0d41\n\nmontage malayalam-repha-position-before.png right-arrow.png malayalam-repha-position-after.png -geometry +0+0 -background transparent 
malayalam-repha-position.png\n\n\n## 4.4 Pre-base reordering\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pref-position-before.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=25cc,0d4d,0d30,0d39,0d4d,0d23,0d4d,0d21,0d4c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pref-position-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d39,0d4d,0d23,0d4d,0d21,0d4d,0d30,0d4c\n\nmontage malayalam-pref-position-before.png right-arrow.png malayalam-pref-position-after.png -geometry +0+0 -background transparent malayalam-pref-position.png\n\n\n## 5 `blws`\n\n> Note: Noto Serif and Sans Malayalam have blws-like \"La\" features in\n> other lookups, such as `akhn`. I have not been able to isolate one\n> of them for usage.\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-psts-before.png --features=-psts,-akhn --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d35,0d4d,0d35\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-psts-after.png --features=+psts,+akhn --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d35,0d4d,0d35\n\nmontage malayalam-psts-before.png right-arrow.png malayalam-psts-after.png -geometry +0+0 -background transparent malayalam-psts.png\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-haln-before.png --features=-haln --background=FFFFFF00 --preserve-default-ignorables /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d33,0d4d,2005,200d\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=malayalam-haln-after.png --features=+haln --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d33,0d4d,200d\n\nmontage malayalam-haln-before.png right-arrow.png malayalam-haln-after.png -geometry +0+0 -background transparent malayalam-haln.png\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,32,2,16 --output-file=malayalam-abvm-before.png --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d0a,0d01\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-abvm-after.png --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d0a,0d01\n\nmontage malayalam-abvm-before.png right-arrow.png malayalam-abvm-after.png -geometry +0+0 -background transparent malayalam-abvm.png\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwm-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d34,0d62\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d34,0d62\n\nmontage malayalam-blwm-before.png right-arrow.png malayalam-blwm-after.png -geometry +0+0 -background transparent malayalam-blwm.png\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/malayalam/malayalam-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-malayalam.md](../../opentype-shaping-malayalam.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-decompose-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-decompose-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d46,25cc,0d57\n\nsvg_stack.py --direction=h malayalam-matra-decompose-before.svg right-arrow.svg malayalam-matra-decompose-after.svg > malayalam-matra-decompose.svg\n\n\n## 2.7 post-base consonants\n\n### Ya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ya-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d2f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ya-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d2f\n\nsvg_stack.py --direction=h malayalam-pstf-ya-before.svg right-arrow.svg malayalam-pstf-ya-after.svg > malayalam-pstf-ya.svg\n\n### Va\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-va-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d35\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-va-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d35\n\nsvg_stack.py --direction=h 
malayalam-pstf-va-before.svg right-arrow.svg malayalam-pstf-va-after.svg > malayalam-pstf-va.svg\n\n\n#### Duplicates for other subsections\n\ncp malayalam-pstf-ya.svg malayalam-pstf-ya-1.svg\n\ncluster_styles = [\n\n\ncp malayalam-pstf-va.svg malayalam-pstf-va-1.svg\n\ncluster_styles = [\n\n\n## 3.2 `nukt`\n\n> Note: Noto Serif Malayalam uses `U+0323` \"Combining dot below\" in\n> its mark-placement lookups, not a Nukta (which does not exist in the\n> Malayalam Unicode block).\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-nukt-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d18,25cc,0323\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-nukt-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d18,0323\n\nsvg_stack.py --direction=h malayalam-nukt-before.svg right-arrow.svg malayalam-nukt-after.svg > malayalam-nukt.svg\n\n\n## 3.3 `akhn`\n\n### KSsa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-kssa-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,0d37\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-kssa-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,0d37\n\nsvg_stack.py --direction=h malayalam-akhn-kssa-before.svg right-arrow.svg malayalam-akhn-kssa-after.svg > malayalam-akhn-kssa.svg\n\n### NnTta\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-nntta-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d23,0d4d,0d1f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-nntta-after.svg --features=+akhn --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d23,0d4d,0d1f\n\nsvg_stack.py --direction=h malayalam-akhn-nntta-before.svg right-arrow.svg malayalam-akhn-nntta-after.svg > malayalam-akhn-nntta.svg\n\n> Note: The \"Chillu R\" is shown here because it may be implemented as\n> an akhand form and that makes Malayalam distinct from several other\n> Indic scripts.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-chillu-r-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d30,0d4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-chillu-r-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d7c\n\nsvg_stack.py --direction=h malayalam-akhn-chillu-r-before.svg right-arrow.svg malayalam-akhn-chillu-r-after.svg > malayalam-akhn-chillu-r.svg\n\n\n## 3.4 `rphf`\n\n> Note: Malayalam modern orthography does not use Reph. 
The dot-reph\n> substitution here is shown with an accompanying note to that effect,\n> and is accompanied by the Chillu-R image.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-dot-reph-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d30,0d4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-dot-reph-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e\n\nsvg_stack.py --direction=h malayalam-dot-reph-before.svg right-arrow.svg malayalam-dot-reph-after.svg > malayalam-dot-reph.svg\n\n\n## 3.6 `pref`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ra-before.svg --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d30\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ra-after.svg --features=+pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d30\n\nsvg_stack.py --direction=h malayalam-pstf-ra-before.svg right-arrow.svg malayalam-pstf-ra-after.svg > malayalam-pstf-ra.svg\n\n\n## 3.7 `blwf`\n\n> Note: Noto Serif Malayalam includes a `blwf`-form \"La\" but does not\n> include a feature that accesses it. It is included in several `akhn`\n> ligatures, though. 
Instead, use SMC Rachana font.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/malayalam/Rachana-Regular.ttf --unicodes=0d4d,0d32\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/malayalam/Rachana-Regular.ttf --unicodes=0d4d,0d32\n\nsvg_stack.py --direction=h malayalam-blwf-before.svg right-arrow.svg malayalam-blwf-after.svg > malayalam-blwf.svg\n\n\n#### Duplicates for other subsections\n\ncp malayalam-blwf.svg malayalam-blwf-1.svg\n\ncluster_styles = [\n\n\n## 3.9 `half`\n\n> Note: Added a note to the shaping text about using `half` for Chillu\n> lookups.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-half-before.svg --features=+half --background=FFFFFF00 --preserve-default-ignorables /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,2005,200d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-half-after.svg --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,200d\n\nsvg_stack.py --direction=h malayalam-half-before.svg right-arrow.svg malayalam-half-after.svg > malayalam-half.svg\n\n\n## 3.10 `pstf`\n\n> Note: Uses the same images as 2.7\n\n## 3.12 `cjct`\n\n> Note: Noto Serif Malayalam implements this as an `akhn` feature.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-cjct-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d38,0d4d,0d31,0d4d,0d31\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-cjct-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d38,0d4d,0d31,0d4d,0d31\n\nsvg_stack.py --direction=h malayalam-cjct-before.svg 
right-arrow.svg malayalam-cjct-after.svg > malayalam-cjct.svg\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-position-before.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d47,0d2c,0d4d,0d1e,0d4d,0d1c,0d3e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-position-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d2c,0d4d,0d1e,0d4d,0d1c,0d4b\n\nsvg_stack.py --direction=h malayalam-matra-position-before.svg right-arrow.svg malayalam-matra-position-after.svg > malayalam-matra-position.svg\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-repha-position-before.svg --features=+akhn,-abvm,-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e,200d,0d23,0d4d,200d,0d21,0d41\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-repha-position-after.svg --features=+akhn,+abvm,+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e,0d23,0d4d,200d,0d21,0d41\n\nsvg_stack.py --direction=h malayalam-repha-position-before.svg right-arrow.svg malayalam-repha-position-after.svg > malayalam-repha-position.svg\n\n\n## 4.4 Pre-base reordering\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pref-position-before.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=25cc,0d4d,0d30,0d39,0d4d,0d23,0d4d,0d21,0d4c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pref-position-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d39,0d4d,0d23,0d4d,0d21,0d4d,0d30,0d4c\n\nsvg_stack.py --direction=h malayalam-pref-position-before.svg 
right-arrow.svg malayalam-pref-position-after.svg > malayalam-pref-position.svg\n\n\n## 5 `blws`\n\n> Note: Noto Serif and Sans Malayalam have blws-like \"La\" features in\n> other lookups, such as `akhn`. I have not been able to isolate one\n> of them for usage.\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-psts-before.svg --features=-psts,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d35,0d4d,0d35\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-psts-after.svg --features=+psts,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d35,0d4d,0d35\n\nsvg_stack.py --direction=h malayalam-psts-before.svg right-arrow.svg malayalam-psts-after.svg > malayalam-psts.svg\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-haln-before.svg --features=-haln --background=FFFFFF00 --preserve-default-ignorables /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d33,0d4d,2005,200d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-haln-after.svg --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d33,0d4d,200d\n\nsvg_stack.py --direction=h malayalam-haln-before.svg right-arrow.svg malayalam-haln-after.svg > malayalam-haln.svg\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=32,16,2,16 --output-file=malayalam-abvm-before.svg --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d0a,0d01\n\nhb-view --font-size=110 --margin=32,16,2,16 --output-file=malayalam-abvm-after.svg --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d0a,0d01\n\nsvg_stack.py --direction=h malayalam-abvm-before.svg right-arrow.svg malayalam-abvm-after.svg > malayalam-abvm.svg\n\n\n## 6 
`blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d34,0d62\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d34,0d62\n\nsvg_stack.py --direction=h malayalam-blwm-before.svg right-arrow.svg malayalam-blwm-after.svg > malayalam-blwm.svg\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/mongolian/mongolian-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-mongolian.md](../../opentype-shaping-mongolian.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Terminology\n\n### FVS\n\n#### No FVS\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-none-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-none-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180a,25cc\n\nmontage mongolian-fvs-none-before.png right-arrow.png mongolian-fvs-none-after.png -geometry +0+0 -background transparent mongolian-fvs-none.png\n\n\n#### FVS1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs1-before.png --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180b,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs1-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180b,180a,25cc\n\nmontage mongolian-fvs-fvs1-before.png right-arrow.png mongolian-fvs-fvs1-after.png -geometry +0+0 -background transparent mongolian-fvs-fvs1.png\n\n\n#### FVS2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs2-before.png --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180c,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=mongolian-fvs-fvs2-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180c,180a,25cc\n\nmontage mongolian-fvs-fvs2-before.png right-arrow.png mongolian-fvs-fvs2-after.png -geometry +0+0 -background transparent mongolian-fvs-fvs2.png\n\n\n#### FVS3\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs3-before.png --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs3-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,180a,25cc\n\nmontage mongolian-fvs-fvs3-before.png right-arrow.png mongolian-fvs-fvs3-after.png -geometry +0+0 -background transparent mongolian-fvs-fvs3.png\n\n\n\n## 4.2 `isol`\n\n### `isol` general\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-before.png --features=-isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-after.png --features=+isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826\n\nmontage mongolian-isol-before.png right-arrow.png mongolian-isol-after.png -geometry +0+0 -background transparent mongolian-isol.png\n\n\n### `isol` FVS\n\n> Note: uses larger right margin\n\nhb-view --font-size=110 --margin=2,112,2,16 --output-file=mongolian-isol-fvs1-before.png --features=-isol --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826,180b\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-fvs1-after.png --features=+isol 
--preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826,180b\n\nmontage mongolian-isol-fvs1-before.png right-arrow.png mongolian-isol-fvs1-after.png -geometry +0+0 -background transparent mongolian-isol-fvs1.png \n\n\n## 4.3 `fina`\n\n### `fina` general\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-before.png --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830\n\nmontage mongolian-fina-before.png right-arrow.png mongolian-fina-after.png -geometry +0+0 -background transparent mongolian-fina.png\n\n\n### `fina` FVS\n\n> Note: uses larger right margin\n\nhb-view --font-size=110 --margin=2,112,2,16 --output-file=mongolian-fina-fvs2-before.png --features=-fina --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830,180c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-fvs2-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830,180c\n\nmontage mongolian-fina-fvs2-before.png right-arrow.png mongolian-fina-fvs2-after.png -geometry +0+0 -background transparent mongolian-fina-fvs2.png\n\n\n## 4.6 `medi`\n\n### `medi` general\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-after.png --features=+medi --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180a,25cc\n\nmontage mongolian-medi-before.png right-arrow.png mongolian-medi-after.png -geometry +0+0 -background transparent mongolian-medi.png\n\n\n### `medi` FVS\n\n> Note: uses ZWJ (U+200D) and spaces to approximate correct spacing for FVS1\n> (which is zero-width)\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-fvs1-before.png --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180b,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-fvs1-after.png --features=+medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180b,180a,25cc\n\nmontage mongolian-medi-fvs1-before.png right-arrow.png mongolian-medi-fvs1-after.png -geometry +0+0 -background transparent mongolian-medi-fvs1.png\n\n\n## 4.8 `init`\n\n### `init` general\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180a,25cc\n\nmontage mongolian-init-before.png right-arrow.png mongolian-init-after.png -geometry +0+0 -background transparent mongolian-init.png\n\n\n### `init` FVS\n\n> Note: uses ZWJ (U+200D) and spaces to approximate correct spacing for FVS1\n> (which is zero-width)\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-fvs1-before.png --features=-init --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf 
--unicodes=1821,180b,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-fvs1-after.png --features=+init --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180b,180a,25cc\n\nmontage mongolian-init-fvs1-before.png right-arrow.png mongolian-init-fvs1-after.png -geometry +0+0 -background transparent mongolian-init-fvs1.png\n\n\n## 4.9 `rlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-rlig-before.png --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=182a,1820\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-rlig-after.png --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=182a,1820\n\nmontage mongolian-rlig-before.png right-arrow.png mongolian-rlig-after.png -geometry +0+0 -background transparent mongolian-rlig.png\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/mongolian/mongolian-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-mongolian.md](../../opentype-shaping-mongolian.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Terminology\n\n### FVS\n\n#### No FVS\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-none-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-none-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180a,25cc\n\nsvg_stack --direction=h mongolian-fvs-none-before.svg right-arrow.svg mongolian-fvs-none-after.svg > mongolian-fvs-none.svg\n\n\n#### FVS1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs1-before.svg --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180b,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs1-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180b,180a,25cc\n\nsvg_stack --direction=h mongolian-fvs-fvs1-before.svg right-arrow.svg mongolian-fvs-fvs1-after.svg > mongolian-fvs-fvs1.svg\n\n\n#### FVS2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs2-before.svg --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180c,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs2-after.svg 
--features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180c,180a,25cc\n\nsvg_stack --direction=h mongolian-fvs-fvs2-before.svg right-arrow.svg mongolian-fvs-fvs2-after.svg > mongolian-fvs-fvs2.svg\n\n\n#### FVS3\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs3-before.svg --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs3-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,180a,25cc\n\nsvg_stack --direction=h mongolian-fvs-fvs3-before.svg right-arrow.svg mongolian-fvs-fvs3-after.svg > mongolian-fvs-fvs3.svg\n\n\n\n## 4.2 `isol`\n\n### `isol` general\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-before.svg --features=-isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-after.svg --features=+isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826\n\nsvg_stack --direction=h mongolian-isol-before.svg right-arrow.svg mongolian-isol-after.svg > mongolian-isol.svg\n\n\n### `isol` FVS\n\n> Note: uses larger right margin\n\nhb-view --font-size=110 --margin=2,112,2,16 --output-file=mongolian-isol-fvs1-before.svg --features=-isol --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826,180b\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-fvs1-after.svg --features=+isol --preserve-default-ignorables --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826,180b\n\nsvg_stack --direction=h mongolian-isol-fvs1-before.svg right-arrow.svg mongolian-isol-fvs1-after.svg > mongolian-isol-fvs1.svg \n\n\n## 4.3 `fina`\n\n### `fina` general\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-before.svg --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-after.svg --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830\n\nsvg_stack --direction=h mongolian-fina-before.svg right-arrow.svg mongolian-fina-after.svg > mongolian-fina.svg\n\n\n### `fina` FVS\n\n> Note: uses larger right margin\n\nhb-view --font-size=110 --margin=2,112,2,16 --output-file=mongolian-fina-fvs2-before.svg --features=-fina --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830,180c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-fvs2-after.svg --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830,180c\n\nsvg_stack --direction=h mongolian-fina-fvs2-before.svg right-arrow.svg mongolian-fina-fvs2-after.svg > mongolian-fina-fvs2.svg\n\n\n## 4.6 `medi`\n\n### `medi` general\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180a,25cc\n\nsvg_stack --direction=h mongolian-medi-before.svg 
right-arrow.svg mongolian-medi-after.svg > mongolian-medi.svg\n\n\n### `medi` FVS\n\n> Note: uses ZWJ (U+200D) and spaces to approximate correct spacing for FVS1\n> (which is zero-width)\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-fvs1-before.svg --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180b,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-fvs1-after.svg --features=+medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180b,180a,25cc\n\nsvg_stack --direction=h mongolian-medi-fvs1-before.svg right-arrow.svg mongolian-medi-fvs1-after.svg > mongolian-medi-fvs1.svg\n\n\n## 4.8 `init`\n\n### `init` general\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-after.svg --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180a,25cc\n\nsvg_stack --direction=h mongolian-init-before.svg right-arrow.svg mongolian-init-after.svg > mongolian-init.svg\n\n\n### `init` FVS\n\n> Note: uses ZWJ (U+200D) and spaces to approximate correct spacing for FVS1\n> (which is zero-width)\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-fvs1-before.svg --features=-init --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180b,200d,0020,0020,0020,202f,180a,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-fvs1-after.svg --features=+init --preserve-default-ignorables --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180b,180a,25cc\n\nsvg_stack --direction=h mongolian-init-fvs1-before.svg right-arrow.svg mongolian-init-fvs1-after.svg > mongolian-init-fvs1.svg\n\n\n## 4.9 `rlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-rlig-before.svg --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=182a,1820\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-rlig-after.svg --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=182a,1820\n\nsvg_stack --direction=h mongolian-rlig-before.svg right-arrow.svg mongolian-rlig-after.svg > mongolian-rlig.svg\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/myanmar/myanmar-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-myanmar.md](../../opentype-shaping-myanmar.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Terminology\n\n### Variation Selector\n\n#### No VS\n\n> Note: SIL Padauk 3.x implements the dotted-form feature, but not\n> using Variation Selectors, for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dotted-before.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1022\n\n#### VS dotted form\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dotted-after.png --features=+psts --preserve-default-ignorables --language=KHT --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1022,fe00\n\n\nmontage myanmar-dotted-before.png right-arrow.png myanmar-dotted-after.png -geometry +0+0 -background transparent myanmar-dotted.png\n\n\n## 1. 
Kinzi\n\n> Note: Noto Sans Myanmar does not implement the `rphf` feature for\n> unknown reasons.\n\n### Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-ra-before.png --features=-rphf,-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101b,200c,103a,1039,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-ra-after.png --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101b,103a,1039,25cc\n\nmontage myanmar-kinzi-ra-before.png right-arrow.png myanmar-kinzi-ra-after.png -geometry +0+0 -background transparent myanmar-kinzi-ra.png\n\n\n### Nga\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-nga-before.png --features=-rphf,-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1004,200c,103a,1039,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-nga-after.png --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1004,103a,1039,25cc\n\nmontage myanmar-kinzi-nga-before.png right-arrow.png myanmar-kinzi-nga-after.png -geometry +0+0 -background transparent myanmar-kinzi-nga.png\n\n\n### Mon Nga\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-monnga-before.png --features=-rphf,-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=105a,200c,103a,1039,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-monnga-after.png --features=+rphf --background=FFFFFF00 
/home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=105a,103a,1039,25cc\n\nmontage myanmar-kinzi-monnga-before.png right-arrow.png myanmar-kinzi-monnga-after.png -geometry +0+0 -background transparent myanmar-kinzi-monnga.png\n\n\n## 2.4 Medial Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-medial-ra-before.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1017,200D,103C\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-medial-ra-after.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1017,103C\n\nmontage myanmar-medial-ra-before.png right-arrow.png myanmar-medial-ra-after.png -geometry +0+0 -background transparent myanmar-medial-ra.png\n\n\n## 3.1 locl\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-locl-before.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100f,103d,103e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-locl-after.png --features=+psts --language=KSW --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100f,103d,103e\n\nmontage myanmar-locl-before.png right-arrow.png myanmar-locl-after.png -geometry +0+0 -background transparent myanmar-locl.png\n\n\n## 3.3 rphf\n\n> Same as Kinzi\n\n\n## 3.4 pref\n\n> Note: Noto Sans Myanmar does not implement any pref features for\n> unknown reasons. 
This example shows a basic-shaping feature to\n> distinguish pref from the more stylistic applications of pres.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pref-before.png --features=-pref  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=103c,103f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pref-after.png --features=+pref  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=103f,103c\n\nmontage myanmar-pref-before.png right-arrow.png myanmar-pref-after.png -geometry +0+0 -background transparent myanmar-pref.png\n\n\n## 3.5 blwf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwf-before.png --features=-blwf  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100f,1039,100a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwf-after.png --features=+blwf  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100f,1039,100a\n\nmontage myanmar-blwf-before.png right-arrow.png myanmar-blwf-after.png -geometry +0+0 -background transparent myanmar-blwf.png\n\n\n## 3.6 pstf\n\n> Note: Noto Sans Myanmar does not include a pstf feature for unknown\n> reasons. 
This example shows an orthographically-selected variant, as\n> referred to on\n> https://r12a.github.io/scripts/myanmar/block#charTALL%20AA to\n> distinguish pstf as an initial-shaping feature from the more\n> stylistic applications of psts.\n>\n> Note: The example linked to above is used in the Microsoft\n> script-development spec for Myanmar:\n> https://docs.microsoft.com/en-us/typography/script-development/myanmar#feature-tag-pstf \n> but this usage is not well-attested in real-world Myanmar\n> fonts. Instead, the \"Aa\"/\"Tall Aa\" distinction is made at the\n> encoding level and is expected to happen during text\n> entry. Consequently, this image has been removed from the\n> script-specific shaping document. See issue #85 for the discussion.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pstf-before.png --features=-pstf  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101d,102c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pstf-after.png --features=+pstf  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101d,102b\n\nmontage myanmar-pstf-before.png right-arrow.png myanmar-pstf-after.png -geometry +0+0 -background transparent myanmar-pstf.png\n\n\n## 4 pres \n\n> Note: Noto Sans Myanmar does not implement this as a pres feature.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pres-before.png --features=-pres,-blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100c,103c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pres-after.png --features=+pres,+blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100c,103c\n\nmontage 
myanmar-pres-before.png right-arrow.png myanmar-pres-after.png -geometry +0+0 -background transparent myanmar-pres.png\n\n\n## 4 abvs\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvs-before.png --features=-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100b,102d,1032\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvs-after.png --features=+abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100b,102d,1032\n\nmontage myanmar-abvs-before.png right-arrow.png myanmar-abvs-after.png -geometry +0+0 -background transparent myanmar-abvs.png\n\n\n## 4 blws\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blws-before.png --features=-blws  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=aa6b,103c,103e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blws-after.png --features=+blws  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=aa6b,103c,103e\n\nmontage myanmar-blws-before.png right-arrow.png myanmar-blws-after.png -geometry +0+0 -background transparent myanmar-blws.png\n\n\n## 4 psts\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-psts-before.png --features=-blws  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100b,103b\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-psts-after.png --features=+blws  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100b,103b\n\nmontage myanmar-psts-before.png right-arrow.png 
myanmar-psts-after.png -geometry +0+0 -background transparent myanmar-psts.png\n\n\n## 4 liga\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-liga-before.png --features=-liga,-blws,-blwf  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1016,103c,103d,103e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-liga-after.png --features=+liga  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1016,103c,103d,103e\n\nmontage myanmar-liga-before.png right-arrow.png myanmar-liga-after.png -geometry +0+0 -background transparent myanmar-liga.png\n\n\n## 5 dist\n\n> Note: Noto Sans Myanmar implements all distance adjustments in\n> `kern`.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dist-before.png --features=-kern  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101b,102b,103a,100f,103c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dist-after.png --features=+kern  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101b,102b,103a,100f,103c\n\nmontage myanmar-dist-before.png right-arrow.png myanmar-dist-after.png -geometry +0+0 -background transparent myanmar-dist.png\n\n\n## 5 abvm\n\n> Note: Noto Sans Myanmar implements this as `mark`.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvm-before.png --features=-mark  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1004,103a,1039,1008\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvm-after.png --features=+mark  --background=FFFFFF00 
/home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1004,103a,1039,1008\n\nmontage myanmar-abvm-before.png right-arrow.png myanmar-abvm-after.png -geometry +0+0 -background transparent myanmar-abvm.png\n\n\n## 5 blwm\n\n> Note: Noto Sans Myanmar implements this as `mark`.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwm-before.png --features=-mark  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1009,1039,101b\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwm-after.png --features=+mark  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1009,1039,101b\n\nmontage myanmar-blwm-before.png right-arrow.png myanmar-blwm-after.png -geometry +0+0 -background transparent myanmar-blwm.png\n\n\n## 5 mark\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mark-before.png --features=-mark  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=107e,108d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mark-after.png --features=+mark  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=107e,108d\n\nmontage myanmar-mark-before.png right-arrow.png myanmar-mark-after.png -geometry +0+0 -background transparent myanmar-mark.png\n\n\n## 5 mkmk\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mkmk-before.png --features=-mkmk  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1000,1039,105d,105e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mkmk-after.png 
--features=+mkmk  --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1000,1039,105d,105e\n\nmontage myanmar-mkmk-before.png right-arrow.png myanmar-mkmk-after.png -geometry +0+0 -background transparent myanmar-mkmk.png\n\n\n\n\n\n"
  },
  {
    "path": "images/myanmar/myanmar-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-myanmar.md](../../opentype-shaping-myanmar.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Terminology\n\n### Variation Selector\n\n#### No VS\n\n> Note: SIL Padauk 3.x implements the dotted-form feature, but not\n> using Variation Selectors, for unknown reasons.\nﬁ\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dotted-before.svg --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/padauk-3.003/PadaukBook-Regular.ttf --unicodes=1022\n\n#### VS dotted form\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dotted-after.svg --features=+psts --preserve-default-ignorables --language=KHT --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/padauk-3.003/PadaukBook-Regular.ttf --unicodes=1022,fe00\n\n\nsvg_stack --direction=h myanmar-dotted-before.svg right-arrow.svg myanmar-dotted-after.svg > myanmar-dotted.svg\n\n\n## 1. 
Kinzi\n\n> Note: Noto Sans Myanmar does not implement the `rphf` feature for\n> unknown reasons.\n\n### Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-ra-before.svg --features=-rphf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101b,200c,103a,1039,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-ra-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101b,103a,1039,25cc\n\nsvg_stack --direction=h myanmar-kinzi-ra-before.svg right-arrow.svg myanmar-kinzi-ra-after.svg > myanmar-kinzi-ra.svg\n\n\n### Nga\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-nga-before.svg --features=-rphf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1004,200c,103a,1039,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-nga-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1004,103a,1039,25cc\n\nsvg_stack --direction=h myanmar-kinzi-nga-before.svg right-arrow.svg myanmar-kinzi-nga-after.svg > myanmar-kinzi-nga.svg\n\n\n### Mon Nga\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-monnga-before.svg --features=-rphf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=105a,200c,103a,1039,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-monnga-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=105a,103a,1039,25cc\n\nsvg_stack --direction=h myanmar-kinzi-monnga-before.svg right-arrow.svg myanmar-kinzi-monnga-after.svg > myanmar-kinzi-monnga.svg\n\n\n## 2.4 Medial Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-medial-ra-before.svg --features=+psts 
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1017,200D,103C\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-medial-ra-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1017,103C\n\nsvg_stack --direction=h myanmar-medial-ra-before.svg right-arrow.svg myanmar-medial-ra-after.svg > myanmar-medial-ra.svg\n\n\n## 3.1 locl\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-locl-before.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100f,103d,103e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-locl-after.svg --features=+psts --language=KSW --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100f,103d,103e\n\nsvg_stack --direction=h myanmar-locl-before.svg right-arrow.svg myanmar-locl-after.svg > myanmar-locl.svg\n\n\n## 3.3 rphf\n\n> Same as Kinzi\n\n\n## 3.4 pref\n\n> Note: Noto Sans Myanmar does not implement any pref features for\n> unknown reasons. 
This example shows a basic-shaping feature to\n> distinguish pref from the more stylistic applications of pres.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pref-before.svg --features=-pref  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=103c,103f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pref-after.svg --features=+pref  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=103f,103c\n\nsvg_stack --direction=h myanmar-pref-before.svg right-arrow.svg myanmar-pref-after.svg > myanmar-pref.svg\n\n\n## 3.5 blwf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwf-before.svg --features=-blwf  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100f,1039,100a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwf-after.svg --features=+blwf  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100f,1039,100a\n\nsvg_stack --direction=h myanmar-blwf-before.svg right-arrow.svg myanmar-blwf-after.svg > myanmar-blwf.svg\n\n\n## 3.6 pstf\n\n> Note: Noto Sans Myanmar does not include a pstf feature for unknown\n> reasons. This example shows an orthographically-selected variant, as\n> referred to on\n> https://r12a.github.io/scripts/myanmar/block#charTALL%20AA to\n> distinguish pstf as an initial-shaping feature from the more\n> stylistic applications of psts.\n>\n> Note: The example linked to above is used in the Microsoft\n> script-development spec for Myanmar:\n> https://docs.microsoft.com/en-us/typography/script-development/myanmar#feature-tag-pstf \n> but this usage is not well-attested in real-world Myanmar\n> fonts. Instead, the \"Aa\"/\"Tall Aa\" distinction is made at the\n> encoding level and is expected to happen during text\n> entry. Consequently, this image has been removed from the\n> script-specific shaping document. 
See issue #85 for the discussion.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pstf-before.svg --features=-pstf  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101d,102c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pstf-after.svg --features=+pstf  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101d,102b\n\nsvg_stack --direction=h myanmar-pstf-before.svg right-arrow.svg myanmar-pstf-after.svg > myanmar-pstf.svg\n\n\n## 4 pres \n\n> Note: Noto Sans Myanmar does not implement this as a pres feature.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pres-before.svg --features=-pres,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100c,103c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pres-after.svg --features=+pres,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100c,103c\n\nsvg_stack --direction=h myanmar-pres-before.svg right-arrow.svg myanmar-pres-after.svg > myanmar-pres.svg\n\n\n## 4 abvs\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100b,102d,1032\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100b,102d,1032\n\nsvg_stack --direction=h myanmar-abvs-before.svg right-arrow.svg myanmar-abvs-after.svg > myanmar-abvs.svg\n\n\n## 4 blws\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blws-before.svg --features=-blws  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=aa6b,103c,103e\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=myanmar-blws-after.svg --features=+blws  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=aa6b,103c,103e\n\nsvg_stack --direction=h myanmar-blws-before.svg right-arrow.svg myanmar-blws-after.svg > myanmar-blws.svg\n\n\n## 4 psts\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-psts-before.svg --features=-blws  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100b,103b\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-psts-after.svg --features=+blws  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100b,103b\n\nsvg_stack --direction=h myanmar-psts-before.svg right-arrow.svg myanmar-psts-after.svg > myanmar-psts.svg\n\n\n## 4 liga\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-liga-before.svg --features=-liga,-blws,-blwf  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1016,103c,103d,103e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-liga-after.svg --features=+liga  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1016,103c,103d,103e\n\nsvg_stack --direction=h myanmar-liga-before.svg right-arrow.svg myanmar-liga-after.svg > myanmar-liga.svg\n\n\n## 5 dist\n\n> Note: Noto Sans Myanmar implements all distance adjustments in\n> `kern`.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dist-before.svg --features=-kern  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101b,102b,103a,100f,103c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dist-after.svg --features=+kern  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101b,102b,103a,100f,103c\n\nsvg_stack --direction=h myanmar-dist-before.svg right-arrow.svg myanmar-dist-after.svg > 
myanmar-dist.svg\n\n\n## 5 abvm\n\n> Note: Noto Sans Myanmar implements this as `mark`.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvm-before.svg --features=-mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1004,103a,1039,1008\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvm-after.svg --features=+mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1004,103a,1039,1008\n\nsvg_stack --direction=h myanmar-abvm-before.svg right-arrow.svg myanmar-abvm-after.svg > myanmar-abvm.svg\n\n\n## 5 blwm\n\n> Note: Noto Sans Myanmar implements this as `mark`.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwm-before.svg --features=-mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1009,1039,101b\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwm-after.svg --features=+mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1009,1039,101b\n\nsvg_stack --direction=h myanmar-blwm-before.svg right-arrow.svg myanmar-blwm-after.svg > myanmar-blwm.svg\n\n\n## 5 mark\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mark-before.svg --features=-mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=107e,108d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mark-after.svg --features=+mark  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=107e,108d\n\nsvg_stack --direction=h myanmar-mark-before.svg right-arrow.svg myanmar-mark-after.svg > myanmar-mark.svg\n\n\n## 5 mkmk\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mkmk-before.svg --features=-mkmk  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1000,1039,105d,105e\n\nhb-view 
--font-size=110 --margin=2,16,2,16 --output-file=myanmar-mkmk-after.svg --features=+mkmk  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1000,1039,105d,105e\n\nsvg_stack --direction=h myanmar-mkmk-before.svg right-arrow.svg myanmar-mkmk-after.svg > myanmar-mkmk.svg\n\n\n\n\n\n"
  },
  {
    "path": "images/nko/nko-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-nko.md](../../opentype-shaping-nko.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 4.3 `fina`\n\n> Note: Noto Sans NKo does not have a dotted-circle glyph. These\n> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-fina-before.png --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e5\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-fina-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e5\n\nmontage nko-fina-before.png right-arrow.png nko-fina-after.png -geometry +0+0 -background transparent nko-fina.png\n\n\n## 4.6 `medi`\n\n> Note: Noto Sans NKo does not have a dotted-circle glyph. These\n> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-medi-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e8,07fa\n\nb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-medi-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e8,07fa\n\nmontage nko-medi-before.png right-arrow.png nko-medi-after.png -geometry +0+0 -background transparent nko-medi.png\n\n\n## 4.8 `init`\n\n> Note: Noto Sans NKo does not have a dotted-circle glyph. 
These\n> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07da,07fa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07da,07fa\n\nmontage nko-init-before.png right-arrow.png nko-init-after.png -geometry +0+0 -background transparent nko-init.png\n\n\n## 7.3 `mark`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07d5,07f1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07d5,07f1\n\nmontage nko-mark-before.png right-arrow.png nko-mark-after.png -geometry +0+0 -background transparent nko-mark.png\n"
  },
  {
    "path": "images/nko/nko-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-nko.md](../../opentype-shaping-nko.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 4.3 `fina`\n\n> Note: Noto Sans NKo does not have a dotted-circle glyph. These\n> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-fina-before.svg --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e5\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-fina-after.svg --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e5\n\nsvg_stack --direction=h nko-fina-before.svg right-arrow.svg nko-fina-after.svg > nko-fina.svg\n\n\n## 4.6 `medi`\n\n> Note: Noto Sans NKo does not have a dotted-circle glyph. These\n> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-medi-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e8,07fa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-medi-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e8,07fa\n\nsvg_stack --direction=h nko-medi-before.svg right-arrow.svg nko-medi-after.svg > nko-medi.svg\n\n\n## 4.8 `init`\n\n> Note: Noto Sans NKo does not have a dotted-circle glyph. 
These\n> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07da,07fa\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-init-after.svg --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07da,07fa\n\nsvg_stack --direction=h nko-init-before.svg right-arrow.svg nko-init-after.svg > nko-init.svg\n\n\n## 7.3 `mark`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07d5,07f1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07d5,07f1\n\nsvg_stack --direction=h nko-mark-before.svg right-arrow.svg nko-mark-after.svg > nko-mark.svg\n"
  },
  {
    "path": "images/oriya/oriya-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-oriya.md](../../opentype-shaping-oriya.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-decompose-before.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b48\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-decompose-after.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b47,25cc,0b56\n\nmontage oriya-matra-decompose-before.png right-arrow.png oriya-matra-decompose-after.png -geometry +0+0 -background transparent oriya-matra-decompose.png\n\n\n## 2.7 Post-base consonants\n\n### Ya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-ya-before.png --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b2f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-ya-after.png --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b2f\n\nmontage oriya-pstf-ya-before.png right-arrow.png oriya-pstf-ya-after.png -geometry +0+0 -background transparent oriya-pstf-ya.png\n\n### Yya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-yya-before.png --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b5f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-yya-after.png --features=+pstf --background=FFFFFF00 
/home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b5f\n\nmontage oriya-pstf-yya-before.png right-arrow.png oriya-pstf-yya-after.png -geometry +0+0 -background transparent oriya-pstf-yya.png\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-nukt-before.png --features=-nukt --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b16,25cc,0b3c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-nukt-after.png --features=+nukt --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b16,0b3c\n\nmontage oriya-nukt-before.png right-arrow.png oriya-nukt-after.png -geometry +0+0 -background transparent oriya-nukt.png\n\n\n## 3.3 `akhn`\n\n### KSsa\n\n> Note: Noto Sans Oriya implements this in a `pres`+`blwf` combination\n> for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-kssa-before.png --features=-pres,-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b15,0b4d,0b37\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-kssa-after.png --features=+pres,+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b15,0b4d,0b37\n\nmontage oriya-akhn-kssa-before.png right-arrow.png oriya-akhn-kssa-after.png -geometry +0+0 -background transparent oriya-akhn-kssa.png\n\n### JNya\n\n> Note: Noto Sans Oriya implements this in a `blwf`+`cjct` combination\n> for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-jnya-before.png --features=-pres,-cjct,-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1c,0b4d,0b1e\n\nhb-view --font-size=110 
--margin=2,16,2,16 --output-file=oriya-akhn-jnya-after.png --features=+pres,+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1c,0b4d,0b1e\n\nmontage oriya-akhn-jnya-before.png right-arrow.png oriya-akhn-jnya-after.png -geometry +0+0 -background transparent oriya-akhn-jnya.png\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-rphf-before.png --features=-rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-rphf-after.png --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc\n\nmontage oriya-rphf-before.png right-arrow.png oriya-rphf-after.png -geometry +0+0 -background transparent oriya-rphf.png\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-before.png --features=-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b25\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-after.png --features=+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b25\n\nmontage oriya-blwf-before.png right-arrow.png oriya-blwf-after.png -geometry +0+0 -background transparent oriya-blwf.png\n\n\n## 3.9 `half`\n\n> No examples found.\n\n## 3.10 `pstf`\n\n> Same as 2.7\n\n## 3.12 `cjct`\n\n> Not a perfect example....\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-cjct-before.png --features=-pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b38,0b4d,25cc,0b4d,0b2a,0b4d,0b5d\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=oriya-cjct-after.png --features=+pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b38,0b4d,0b2a,0b4d,0b5d\n\nmontage oriya-cjct-before.png right-arrow.png oriya-cjct-after.png -geometry +0+0 -background transparent oriya-cjct.png\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-position-before.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b47,0b28,0b4d,200d,0b2d,0b4d,0b27,0b57\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-position-after.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,200d,0b2d,0b4d,0b27,0b4c\n\nmontage oriya-matra-position-before.png right-arrow.png oriya-matra-position-after.png -geometry +0+0 -background transparent oriya-matra-position.png\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-reph-position-before.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc,0b2a,0b4d,0b2a,0b4d,0b26,0b4d,0b2f,0b3e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-reph-position-after.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,0b2a,0b4d,0b2a,0b4d,0b26,0b4d,0b2f,0b3e\n\nmontage oriya-reph-position-before.png right-arrow.png oriya-reph-position-after.png -geometry +0+0 -background transparent oriya-reph-position.png\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pres-before.png --features=-pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf 
--unicodes=0b2e,0b4d,0b2d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pres-after.png --features=+pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2d\n\nmontage oriya-pres-before.png right-arrow.png oriya-pres-after.png -geometry +0+0 -background transparent oriya-pres.png\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvs-before.png --features=-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b13,200d,0b01\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvs-after.png --features=+abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b13,200d,0b01\n\nmontage oriya-abvs-before.png right-arrow.png oriya-abvs-after.png -geometry +0+0 -background transparent oriya-abvs.png\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blws-before.png --features=-blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,0b24,0b42\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blws-after.png --features=+blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,0b24,0b42\n\nmontage oriya-blws-before.png right-arrow.png oriya-blws-after.png -geometry +0+0 -background transparent oriya-blws.png\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-psts-before.png --features=-psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b23,0b4c,0b01\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-psts-after.png --features=+psts --background=FFFFFF00 
/home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b23,0b4c,0b01\n\nmontage oriya-psts-before.png right-arrow.png oriya-psts-after.png -geometry +0+0 -background transparent oriya-psts.png\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-haln-before.png --features=-haln,-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1d,0b4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-haln-after.png --features=+haln,+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1d,0b4d\n\nmontage oriya-haln-before.png right-arrow.png oriya-haln-after.png -geometry +0+0 -background transparent oriya-haln.png\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvm-before.png --features=-abvm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b19,0b4d,0b18,0b48\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvm-after.png --features=+abvm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b19,0b4d,0b18,0b48\n\nmontage oriya-abvm-before.png right-arrow.png oriya-abvm-after.png -geometry +0+0 -background transparent oriya-abvm.png\n\n\n## 6 `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwm-before.png --features=-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2b,0b44\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwm-after.png --features=+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2b,0b44\n\nmontage oriya-blwm-before.png right-arrow.png 
oriya-blwm-after.png -geometry +0+0 -background transparent oriya-blwm.png\n"
  },
  {
    "path": "images/oriya/oriya-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-oriya.md](../../opentype-shaping-oriya.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-decompose-before.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b4c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-decompose-after.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b47,25cc,0b57\n\nsvg_stack.py --direction=h oriya-matra-decompose-before.svg right-arrow.svg oriya-matra-decompose-after.svg > oriya-matra-decompose.svg\n\n\n## 2.7 Below-base consonants\n\n### Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-ra-before.svg --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b30\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-ra-after.svg --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b30\n\nsvg_stack.py --direction=h oriya-blwf-ra-before.svg right-arrow.svg oriya-blwf-ra-after.svg > oriya-blwf-ra.svg\n\n\n#### Duplicates for other subsections\n\ncp oriya-blwf-ra.svg oriya-blwf-ra-1.svg\n\ncluster_styles = [\n\ncp oriya-blwf-ra.svg oriya-blwf-ra-2.svg\n\ncluster_styles = [\n\n\n\n\n## 2.7 Post-base consonants\n\n### Ya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-ya-before.svg --features=-pstf --background=FFFFFF00 
/home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b2f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-ya-after.svg --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b2f\n\nsvg_stack.py --direction=h oriya-pstf-ya-before.svg right-arrow.svg oriya-pstf-ya-after.svg > oriya-pstf-ya.svg\n\n### Yya\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-yya-before.svg --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b5f\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-yya-after.svg --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b5f\n\nsvg_stack.py --direction=h oriya-pstf-yya-before.svg right-arrow.svg oriya-pstf-yya-after.svg > oriya-pstf-yya.svg\n\n\n#### Duplicates for other subsections\n\ncp oriya-pstf-ya.svg oriya-pstf-ya-1.svg\n\ncluster_styles = [\n\n\ncp oriya-pstf-yya.svg oriya-pstf-yya-1.svg\n\ncluster_styles = [\n\n\n## 3.2 `nukt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-nukt-before.svg --features=-nukt --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b16,25cc,0b3c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-nukt-after.svg --features=+nukt --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b16,0b3c\n\nsvg_stack.py --direction=h oriya-nukt-before.svg right-arrow.svg oriya-nukt-after.svg > oriya-nukt.svg\n\n\n## 3.3 `akhn`\n\n### KSsa\n\n> Note: Noto Sans Oriya implements this in a `pres`+`blwf` combination\n> for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=oriya-akhn-kssa-before.svg --features=-pres,-blwf,-akhn,-haln --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b15,0b4d,0b37\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-kssa-after.svg --features=+pres,+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b15,0b4d,0b37\n\nsvg_stack.py --direction=h oriya-akhn-kssa-before.svg right-arrow.svg oriya-akhn-kssa-after.svg > oriya-akhn-kssa.svg\n\n### JNya\n\n> Note: Noto Sans Oriya implements this in a `blwf`+`cjct` combination\n> for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-jnya-before.svg --features=-pres,-cjct,-blwf,-haln --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1c,0b4d,0b1e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-jnya-after.svg --features=+pres,+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1c,0b4d,0b1e\n\nsvg_stack.py --direction=h oriya-akhn-jnya-before.svg right-arrow.svg oriya-akhn-jnya-after.svg > oriya-akhn-jnya.svg\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-rphf-before.svg --features=-rphf,-haln --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-rphf-after.svg --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc\n\nsvg_stack.py --direction=h oriya-rphf-before.svg right-arrow.svg oriya-rphf-after.svg > oriya-rphf.svg\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-before.svg 
--features=-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b25\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-after.svg --features=+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b25\n\nsvg_stack.py --direction=h oriya-blwf-before.svg right-arrow.svg oriya-blwf-after.svg > oriya-blwf.svg\n\n\n## 3.9 `half`\n\n> No examples found.\n\n## 3.10 `pstf`\n\n> Same as 2.7\n\n## 3.12 `cjct`\n\n> Not a perfect example....\n> Noto Serif Oriya implements this in a combination of multiple\n> features, including akhn and blwf. It also applies haln, which must\n> be deactivated in this illustration because it is documented as\n> being applied later.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-cjct-before.svg --features=-blwf,-akhn,-cjct --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b38,0b4d,0b2a,0b40\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-cjct-after.svg --features=+cjct --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b38,0b4d,0b2a,0b40\n\nsvg_stack.py --direction=h oriya-cjct-before.svg right-arrow.svg oriya-cjct-after.svg > oriya-cjct.svg\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-position-before.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b47,0b36,0b4d,0b24,0b4d,0b30,0b4d,0b2f,0b56\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-position-after.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf 
--unicodes=0b36,0b4d,0b24,0b4d,0b30,0b4d,0b2f,0b48\n\nsvg_stack.py --direction=h oriya-matra-position-before.svg right-arrow.svg oriya-matra-position-after.svg > oriya-matra-position.svg\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-reph-position-before.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc,0b2a,0b4d,0b27,0b4d,0b30,0b4d,0b2f,0b3e,0b41\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-reph-position-after.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,0b2a,0b4d,0b27,0b4d,0b30,0b4d,0b2f,0b3e,0b41\n\nsvg_stack.py --direction=h oriya-reph-position-before.svg right-arrow.svg oriya-reph-position-after.svg > oriya-reph-position.svg\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pres-before.svg --features=-pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pres-after.svg --features=+pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2d\n\nsvg_stack.py --direction=h oriya-pres-before.svg right-arrow.svg oriya-pres-after.svg > oriya-pres.svg\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvs-before.svg --features=-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b13,200d,0b01\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvs-after.svg --features=+abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b13,200d,0b01\n\nsvg_stack.py --direction=h 
oriya-abvs-before.svg right-arrow.svg oriya-abvs-after.svg > oriya-abvs.svg\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blws-before.svg --features=-blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,0b24,0b42\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blws-after.svg --features=+blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,0b24,0b42\n\nsvg_stack.py --direction=h oriya-blws-before.svg right-arrow.svg oriya-blws-after.svg > oriya-blws.svg\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-psts-before.svg --features=-psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b23,0b4c,0b01\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-psts-after.svg --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b23,0b4c,0b01\n\nsvg_stack.py --direction=h oriya-psts-before.svg right-arrow.svg oriya-psts-after.svg > oriya-psts.svg\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-haln-before.svg --features=-haln,-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1d,0b4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-haln-after.svg --features=+haln,+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1d,0b4d\n\nsvg_stack.py --direction=h oriya-haln-before.svg right-arrow.svg oriya-haln-after.svg > oriya-haln.svg\n\n\n## 6 `dist`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-dist-before.svg --features=-dist --background=FFFFFF00 
/home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b42,0b15\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-dist-after.svg --features=+dist --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b42,0b15\n\nsvg_stack.py --direction=h oriya-dist-before.svg right-arrow.svg oriya-dist-after.svg > oriya-dist.svg\n\n\n## 6 `abvm`\n\n> Note: Noto Serif Oriya implements this as `blwm` for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvm-before.svg --features=-abvm,-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b19,0b48\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvm-after.svg --features=+abvm,+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b19,0b48\n\nsvg_stack.py --direction=h oriya-abvm-before.svg right-arrow.svg oriya-abvm-after.svg > oriya-abvm.svg\n\n\n## 6 `blwm`\n\n> Note: Noto Serif Oriya implements this as `abvm` for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwm-before.svg --features=-abvm,-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2b,0b44\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwm-after.svg --features=+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2b,0b44\n\nsvg_stack.py --direction=h oriya-blwm-before.svg right-arrow.svg oriya-blwm-after.svg > oriya-blwm.svg\n\n"
  },
  {
    "path": "images/sinhala/sinhala-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-sinhala.md](../../opentype-shaping-sinhala.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-decompose-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dda\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-decompose-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dd9,25cc,0dca\n\nmontage sinhala-matra-decompose-before.png right-arrow.png sinhala-matra-decompose-after.png -geometry +0+0 -background transparent sinhala-matra-decompose.png\n\n\n## 2.7 Post-base consonants\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-va-before.png --features=-vatu --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dba\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-va-after.png --features=+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dba\n\nmontage sinhala-vatu-va-before.png right-arrow.png sinhala-vatu-va-after.png -geometry +0+0 -background transparent sinhala-vatu-va.png\n\n\n## 3.3 `akhn`\n\n### Ligature\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-ligature-before.png --features=-akhn --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9a,25cc,0dca,200d,0dc2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-ligature-after.png --features=+akhn --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9a,0dca,200d,0dc2\n\nmontage sinhala-akhn-ligature-before.png right-arrow.png sinhala-akhn-ligature-after.png -geometry +0+0 -background transparent sinhala-akhn-ligature.png\n\n### Touching\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-touching-before.png --features=-akhn,-pres --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9c,200d,25cc,0dca,0d9d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-touching-after.png --features=+akhn,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9c,200d,0dca,0d9d\n\nmontage sinhala-akhn-touching-before.png right-arrow.png sinhala-akhn-touching-after.png -geometry +0+0 -background transparent sinhala-akhn-touching.png\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-rphf-before.png --features=-rphf --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc\n\nhb-view --font-size=110 --margin=2,16,2,64 --output-file=sinhala-rphf-after.png --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc\n\nmontage sinhala-rphf-before.png right-arrow.png sinhala-rphf-after.png -geometry +0+0 -background transparent sinhala-rphf.png\n\n\n## 3.10 `pstf`\n\n> Not needed?\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pstf-before.png --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dde\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pstf-after.png --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0ddf\n\nmontage sinhala-pstf-before.png 
right-arrow.png sinhala-pstf-after.png -geometry +0+0 -background transparent sinhala-pstf.png\n\n\n## 3.11 `vatu`\n\n### Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-ra-before.png --features=-vatu --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dbb\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-ra-after.png --features=+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dbb\n\nmontage sinhala-vatu-ra-before.png right-arrow.png sinhala-vatu-ra-after.png -geometry +0+0 -background transparent sinhala-vatu-ra.png\n\n### Va\n\n> Same as 2.7\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-position-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dd9,0da0,0dca,0db1,0dca,200d,0daf,0dca,200d,0dbb,0dcf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-position-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0da0,0dca,0db1,0dca,200d,0daf,0dca,200d,0dbb,0ddc\n\nmontage sinhala-matra-position-before.png right-arrow.png sinhala-matra-position-after.png -geometry +0+0 -background transparent sinhala-matra-position.png\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,64 --output-file=sinhala-reph-position-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc,0dad,0dca,200d,0dae,0dca,200d,0dba,0dd1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-reph-position-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,0dad,0dca,200d,0dae,0dca,200d,0dba,0dd1\n\nmontage 
sinhala-reph-position-before.png right-arrow.png sinhala-reph-position-after.png -geometry +0+0 -background transparent sinhala-reph-position.png\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pres-before.png --features=-pres --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9b,200d,25cc,0dca,0da2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pres-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9b,200d,0dca,0da2\n\nmontage sinhala-pres-before.png right-arrow.png sinhala-pres-after.png -geometry +0+0 -background transparent sinhala-pres.png\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db6,0dd3\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db6,0dd3\n\nmontage sinhala-abvs-before.png right-arrow.png sinhala-abvs-after.png -geometry +0+0 -background transparent sinhala-abvs.png\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blws-before.png --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db7,0dd6\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blws-after.png --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db7,0dd6\n\nmontage sinhala-blws-before.png right-arrow.png sinhala-blws-after.png -geometry +0+0 -background transparent sinhala-blws.png\n\n\n## 5 `psts`\n\n> Note: this lookup only works in Noto Sans. 
Needs more investigation.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-psts-before.png --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSinhala-Regular.ttf --unicodes=0daf,0dca,200d,0dba,0ddd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-psts-after.png --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSinhala-Regular.ttf --unicodes=0daf,0dca,200d,0dba,0ddd\n\nmontage sinhala-psts-before.png right-arrow.png sinhala-psts-after.png -geometry +0+0 -background transparent sinhala-psts.png\n\n\n## 6 `abvm`\n\n> Note: Noto Sans Sinhala implements this as an `abvs` substitution\n> for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvm-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,0dae\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvm-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,0dae\n\nmontage sinhala-abvm-before.png right-arrow.png sinhala-abvm-after.png -geometry +0+0 -background transparent sinhala-abvm.png\n\n\n## 6 `blwm`\n\n> Note: Noto Sans Sinhala double-implements this in both `blwm` and\n> `abvs`, even though it is clearly not above-base.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blwm-before.png --features=-blwm,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9f,0dca,200d,0dbb\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blwm-after.png --features=+blwm,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9f,0dca,200d,0dbb\n\nmontage sinhala-blwm-before.png right-arrow.png sinhala-blwm-after.png -geometry +0+0 -background transparent 
sinhala-blwm.png\n"
  },
  {
    "path": "images/sinhala/sinhala-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-sinhala.md](../../opentype-shaping-sinhala.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-decompose-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dda\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-decompose-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dd9,25cc,0dca\n\nsvg_stack.py --direction=h sinhala-matra-decompose-before.svg right-arrow.svg sinhala-matra-decompose-after.svg > sinhala-matra-decompose.svg\n\n\n## 2.7 Post-base consonants\n\n### Ra\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-ra-before.svg --features=-vatu --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dbb\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-ra-after.svg --features=+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dbb\n\nsvg_stack.py --direction=h sinhala-vatu-ra-before.svg right-arrow.svg sinhala-vatu-ra-after.svg > sinhala-vatu-ra.svg\n\n### Va\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-va-before.svg --features=-vatu --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dba\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-va-after.svg --features=+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf 
--unicodes=25cc,0dca,200d,0dba\n\nsvg_stack.py --direction=h sinhala-vatu-va-before.svg right-arrow.svg sinhala-vatu-va-after.svg > sinhala-vatu-va.svg\n\n\n#### Duplicates for other subsections\n\ncp sinhala-vatu-ra.svg sinhala-vatu-ra-1.svg\n\ncluster_styles = [\n\n\ncp sinhala-vatu-va.svg sinhala-vatu-va-1.svg\n\ncluster_styles = [\n\n\n## 3.3 `akhn`\n\n### Ligature\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-ligature-before.svg --features=-akhn --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9a,25cc,0dca,200d,0dc2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-ligature-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9a,0dca,200d,0dc2\n\nsvg_stack.py --direction=h sinhala-akhn-ligature-before.svg right-arrow.svg sinhala-akhn-ligature-after.svg > sinhala-akhn-ligature.svg\n\n### Touching\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-touching-before.svg --features=-akhn,-pres --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9c,200d,25cc,0dca,0d9d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-touching-after.svg --features=+akhn,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9c,200d,0dca,0d9d\n\nsvg_stack.py --direction=h sinhala-akhn-touching-before.svg right-arrow.svg sinhala-akhn-touching-after.svg > sinhala-akhn-touching.svg\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-rphf-before.svg --features=-rphf --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc\n\nhb-view --font-size=110 --margin=2,16,2,64 --output-file=sinhala-rphf-after.svg 
--features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc\n\nsvg_stack.py --direction=h sinhala-rphf-before.svg right-arrow.svg sinhala-rphf-after.svg > sinhala-rphf.svg\n\n\n#### Duplicates for other subsections\n\ncp sinhala-rphf.svg sinhala-rphf-1.svg\n\ncluster_styles = [\n\n\n## 3.10 `pstf`\n\n> Not needed?\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pstf-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dde\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pstf-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0ddf\n\nsvg_stack.py --direction=h sinhala-pstf-before.svg right-arrow.svg sinhala-pstf-after.svg > sinhala-pstf.svg\n\n\n## 3.11 `vatu`\n\n> Same as 2.7\n\n\n### Va\n\n> Same as 2.7\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-position-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dd9,0da0,0dca,0db1,0dca,200d,0daf,0dca,200d,0dbb,0dcf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-position-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0da0,0dca,0db1,0dca,200d,0daf,0dca,200d,0dbb,0ddc\n\nsvg_stack.py --direction=h sinhala-matra-position-before.svg right-arrow.svg sinhala-matra-position-after.svg > sinhala-matra-position.svg\n\n\n## 4.3 Reph position\n\nhb-view --font-size=110 --margin=2,16,2,64 --output-file=sinhala-reph-position-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc,0dad,0dca,200d,0dae,0dca,200d,0dba,0dd1\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=sinhala-reph-position-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,0dad,0dca,200d,0dae,0dca,200d,0dba,0dd1\n\nsvg_stack.py --direction=h sinhala-reph-position-before.svg right-arrow.svg sinhala-reph-position-after.svg > sinhala-reph-position.svg\n\n\n## 5 `pres`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pres-before.svg --features=-pres --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9b,200d,25cc,0dca,0da2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pres-after.svg --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9b,200d,0dca,0da2\n\nsvg_stack.py --direction=h sinhala-pres-before.svg right-arrow.svg sinhala-pres-after.svg > sinhala-pres.svg\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db6,0dd3\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db6,0dd3\n\nsvg_stack.py --direction=h sinhala-abvs-before.svg right-arrow.svg sinhala-abvs-after.svg > sinhala-abvs.svg\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db7,0dd6\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db7,0dd6\n\nsvg_stack.py --direction=h sinhala-blws-before.svg right-arrow.svg sinhala-blws-after.svg > 
sinhala-blws.svg\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-psts-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dd1\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-psts-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dd1\n\nsvg_stack.py --direction=h sinhala-psts-before.svg right-arrow.svg sinhala-psts-after.svg > sinhala-psts.svg\n\n\n## 6 `dist`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-dist-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dc5,0ddf\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-dist-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dc5,0ddf\n\nsvg_stack.py --direction=h sinhala-dist-before.svg right-arrow.svg sinhala-dist-after.svg > sinhala-dist.svg\n\n\n## 6 `abvm`\n\n> Note: Noto Serif Sinhala implements this as an `abvs`\n> substitution. This makes it a less-than ideal illustration, because\n> the \"after\" SVG is a ligated glyph; it must suffice until a suitable\n> alternative is found.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvm-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dc6,0dd3\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvm-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dc6,0dd3\n\nsvg_stack.py --direction=h sinhala-abvm-before.svg right-arrow.svg sinhala-abvm-after.svg > sinhala-abvm.svg\n\n\n## 6 `blwm`\n\n> Note: Noto Serif Sinhala double-implements this as a `blws`\n> substitution. 
This makes it a less-than-ideal illustration, because\n> the \"after\" SVG is a ligated glyph; it must suffice until a suitable\n> alternative is found.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blwm-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0da7,0dca,200d,0da8,0dd4\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blwm-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0da7,0dca,200d,0da8,0dd4\n\nsvg_stack.py --direction=h sinhala-blwm-before.svg right-arrow.svg sinhala-blwm-after.svg > sinhala-blwm.svg\n"
  },
  {
    "path": "images/syriac/syriac-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-syriac.md](../../opentype-shaping-syriac.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Dalath Rish group ##\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-dalath-rish.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0715,072a,0716\n\n\n## 3. `stch`\n\n> Note: Noto seems to implement this in a set of `calt` substitutions,\n> for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-stch-before.png --features=-stch,-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0732,070f,0728,0721,0735,0710\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-stch-after.png --features=+stch --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0732,070f,0728,0721,0735,0710\n\nmontage syriac-stch-before.png right-arrow.png syriac-stch-after.png -geometry +0+0 -background transparent syriac-stch.png\n\n\n## 4.1 `locl`\n\n> Note: None found in Noto fonts.\n\n\n## 4.2 `isol`\n\n> Note: none found in Noto fonts.\n\n\n## 4.3 `fina`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fina-before.png --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0722\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fina-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0722\n\n\n## 4.4 `fin2`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin2-before.png --features=-fin2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf 
--unicodes=0717,0710\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin2-after.png --features=+fin2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0717,0710\n\nmontage syriac-fin2-before.png right-arrow.png syriac-fin2-after.png -geometry +0+0 -background transparent syriac-fin2.png\n\n\n## 4.5 `fin3`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin3-before.png --features=-fin3 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072f,0710\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin3-after.png --features=+fin3 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072f,0710\n\nmontage syriac-fin3-before.png right-arrow.png syriac-fin3-after.png -geometry +0+0 -background transparent syriac-fin3.png\n\n\n## 4.6 `medi`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-medi-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0724,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-medi-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0724,25cc\n\nmontage syriac-medi-before.png right-arrow.png syriac-medi-after.png -geometry +0+0 -background transparent syriac-medi.png\n\n\n## 4.7 `med2`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-med2-before.png --features=-med2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0710,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-med2-after.png --features=+med2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0710,25cc\n\nmontage syriac-med2-before.png right-arrow.png 
syriac-med2-after.png -geometry +0+0 -background transparent syriac-med2.png\n\n\n## 4.8 `init`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0721,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0721,25cc\n\nmontage syriac-init-before.png right-arrow.png syriac-init-after.png -geometry +0+0 -background transparent syriac-init.png\n\n\n## 4.9 `rlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-rlig-before.png --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072a,25cc,0308\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-rlig-after.png --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072a,0308\n\nmontage syriac-rlig-before.png right-arrow.png syriac-rlig-after.png -geometry +0+0 -background transparent syriac-rlig.png\n\n\n## 4.11 `calt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-calt-before.png --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0720,071c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-calt-after.png --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0720,071c\n\nmontage syriac-calt-before.png right-arrow.png syriac-calt-after.png -geometry +0+0 -background transparent syriac-calt.png\n\n\n## 5.1 `liga`\n\n> Note: Noto Syriac implements this as a `calt` lookup for unknown reasons.\n>\n> This seems to be a known shortcoming. 
See\n> [https://github.com/googlei18n/noto-fonts/issues/665](https://github.com/googlei18n/noto-fonts/issues/665)\n> for more information.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-liga-before.png --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0720,0710\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-liga-after.png --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0720,0710\n\nmontage syriac-liga-before.png right-arrow.png syriac-liga-after.png -geometry +0+0 -background transparent syriac-liga.png\n\n\n## 5.2 `dlig`\n\n> Note: None found in Noto Syriac.\n>\n> This seems to be a known shortcoming. See\n> [https://github.com/googlei18n/noto-fonts/issues/665](https://github.com/googlei18n/noto-fonts/issues/665)\n> for more information.\n\n\n## 7.3 `mark`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0733\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0733\n\nmontage syriac-mark-before.png right-arrow.png syriac-mark-after.png -geometry +0+0 -background transparent syriac-mark.png\n\n\n## 7.4 `mkmk`\n\n> Note: All Noto Sans Syriac fonts have a `mkmk` feature, but it does\n> not seem to work.\n"
  },
  {
    "path": "images/syriac/syriac-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-syriac.md](../../opentype-shaping-syriac.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Dalath Rish group ##\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-dalath-rish.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0715,072a,0716\n\n\n## 3. `stch`\n\n> Note: Noto seems to implement this in a set of `calt` substitutions,\n> for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-stch-before.svg --features=-stch,-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0732,070f,0728,0721,0735,0710\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-stch-after.svg --features=+stch --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0732,070f,0728,0721,0735,0710\n\nsvg_stack --direction=h syriac-stch-before.svg right-arrow.svg syriac-stch-after.svg > syriac-stch.svg\n\n\n## 4.1 `locl`\n\n> Note: None found in Noto fonts.\n\n\n## 4.2 `isol`\n\n> Note: none found in Noto fonts.\n\n\n## 4.3 `fina`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fina-before.svg --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0722\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fina-after.svg --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0722\n\nsvg_stack --direction=h syriac-fina-before.svg right-arrow.svg syriac-fina-after.svg > syriac-fina.svg\n\n\n## 4.4 `fin2`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin2-before.svg --features=-fin2 
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0717,0710\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin2-after.svg --features=+fin2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0717,0710\n\nsvg_stack --direction=h syriac-fin2-before.svg right-arrow.svg syriac-fin2-after.svg > syriac-fin2.svg\n\n\n## 4.5 `fin3`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin3-before.svg --features=-fin3 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072f,0710\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin3-after.svg --features=+fin3 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072f,0710\n\nsvg_stack --direction=h syriac-fin3-before.svg right-arrow.svg syriac-fin3-after.svg > syriac-fin3.svg\n\n\n## 4.6 `medi`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-medi-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0724,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-medi-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0724,25cc\n\nsvg_stack --direction=h syriac-medi-before.svg right-arrow.svg syriac-medi-after.svg > syriac-medi.svg\n\n\n## 4.7 `med2`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-med2-before.svg --features=-med2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0710,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-med2-after.svg --features=+med2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0710,25cc\n\nsvg_stack --direction=h 
syriac-med2-before.svg right-arrow.svg syriac-med2-after.svg > syriac-med2.svg\n\n\n## 4.8 `init`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0721,25cc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-init-after.svg --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0721,25cc\n\nsvg_stack --direction=h syriac-init-before.svg right-arrow.svg syriac-init-after.svg > syriac-init.svg\n\n\n## 4.9 `rlig`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-rlig-before.svg --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072a,25cc,0308\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-rlig-after.svg --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072a,0308\n\nsvg_stack --direction=h syriac-rlig-before.svg right-arrow.svg syriac-rlig-after.svg > syriac-rlig.svg\n\n\n## 4.11 `calt`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-calt-before.svg --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0720,071c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-calt-after.svg --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0720,071c\n\nsvg_stack --direction=h syriac-calt-before.svg right-arrow.svg syriac-calt-after.svg > syriac-calt.svg\n\n\n## 5.1 `liga`\n\n> Note: Noto Syriac implements this as a `calt` lookup for unknown reasons.\n>\n> This seems to be a known shortcoming. 
See\n> [https://github.com/googlei18n/noto-fonts/issues/665](https://github.com/googlei18n/noto-fonts/issues/665)\n> for more information.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-liga-before.svg --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0720,0710\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-liga-after.svg --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0720,0710\n\nsvg_stack --direction=h syriac-liga-before.svg right-arrow.svg syriac-liga-after.svg > syriac-liga.svg\n\n\n## 5.2 `dlig`\n\n> Note: None found in Noto Syriac.\n>\n> This seems to be a known shortcoming. See\n> [https://github.com/googlei18n/noto-fonts/issues/665](https://github.com/googlei18n/noto-fonts/issues/665)\n> for more information.\n\n\n## 7.3 `mark`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0733\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0733\n\nsvg_stack --direction=h syriac-mark-before.svg right-arrow.svg syriac-mark-after.svg > syriac-mark.svg\n\n\n## 7.4 `mkmk`\n\n> Note: All Noto Sans Syriac fonts have a `mkmk` feature, but it does\n> not seem to work.\n"
  },
  {
    "path": "images/tamil/tamil-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-tamil.md](../../opentype-shaping-tamil.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-decompose-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bcc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-decompose-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bc6,25cc,0bd7\n\nmontage tamil-matra-decompose-before.png right-arrow.png tamil-matra-decompose-after.png -geometry +0+0 -background transparent tamil-matra-decompose.png\n\n\n## 3.2 `nukt`\n\n> None found.\n\n\n## 3.3 `akhn`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-akhn-kssa-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0b95,0bcd,0bb7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-akhn-kssa-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0b95,0bcd,0bb7\n\nmontage tamil-akhn-kssa-before.png right-arrow.png tamil-akhn-kssa-after.png -geometry +0+0 -background transparent tamil-akhn-kssa.png\n\n\n## 3.9 `half`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-half-before.png --features=-half,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b99,25cc,0bcd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-half-after.png --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b99,0bcd\n\nmontage tamil-half-before.png right-arrow.png 
tamil-half-after.png -geometry +0+0 -background transparent tamil-half.png\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-position-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bc6,0bb0,0bcd,0b9a,0bcd,0b9c,0bbe\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-position-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb0,0bcd,0b9a,0bcd,0b9c,0bca\n\nmontage tamil-matra-position-before.png right-arrow.png tamil-matra-position-after.png -geometry +0+0 -background transparent tamil-matra-position.png\n\n\n## 5 `pres`\n\n> Note: Noto Serif Tamil implements this as an `akhn` feature for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-pres-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb6,0bcd,0bb0,0bc0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-pres-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb6,0bcd,0bb0,0bc0\n\nmontage tamil-pres-before.png right-arrow.png tamil-pres-after.png -geometry +0+0 -background transparent tamil-pres.png\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0baf,0bc0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0baf,0bc0\n\nmontage tamil-abvs-before.png right-arrow.png tamil-abvs-after.png -geometry +0+0 -background transparent tamil-abvs.png\n\n\n## 5 `blws`\n\n> None found.\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=tamil-psts-before.png --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb4,0bc2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-psts-after.png --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb4,0bc2\n\nmontage tamil-psts-before.png right-arrow.png tamil-psts-after.png -geometry +0+0 -background transparent tamil-psts.png\n\n\n## `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-haln-before.png --features=-haln,-half,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b9e,0bcd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-haln-after.png --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b9e,0bcd\n\nmontage tamil-haln-before.png right-arrow.png tamil-haln-after.png -geometry +0+0 -background transparent tamil-haln.png\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvm-before.png --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb9,0bcd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvm-after.png --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb9,0bcd\n\nmontage tamil-abvm-before.png right-arrow.png tamil-abvm-after.png -geometry +0+0 -background transparent tamil-abvm.png\n\n\n## 6 `blwm`\n\n> None found.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/tamil/tamil-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-tamil.md](../../opentype-shaping-tamil.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-decompose-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bcc\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-decompose-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bc6,25cc,0bd7\n\nsvg_stack.py --direction=h tamil-matra-decompose-before.svg right-arrow.svg tamil-matra-decompose-after.svg > tamil-matra-decompose.svg\n\n\n## 3.2 `nukt`\n\n> None found. Testing with Grantha Nukta.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-nukt-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf5,25cc,1133c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-nukt-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf5,1133c\n\nsvg_stack.py --direction=h tamil-nukt-before.svg right-arrow.svg tamil-nukt-after.svg > tamil-nukt.svg\n\n\n## 3.3 `akhn`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-akhn-kssa-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0b95,0bcd,0bb7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-akhn-kssa-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0b95,0bcd,0bb7\n\nsvg_stack.py --direction=h tamil-akhn-kssa-before.svg right-arrow.svg 
tamil-akhn-kssa-after.svg > tamil-akhn-kssa.svg\n\n\n## 3.9 `half`\n\n> Simulated output using a `mark` lookup; no example found.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-half-before.svg --features=-half,-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b99,25cc,0bcd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-half-after.svg --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b99,0bcd\n\nsvg_stack.py --direction=h tamil-half-before.svg right-arrow.svg tamil-half-after.svg > tamil-half.svg\n\n\n## 4.2 Pre-base matras\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-position-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bc6,0bb0,0bcd,0b9a,0bcd,0b9c,0bbe\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-position-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb0,0bcd,0b9a,0bcd,0b9c,0bca\n\nsvg_stack.py --direction=h tamil-matra-position-before.svg right-arrow.svg tamil-matra-position-after.svg > tamil-matra-position.svg\n\n\n## 5 `pres`\n\n> Note: Noto Serif Tamil implements this as an `akhn` feature for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-pres-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb6,0bcd,0bb0,0bc0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-pres-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb6,0bcd,0bb0,0bc0\n\nsvg_stack.py --direction=h tamil-pres-before.svg right-arrow.svg tamil-pres-after.svg > tamil-pres.svg\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvs-before.svg 
--features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0baf,0bc0\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0baf,0bc0\n\nsvg_stack.py --direction=h tamil-abvs-before.svg right-arrow.svg tamil-abvs-after.svg > tamil-abvs.svg\n\n\n## 5 `blws`\n\n> None found.\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-psts-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb4,0bc2\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-psts-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb4,0bc2\n\nsvg_stack.py --direction=h tamil-psts-before.svg right-arrow.svg tamil-psts-after.svg > tamil-psts.svg\n\n\n## `haln`\n\n> Simulated output using a `mark` lookup; no example found.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-haln-before.svg --features=-haln,-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b9e,25cc,0bcd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-haln-after.svg --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b9e,0bcd\n\nsvg_stack.py --direction=h tamil-haln-before.svg right-arrow.svg tamil-haln-after.svg > tamil-haln.svg\n\n\n## 6 `dist`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-dist-before.svg --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf4,0b85\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf 
--unicodes=0bf4,0b85\n\nsvg_stack.py --direction=h tamil-dist-before.svg right-arrow.svg tamil-dist-after.svg > tamil-dist.svg\n\n\n## 6 `kern`\n\n> None found.\n\n\n## 6 `abvm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvm-before.svg --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb9,0bcd\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvm-after.svg --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb9,0bcd\n\nsvg_stack.py --direction=h tamil-abvm-before.svg right-arrow.svg tamil-abvm-after.svg > tamil-abvm.svg\n\n\n## 6 `blwm`\n\n> Note: Noto Serif Tamil has a `blwm` feature, but it fails to attach\n> the included mark (`U+0952`) for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf7,0952\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf7,0952\n\nsvg_stack.py --direction=h tamil-blwm-before.svg right-arrow.svg tamil-blwm-after.svg > tamil-blwm.svg\n\n\n"
  },
  {
    "path": "images/telugu/telugu-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-telugu.md](../../opentype-shaping-telugu.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-init` in examples where the `init`\n> feature itself is not being explained.\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-matra-decompose-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c48\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-matra-decompose-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c46,25cc,0c56\n\nmontage telugu-matra-decompose-before.png right-arrow.png telugu-matra-decompose-after.png -geometry +0+0 -background transparent telugu-matra-decompose.png\n\n\n## 3.3 `akhn`\n\n### KSsa\n\n> Note: Noto Serif Telugu implements this as a `pres`+`blwf`\n> substitution for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-akhn-kssa-before.png --features=-blwf,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c15,25cc,0c4d,0c37\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-akhn-kssa-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c15,0c4d,0c37\n\nmontage telugu-akhn-kssa-before.png right-arrow.png telugu-akhn-kssa-after.png -geometry +0+0 -background transparent telugu-akhn-kssa.png\n\n### JNya\n\n> None found. Microsoft docs reference a \"SsJa\" akhand form, which is\n> also not found.\n\n\n## 3.4 `rphf`\n\n> None found. 
\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwf-before.png --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c17,25cc,0c4d,0c24\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwf-after.png --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c17,0c4d,0c24\n\nmontage telugu-blwf-before.png right-arrow.png telugu-blwf-after.png -geometry +0+0 -background transparent telugu-blwf.png\n\n\n## 3.9 `half`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-half-before.png --features=-half,-haln --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c22,25cc,0c4d,200d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-half-after.png --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c22,0c4d,200d\n\nmontage telugu-half-before.png right-arrow.png telugu-half-after.png -geometry +0+0 -background transparent telugu-half.png\n\n\n## 3.10 `pstf`\n\n> None found.\n\n\n## 3.12 `cjct`\n\n> None found.\n\n\n## 4.2 Pre-base matras\n\n> Not applicable.\n\n\n## 4.3 Reph position\n\n> No examples found; existing fonts seem not to incorporate Reph for\n> Telugu....\n\n\n## 5 `pres`\n\n> Note: Example from Noto Serif Telugu, but it looks like it should be\n> an `abvs` substitution instead....\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-pres-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c39,0c4c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-pres-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c39,0c4c\n\nmontage telugu-pres-before.png right-arrow.png 
telugu-pres-after.png -geometry +0+0 -background transparent telugu-pres.png\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c16,0c40\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c16,0c40\n\nmontage telugu-abvs-before.png right-arrow.png telugu-abvs-after.png -geometry +0+0 -background transparent telugu-abvs.png\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blws-before.png --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c16,0c46,0c56\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blws-after.png --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c16,0c46,0c56\n\nmontage telugu-blws-before.png right-arrow.png telugu-blws-after.png -geometry +0+0 -background transparent telugu-blws.png\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-psts-before.png --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c2b,0c42\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-psts-after.png --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c2b,0c42\n\nmontage telugu-psts-before.png right-arrow.png telugu-psts-after.png -geometry +0+0 -background transparent telugu-psts.png\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-haln-before.png --features=-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2f,0c4d\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=telugu-haln-after.png --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2f,0c4d\n\nmontage telugu-haln-before.png right-arrow.png telugu-haln-after.png -geometry +0+0 -background transparent telugu-haln.png\n\n\n## `abvm`\n\n> None found.\n\n\n## `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwm-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c1d,0c62\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c1d,0c62\n\nmontage telugu-blwm-before.png right-arrow.png telugu-blwm-after.png -geometry +0+0 -background transparent telugu-blwm.png\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/telugu/telugu-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-telugu.md](../../opentype-shaping-telugu.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n\n> Note: always use `--features=-init` in examples where the `init`\n> feature itself is not being explained.\n\n\n## 2.2 Matra decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-matra-decompose-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c48\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-matra-decompose-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c46,25cc,0c56\n\nsvg_stack.py --direction=h telugu-matra-decompose-before.svg right-arrow.svg telugu-matra-decompose-after.svg > telugu-matra-decompose.svg\n\n\n## 3.2 `nukt`\n\n> Note: Noto Serif Telugu implements this in a `blwm` feature.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-nukt-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c18,0c3c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-nukt-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c19,0c3c\n\nsvg_stack.py --direction=h telugu-nukt-before.svg right-arrow.svg telugu-nukt-after.svg > telugu-nukt.svg\n\n\n## 3.3 `akhn`\n\n### KSsa\n\n> Note: Noto Serif Telugu implements this as a `pres`+`blwf`\n> substitution for unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-akhn-kssa-before.svg --features=-blwf,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c15,25cc,0c4d,0c37\n\nhb-view --font-size=110 
--margin=2,16,2,16 --output-file=telugu-akhn-kssa-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c15,0c4d,0c37\n\nsvg_stack.py --direction=h telugu-akhn-kssa-before.svg right-arrow.svg telugu-akhn-kssa-after.svg > telugu-akhn-kssa.svg\n\n### JNya\n\n> None found. Microsoft docs reference a \"SsJa\" akhand form, which is\n> also not found.\n\n\n## 3.4 `rphf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-rphf-before.svg --features=-rphf,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --preserve-default-ignorables --unicodes=0c30,0c4d,200d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-rphf-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c30,0c4d,200d\n\nsvg_stack.py --direction=h telugu-rphf-before.svg right-arrow.svg telugu-rphf-after.svg > telugu-rphf.svg\n\n\n## 3.7 `blwf`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c17,25cc,0c4d,0c24\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c17,0c4d,0c24\n\nsvg_stack.py --direction=h telugu-blwf-before.svg right-arrow.svg telugu-blwf-after.svg > telugu-blwf.svg\n\n\n## 3.9 `half`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-half-before.svg --features=-half,-haln --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c22,25cc,0c4d,200d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-half-after.svg --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf 
--unicodes=0c22,0c4d,200d\n\nsvg_stack.py --direction=h telugu-half-before.svg right-arrow.svg telugu-half-after.svg > telugu-half.svg\n\n\n## 3.10 `pstf`\n\n> None found.\n\n\n## 3.12 `cjct`\n\n> None found.\n\n\n## 4.2 Pre-base matras\n\n> Not applicable.\n\n\n## 4.3 Reph position\n\n> No examples found; existing fonts seem not to incorporate Reph for\n> Telugu....\n\n\n## 5 `pres`\n\n> Note: Example from Noto Serif Telugu, but it looks like it should be\n> an `abvs` substitution instead....\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-pres-before.svg --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c39,0c4c\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-pres-after.svg --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c39,0c4c\n\nsvg_stack.py --direction=h telugu-pres-before.svg right-arrow.svg telugu-pres-after.svg > telugu-pres.svg\n\n\n## 5 `abvs`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c16,0c40\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c16,0c40\n\nsvg_stack.py --direction=h telugu-abvs-before.svg right-arrow.svg telugu-abvs-after.svg > telugu-abvs.svg\n\n\n## 5 `blws`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c16,0c4d,0c24,0c4d,0c30\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf 
--unicodes=0c16,0c4d,0c24,0c4d,0c30\n\nsvg_stack.py --direction=h telugu-blws-before.svg right-arrow.svg telugu-blws-after.svg > telugu-blws.svg\n\n\n## 5 `psts`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-psts-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2b,0c42\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-psts-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2b,0c42\n\nsvg_stack.py --direction=h telugu-psts-before.svg right-arrow.svg telugu-psts-after.svg > telugu-psts.svg\n\n\n## 5 `haln`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-haln-before.svg --features=-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2f,0c4d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-haln-after.svg --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2f,0c4d\n\nsvg_stack.py --direction=h telugu-haln-before.svg right-arrow.svg telugu-haln-after.svg > telugu-haln.svg\n\n\n## `dist`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-dist-before.svg --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c19,0c44\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c19,0c44\n\nsvg_stack.py --direction=h telugu-dist-before.svg right-arrow.svg telugu-dist-after.svg > telugu-dist.svg\n\n\n## `abvm`\n\n> Note: Noto Serif Telugu implements this in a `blwm` feature, for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=32,16,2,16 --output-file=telugu-abvm-before.svg --features=-blwm --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c0f,0c00\n\nhb-view --font-size=110 --margin=32,16,2,16 --output-file=telugu-abvm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c0f,0c00\n\nsvg_stack.py --direction=h telugu-abvm-before.svg right-arrow.svg telugu-abvm-after.svg > telugu-abvm.svg\n\n\n\n## `blwm`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c1d,0c4d,0c26\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c1d,0c4d,0c26\n\nsvg_stack.py --direction=h telugu-blwm-before.svg right-arrow.svg telugu-blwm-after.svg > telugu-blwm.svg\n\n\n\n"
  },
  {
    "path": "images/thai-lao/thai-lao-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-thai-lao.md](../../opentype-shaping-thai-lao.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 1.1 `ccmp`\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-ccmp-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4a,0e4d\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-ccmp-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4a,0e4d\n\nmontage thai-ccmp-before.png right-arrow.png thai-ccmp-after.png -geometry +0+0 -background transparent thai-ccmp.png\n\n## 1.2 Decomposition\n\n## 1.2 Am sign decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-am-decomposition-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=25cc,0eb3\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-am-decomposition-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=25cc,0ecd,25cc,0eb2\n\nmontage lao-am-decomposition-before.png right-arrow.png lao-am-decomposition-after.png -geometry +0+0 -background transparent lao-am-decomposition.png\n\n## 4 `kern`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-kern-before.png --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=0ec1,0e9a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-kern-after.png --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=0ec1,0e9a\n\nmontage lao-kern-before.png right-arrow.png lao-kern-after.png -geometry +0+0 -background 
transparent lao-kern.png\n\n## 4 `mark`\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=0e0e,25cc,0e38\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=0e0e,0e38\n\nmontage thai-mark-before.png right-arrow.png thai-mark-after.png -geometry +0+0 -background transparent thai-mark.png\n\n\n## 4 `mkmk`\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mkmk-before.png --features=-mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e31,0e48\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mkmk-after.png --features=+mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e31,0e48\n\nmontage thai-mkmk-before.png right-arrow.png thai-mkmk-after.png -geometry +0+0 -background transparent thai-mkmk.png\n\n\n## PUA 1 - Sara Am decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=thai-am-decomposition-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e33\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=thai-am-decomposition-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4d,25cc,0e32\n\nmontage thai-am-decomposition-before.png right-arrow.png thai-am-decomposition-after.png -geometry +0+0 -background transparent thai-am-decomposition.png\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/thai-lao/thai-lao-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-thai-lao.md](../../opentype-shaping-thai-lao.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## 1.1 `ccmp`\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-ccmp-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4a,0e4d\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-ccmp-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4a,0e4d\n\nsvg_stack --direction=h thai-ccmp-before.svg right-arrow.svg thai-ccmp-after.svg > thai-ccmp.svg\n\n## 1.2 Decomposition\n\n## 1.2 Am sign decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-am-decomposition-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=25cc,0eb3\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-am-decomposition-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=25cc,0ecd,25cc,0eb2\n\nsvg_stack --direction=h lao-am-decomposition-before.svg right-arrow.svg lao-am-decomposition-after.svg > lao-am-decomposition.svg\n\n## 4 `kern`\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-kern-before.svg --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=0ec1,0e9a\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-kern-after.svg --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=0ec1,0e9a\n\nsvg_stack --direction=h lao-kern-before.svg right-arrow.svg lao-kern-after.svg > lao-kern.svg\n\n## 4 `mark`\n\nhb-view --font-size=110 
--margin=16,16,2,16 --output-file=thai-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=0e0e,25cc,0e38\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=0e0e,0e38\n\nsvg_stack --direction=h thai-mark-before.svg right-arrow.svg thai-mark-after.svg > thai-mark.svg\n\n\n## 4 `mkmk`\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mkmk-before.svg --features=-mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e31,0e48\n\nhb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mkmk-after.svg --features=+mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e31,0e48\n\nsvg_stack --direction=h thai-mkmk-before.svg right-arrow.svg thai-mkmk-after.svg > thai-mkmk.svg\n\n\n## PUA 1 - Sara Am decomposition\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=thai-am-decomposition-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e33\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=thai-am-decomposition-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4d,25cc,0e32\n\nsvg_stack --direction=h thai-am-decomposition-before.svg right-arrow.svg thai-am-decomposition-after.svg > thai-am-decomposition.svg\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "images/tibetan/tibetan-png-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-tibetan.md](../../opentype-shaping-tibetan.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Syllable identification\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-syllable.png --features=  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f51,0f54,0f7a,0f0b,0f56,0f66,0f90,0fb2,0f74,0f53\n\n\n## 1.2 ccmp\n\nhb-view --font-size=110 --margin=2,16,2,72 --output-file=tibetan-ccmp-before.png --features=-ccmp  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f77\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-ccmp-after.png --features=+ccmp  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0fb2,00a0,00a0,0f71,25cc,0f80\n\nmontage tibetan-ccmp-before.png right-arrow.png tibetan-ccmp-after.png -geometry +0+0 -background transparent tibetan-ccmp.png\n\n\n## 2.1 abvs\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvs-before.png --features=-abvs  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f49,0f7b,0f7e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvs-after.png --features=+abvs  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f49,0f7b,0f7e\n\nmontage tibetan-abvs-before.png right-arrow.png tibetan-abvs-after.png -geometry +0+0 -background transparent tibetan-abvs.png\n\n\n## 2.2 blws\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blws-before.png --features=-blws  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f4a,0f91\n\nhb-view --font-size=110 --margin=2,16,2,16 
--output-file=tibetan-blws-after.png --features=+blws  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f4a,0f91\n\nmontage tibetan-blws-before.png right-arrow.png tibetan-blws-after.png -geometry +0+0 -background transparent tibetan-blws.png\n\n\n## 2.3 calt\n\n> Note: Noto Sans Tibetan calls this substitution twice, in calt and\n> in abvs.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-calt-before.png --features=-calt,-abvs  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f59,0f7d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-calt-after.png --features=+calt  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f59,0f7d\n\nmontage tibetan-calt-before.png right-arrow.png tibetan-calt-after.png -geometry +0+0 -background transparent tibetan-calt.png\n\n\n## 2.4 liga\n\n> Note: Noto Sans Tibetan implements this as a ccmp substitution for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-liga-before.png --features=-ccmp  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f97,0f39,0fb7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-liga-after.png --features=+ccmp  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f97,0f39,0fb7\n\nmontage tibetan-liga-before.png right-arrow.png tibetan-liga-after.png -geometry +0+0 -background transparent tibetan-liga.png\n\n\n## 3 kern\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-kern-before.png --features=-kern  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f65,0f0b,0f62,0fa9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-kern-after.png --features=+kern  --background=FFFFFF00 
/usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f65,0f0b,0f62,0fa9\n\nmontage tibetan-kern-before.png right-arrow.png tibetan-kern-after.png -geometry +0+0 -background transparent tibetan-kern.png\n\n\n## 3 abvm\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvm-before.png --features=-abvm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f61,0f80,0f7e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvm-after.png --features=+abvm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f61,0f80,0f7e\n\nmontage tibetan-abvm-before.png right-arrow.png tibetan-abvm-after.png -geometry +0+0 -background transparent tibetan-abvm.png\n\n\n## 3 blwm\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blwm-before.png --features=-blwm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f59,0fb3,0f71,0f74\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blwm-after.png --features=+blwm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f59,0fb3,0f71,0f74\n\nmontage tibetan-blwm-before.png right-arrow.png tibetan-blwm-after.png -geometry +0+0 -background transparent tibetan-blwm.png\n\n\n## 3 mkmk\n\n> Note: Noto Sans Tibetan implements this in both blwm and mkmk for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-mkmk-before.png --features=-mkmk,-blwm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f51,0f71,0f35\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-mkmk-after.png --features=+mkmk  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f51,0f71,0f35\n\nmontage tibetan-mkmk-before.png right-arrow.png tibetan-mkmk-after.png -geometry +0+0 -background 
transparent tibetan-mkmk.png\n"
  },
  {
    "path": "images/tibetan/tibetan-svg-image-generation-log.md",
    "content": "# Commands used to generate the images in [opentype-shaping-tibetan.md](../../opentype-shaping-tibetan.md)\n\n## Arrow general\n\nhb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192\n\n## Syllable identification\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-syllable.svg --features=  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f51,0f54,0f7a,0f0b,0f56,0f66,0f90,0fb2,0f74,0f53\n\n\n## 1.2 ccmp\n\nhb-view --font-size=110 --margin=2,16,2,72 --output-file=tibetan-ccmp-before.svg --features=-ccmp  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f77\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-ccmp-after.svg --features=+ccmp  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0fb2,00a0,00a0,0f71,25cc,0f80\n\nsvg_stack --direction=h tibetan-ccmp-before.svg right-arrow.svg tibetan-ccmp-after.svg > tibetan-ccmp.svg\n\n\n## 2.1 abvs\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvs-before.svg --features=-abvs  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f49,0f7b,0f7e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvs-after.svg --features=+abvs  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f49,0f7b,0f7e\n\nsvg_stack --direction=h tibetan-abvs-before.svg right-arrow.svg tibetan-abvs-after.svg > tibetan-abvs.svg\n\n\n## 2.2 blws\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blws-before.svg --features=-blws  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f4a,0f91\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blws-after.svg --features=+blws  
--background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f4a,0f91\n\nsvg_stack --direction=h tibetan-blws-before.svg right-arrow.svg tibetan-blws-after.svg > tibetan-blws.svg\n\n\n## 2.3 calt\n\n> Note: Noto Sans Tibetan calls this substitution twice, in calt and\n> in abvs.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-calt-before.svg --features=-calt,-abvs  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f59,0f7d\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-calt-after.svg --features=+calt  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f59,0f7d\n\nsvg_stack --direction=h tibetan-calt-before.svg right-arrow.svg tibetan-calt-after.svg > tibetan-calt.svg\n\n\n## 2.4 liga\n\n> Note: Noto Sans Tibetan implements this as a ccmp substitution for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-liga-before.svg --features=-ccmp  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f97,0f39,0fb7\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-liga-after.svg --features=+ccmp  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f97,0f39,0fb7\n\nsvg_stack --direction=h tibetan-liga-before.svg right-arrow.svg tibetan-liga-after.svg > tibetan-liga.svg\n\n\n## 3 kern\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-kern-before.svg --features=-kern  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f65,0f0b,0f62,0fa9\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-kern-after.svg --features=+kern  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f65,0f0b,0f62,0fa9\n\nsvg_stack --direction=h tibetan-kern-before.svg right-arrow.svg 
tibetan-kern-after.svg > tibetan-kern.svg\n\n\n## 3 abvm\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvm-before.svg --features=-abvm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f61,0f80,0f7e\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvm-after.svg --features=+abvm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f61,0f80,0f7e\n\nsvg_stack --direction=h tibetan-abvm-before.svg right-arrow.svg tibetan-abvm-after.svg > tibetan-abvm.svg\n\n\n## 3 blwm\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blwm-before.svg --features=-blwm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f59,0fb3,0f71,0f74\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blwm-after.svg --features=+blwm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f59,0fb3,0f71,0f74\n\nsvg_stack --direction=h tibetan-blwm-before.svg right-arrow.svg tibetan-blwm-after.svg > tibetan-blwm.svg\n\n\n## 3 mkmk\n\n> Note: Noto Sans Tibetan implements this in both blwm and mkmk for\n> unknown reasons.\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-mkmk-before.svg --features=-mkmk,-blwm  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f51,0f71,0f35\n\nhb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-mkmk-after.svg --features=+mkmk  --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f51,0f71,0f35\n\nsvg_stack --direction=h tibetan-mkmk-before.svg right-arrow.svg tibetan-mkmk-after.svg > tibetan-mkmk.svg\n"
  },
  {
    "path": "index.md",
    "content": "```{include} /_global.md\n```\n\n# OpenType shaping documents #\n\nSponsored by [YesLogic](https://yeslogic.com/) \n\n_<aside>Thanks also to the developers of HarfBuzz and AllSorts, plus many other font engineers and text-encoding experts for their generosity of time and insightful contributions.</aside>_\n\n:::{admonition} &#127366; &#127344; &#127361; &#127357; &#127352; &#127357; &#127350;\n:class: caution\nThese documents are an active WORK IN PROGRESS.\n\nNONE of the documents you currently see here are complete\nnor are they suitable for reference. PLEASE do not use\nthem as a guide or as a general information source.\n\nAs long as this warning text remains visible, the above \nholds true. \n:::\n\n\nThese documents are meant to provide a functional specification for\ntext shaping. The expectation is that an implementer of this\nspecification will be using fonts in the OpenType font format applied\nto input text that complies with Unicode.\n\nBecause application software and end-user documents may utilize\nnon-OpenType fonts and non-Unicode text (in particular, when older\nfonts or documents are encountered), these documents also provide\nfunctional information that a shaping engine may use to implement a\nreasonable best-effort attempt at producing useful output in the most\ncommon of such scenarios.\n\n\n## Shapers\n\nThe shaping behavior described here can be roughly divided into five\ncategories.\n\nAll non-complex scripts follow the same\n[default](opentype-shaping-default.md) shaping model.\n\n\nThe _Indic Model_ is shared by ten individual scripts. 
These scripts\nfollow the same overall approach to shaping, described in the [Indic\ngeneral](opentype-shaping-indic-general.md) document, but each script\nincorporates script-specific details, which are more fully described\nin its own document:\n\n  - [Devanagari](opentype-shaping-devanagari.md)\n  - [Bengali](opentype-shaping-bengali.md)\n  - [Gujarati](opentype-shaping-gujarati.md)\n  - [Gurmukhi](opentype-shaping-gurmukhi.md)\n  - [Kannada](opentype-shaping-kannada.md)\n  - [Malayalam](opentype-shaping-malayalam.md)\n  - [Oriya](opentype-shaping-oriya.md)\n  - [Tamil](opentype-shaping-tamil.md)\n  - [Telugu](opentype-shaping-telugu.md)\n  - [Sinhala](opentype-shaping-sinhala.md)\n\n\nThe _Arabic Model_ is shared by four individual scripts. These scripts\nfollow the same overall approach to shaping, described in the [Arabic\ngeneral](opentype-shaping-arabic-general.md) document, but each script\nincorporates script-specific details, which are more fully described\nin its own document:\n\n  - [Arabic](opentype-shaping-arabic.md)\n  - [N'Ko](opentype-shaping-nko.md)\n  - [Syriac](opentype-shaping-syriac.md)\n  - [Mongolian](opentype-shaping-mongolian.md)\n\n\nFive of the remaining scripts each use a distinct, script-specific\nmodel, with two others (Thai and Lao) sharing enough details to be\nhandled by a common shaper:\n\n  - [Hangul](opentype-shaping-hangul.md)\n  - [Hebrew](opentype-shaping-hebrew.md)\n  - [Khmer](opentype-shaping-khmer.md)\n  - [Thai and Lao](opentype-shaping-thai-lao.md)\n  - [Tibetan](opentype-shaping-tibetan.md)\n  - [Myanmar](opentype-shaping-myanmar.md)\n  \n\nFinally, the Universal Shaping Engine (<abbr title=\"Universal Shaping\nEngine\">USE</abbr>) model is designed to shape all\ncomplex scripts that are not handled by a dedicated\nscript-specific shaping model in the lists above:\n\n  - [Universal Shaping Engine (<abbr>USE</abbr>)](opentype-shaping-use.md)\n\n\nIn addition, these documents describe the handling of emoji\nsequences. 
Although emoji sequences do not constitute a separate\nshaping model, handling emoji sequences can incorporate many of the\nsame shaping mechanisms and shaping engine implementations may be\nexpected to handle them:\n\n  - [Emoji](opentype-shaping-emoji.md)\n  \n\nShaping is just one part of the overall text-handling process. These\ndocuments assume that other components in the software stack will be\nresponsible for details such as handling higher-level markup, layout,\nfont matching and loading, rasterization, and so on. Most importantly,\nthese documents assume that the input text has already been segmented\ninto text runs that consist of a single language, script, font, and\nall other markup considerations (such as size or color, for example).\n\nWithin those assumptions, the shaping of a particular text run should\nbe consistent, regardless of whether the higher-level processes\ninvolve a document, user-interface element, network stream, or any\nother context for displaying text.\n\n\n## Normalization\n\nHowever, these documents also include a description of text\n[normalization](opentype-shaping-normalization.md) in the OpenType\nshaping context, which differs from Unicode normalization in several\nrespects. Shaping engine implementations may differ as to whether the\nshaping engine itself is responsible for handling normalization or\nwhether normalization is handled by another component\nin the stack. \n\n\n## Additional information\n\nVarious practical [notes](notes/index.md) about this document set and\nthe details of its scope, limitations, and quirks are also provided.\n\nSome [errata](errata.md) about the \"upstream\" specifications and\nreference documents are noted separately. \n\nIn its final form, this repository will hold documentation describing\nthe shaping behavior used for layout of OpenType text. 
In particular,\nit will focus on complex scripts.\n\nIn addition to the primary, per-script documents, implementers and\nother interested readers are encouraged to check the\n[character tables](character-tables/index.md) for correctness and to\nexamine the [image-generation logs](https://github.com/n8willis/opentype-shaping-documents/images/README.md) to identify\nissues seen in the inline images.\n\n\n## Feedback\n\nInterested readers, font developers, and shaping-engine implementers\nare encouraged to provide feedback, ask questions, and propose\nimprovements to any part of these documents. Shaping is the concern of\nsoftware developers and readers across the world, and all are welcome\nto participate in recording and clarifying what is required to produce\nthe best and most accurate text output possible, both now and in the\nfuture.\n\nSee the upstream git repository at\n[github.com/n8willis/opentype-shaping-documents](https://github.com/n8willis/opentype-shaping-documents)\nto raise issues, ask questions, or add comments.\n\n\n## References\n\nThese documents cite the following informative references:\n\n1. The Microsoft [Script development\n   specifications](https://docs.microsoft.com/en-us/typography/script-development/standard),\n   which document the behaviors expected for OpenType Layout fonts and\n   provide guidance &amp; examples for type designers. OpenType is a\n   registered trademark of Microsoft Corporation. \n2. Related portions of the Microsoft OpenType specification, such as the\n   [OpenType Layout tag\n   registry](https://docs.microsoft.com/en-us/typography/opentype/spec/ttoreg)\n   and [OpenType Layout common table\n   formats](https://docs.microsoft.com/en-us/typography/opentype/spec/chapter2),\n   which list and define feature tags, script &amp; language tags, and\n   other internals of compliant OpenType font binaries. OpenType is a\n   registered trademark of Microsoft Corporation. \n3. 
The [HarfBuzz](https://github.com/harfbuzz/harfbuzz) project, which\n   includes a free-software/open-source implementation of OpenType\n   Layout shaping with full source code and documentation. \n4. The [AllSorts](https://github.com/yeslogic/allsorts) project, which\n   includes a free-software/open-source implementation of OpenType\n   Layout shaping with full source code and documentation.\n5. The [Unicode\n   Standard](http://www.unicode.org/standard/standard.html) and\n   related Unicode Consortium projects such as the [Unicode Character\n   Database](http://www.unicode.org/reports/tr44/), which defines\n   Unicode code points and formal character properties used in\n   shaping. Unicode and the Unicode Logo are registered trademarks of\n   Unicode, Inc. in the United States and other countries.\n6. The YesLogic [text corpus](https://github.com/yeslogic/corpus),\n   which includes real-world text data for several Indic scripts,\n   scraped from Wikipedia, Reddit, and multiple online news\n   sources. This data is used to test shaping in AllSorts and Prince.\n7. Known but unofficial information about other shaping-engine\n   projects. Primarily this includes tests and reproducible issues\n   found via [HarfBuzz](https://github.com/harfbuzz/harfbuzz), because\n   HarfBuzz intentionally aims to produce results that will 100% match\n   the output of Microsoft Uniscribe (not counting cases where\n   Uniscribe's output is known to be incorrect, of course).\n   > Note: occasionally, tests or issues documenting the behavior of\n   > Apple CoreText are also included, but CoreText compatibility is\n   > not an explicit goal for HarfBuzz.\n   \n\n---\nVersion {{ env.config.version }}, release {{ env.config.release }};\nbuilt {sub-ref}`today`.\n"
  },
  {
    "path": "make.bat",
    "content": "@ECHO OFF\r\n\r\npushd %~dp0\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sphinx-build\r\n)\r\nset SOURCEDIR=.\r\nset BUILDDIR=_build\r\n\r\n%SPHINXBUILD% >NUL 2>NUL\r\nif errorlevel 9009 (\r\n\techo.\r\n\techo.The 'sphinx-build' command was not found. Make sure you have Sphinx\r\n\techo.installed, then set the SPHINXBUILD environment variable to point\r\n\techo.to the full path of the 'sphinx-build' executable. Alternatively you\r\n\techo.may add the Sphinx directory to PATH.\r\n\techo.\r\n\techo.If you don't have Sphinx installed, grab it from\r\n\techo.https://www.sphinx-doc.org/\r\n\texit /b 1\r\n)\r\n\r\nif \"%1\" == \"\" goto help\r\n\r\n%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\r\ngoto end\r\n\r\n:help\r\n%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\r\n\r\n:end\r\npopd\r\n"
  },
  {
    "path": "notes/README.md",
    "content": "# Notes #\n\nThe files in this directory include auxiliary information that is either\ntangential to the main shaping-behavior documentation or is\nlong enough that including it inline would disrupt\nthe flow of the text for readers.\n\nNotes included cover:\n\n  - [Uniscribe compatibility](/notes/uniscribe-bug-compatibility.md):\n    Information on preserving strict compatibility with Microsoft's\n    Uniscribe shaping engine.\n  - [Ragel state-machine operators](/notes/ragel-machine-notation.md):\n    Information on the syntax of the\n    [Ragel](http://www.colm.net/open-source/ragel/) state-machine\n    compiler, which is the reference regular-expression syntax used\n    when listing regular expressions in the shaping-behavior\n    documentation, but which itself is not mandatory.\n  - [Emoji implementation](/notes/emoji-implementation.md): Information\n    on the image formats, codepoint visibility, and <abbr title=\"Glyph Substitution table\">GSUB</abbr>/<abbr title=\"Glyph Positioning table\">GPOS</abbr> features\n    used in real-world Emoji fonts distributed by major vendors.\n"
  },
  {
    "path": "notes/emoji-implementation.md",
    "content": "# Notes on Emoji font implementation #\n\nThis document notes details on how common Emoji fonts implement\nsequence, modifier, variation-selection, text-presentation \nfallback, and other behavior, for the purposes of testing and\ndebugging.\n\nEmoji fonts are deployed by vendors using a variety of different\nimage formats (including the `SVG `, `COLR`v0/`CPAL`,\n`COLR`v1/`CPAL`, `glyf`, and `CFF ` vector formats and the `CBDT`\nand `sbix` raster formats), which can make it difficult to\ncharacterize Emoji font behavior.\n\nSimilarly, Emoji font vendors have employed a variety of\ndifferent OpenType features to implement support for standard\nsequences, modifier-based sequences, <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>-based sequences and\npermutations.\n\nSee the [Emoji shaping document](../opentype-shaping-emoji.md)\nfor more details on the sequences and definitions involved.\n\n## Format, features, and control-codepoint visibility table ##\n\nThis table lists the image format, the <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature(s) used for\nbasic Emoji sequence support and <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>-based sequence support, and\nwhether or not the font includes a visible glyph for the\npresentation selector codepoints (VS15, `U+FE0E`; VS16, `U+FE0F`)\nand modifier codepoints (`U+1F3FB`..`U+1F3FF`).\n\n\n:::{table} Emoji sequence implementation details\n\n| Font                   | publisher | image format | sequence formation feature | ZWJ sequence feature | visible presentation selector | visible modifier |\n|:-----------------------|:----------|:-------------|:---------------------------|:---------------------|:------------------------------|:-----------------|\n| Source Emoji           | Adobe     | cff          | ccmp                       | ccmp, salt           | YES                           | YES              |\n| Blobmoji               | C1710     | CBDT         | ccmp                       | ccmp           
      | no                            | YES              |\n| Twemoji                | Twitter   | SVG          | liga                       | liga                 | no                            | YES              |\n| Noto Color Emoji       | Google    | CBDT         | ccmp                       | ccmp                 | no                            | YES              |\n| Noto Color Emoji       | Google    | COLRv1       | ccmp                       | ccmp                 | no                            | YES              |\n| EmojiTwo Android       | EmojiTwo  | CBDT         | ccmp                       | ccmp                 | no                            | YES              |\n| EmojiTwo Apple         | EmojiTwo  | sbix         | morx                       | morx                 | no                            | YES              |\n| EmojiTwo SVG          | EmojiTwo  | SVG          | ccmp                       | ccmp                 | no                            | YES              |\n| Openmoji               | HfG Gmünd | SVG          | liga                       | liga                 | no                            | YES              |\n| FirefoxEmoji           | Mozilla   | COLRv0       | rlig                       | rlig                 | no                            | no               |\n| Noto Emoji             | Google    | glyf         | ccmp                       | ccmp                 | no                            | YES              |\n| Old Noto B&amp;W Emoji | Google    | glyf         | ccmp                       | ccmp                 | no                            | no               |\n| JoyPixels              | JoyPixels | CBDT         | ccmp                       | ccmp                 | no                            | YES              |\n| Apple Color Emoji      | Apple    | sbix         | morx                       | morx                 | no                            | YES              |\n| Samsung Color Emoji    | Samsung  | CBDT    
     | ccmp                       | ccmp                 | no                            | YES              |\n| Segoe UI Emoji         | Microsoft| COLRv0       | ccmp                       | ccmp                 | YES                           | YES              |\n:::\n\n\n### Contributing additional data ###\n\nVolunteers or implementers who wish to contribute data for additional\nEmoji fonts may need to collect the information themselves by\ninspecting font binaries.\n\nOptions available include:\n\n1. **FontTools / TTX**\n   - Users can run `ttx -l somefontfilename.ttf` (or `.otf` or `.ttc`\n     or `.otc`) to get a short list of the tables. The presence of `SVG `,\n     `CBDT`, `sbix`, or `COLR` indicates that whichever one of those exists\n     is the image format. _If_ none of the above are there but `glyf` or `CFF `\n     or `CFF2` _is_ there, then whichever of those three exists is the\n     image format (and means it's a black-and-white emoji font, which users\n     would probably know beforehand anyway). If there's more than one of\n     `SVG `, `CBDT`, `sbix`, or `COLR` present in the same font file, that\n     would likely mean unknown behavior; comments on such cases are welcome.\n   - Users can run the `layout-features.py somefontfilename.ttf`\n     script (which can be found in the `/Snippets/` directory of the\n     `FontTools` package source) and it will print out an indented\n     list of the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\n     used. All that matters for the table above is what the script\n     reports on the `Feature: ` line. For a typical emoji font there's\n     probably only one feature -- but, if there are several, listing\n     them is useful.\n\n2. 
**AllSorts / allsorts-tools**\n   - Users can use the `dump` tool from the `allsorts-tools` package\n     to run `allsorts dump somefilename.ttf` and get a list of tables plus\n     other metadata; the tables are the first output. Same interpretation\n     as above.\n   - At the moment it sounds like there isn't a single-command option in\n     `allsorts` to list <abbr title=\"Glyph Substitution table\">GSUB</abbr>/<abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Corrections are welcome.\n\n3. **GUI font editors**\n   - Users can also open up the font file in a font editor and look at what\n     it presents. \n   - FontForge:\n     - In FontForge, go to Element -> Font Info in the menu to open the\n       font-info dialog box. It will show the <abbr title=\"Glyph Substitution table\">GSUB</abbr>/<abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups in the\n       \"Lookups\" tab (left-hand side).\n     - FontForge does _not_ just show a convenient list of all the tables.\n       However, when users open the font file, the \"Warnings\" dialog box will\n       report if it finds `SVG `, `CBDT`, `sbix`, or `COLR` tables.\n       Unfortunately, it will only actually open the font for\n       editing/inspection if it finds a `glyf`, `CFF `, or `CFF2` table\n       (which a `COLR` font would have) or an `SVG ` table. So users can't\n       use it to inspect the features of the other formats.\n\n(Further instructions will be added to this list for other editors if volunteers\ncan contribute them)\n\nFor determining if there's a printable glyph for the selectors/modifiers:\n1. **GUI font editors**\n   - Users can open up the font in an editor and look at the slots for the\n     Unicode codepoints for the presentation selectors (`U+FE0E` and `U+FE0F`)\n     and the modifiers (`U+1F3FB` through `U+1F3FF`), if they exist (they might not).\n2. 
**HarfBuzz**\n   - Users can run the `hb-view` utility to output glyph contents for specific\n     Unicode codepoints, but one might have to try a couple of options, depending\n     on the image format. Run `hb-view --preserve-default-ignorables somefontfilename.ttf --unicodes=fe0e`\n     to start (for `U+FE0E`). Users may also try adding the `--font-funcs=ot`\n     and/or `--shapers=ot` flags to that command if it gives trouble. \n"
  },
  {
    "path": "notes/index.md",
    "content": "# Notes #\n\nThis section includes auxiliary information that is either\ntangential to the main shaping-behavior documentation or is\nlong enough that including it inline would disrupt\nthe flow of the text for readers.\n\nNotes included cover:\n\n  - [Uniscribe compatibility](/notes/uniscribe-bug-compatibility.md):\n    Information on preserving strict compatibility with Microsoft's\n    Uniscribe shaping engine.\n  - [Ragel state-machine operators](/notes/ragel-machine-notation.md):\n    Information on the syntax of the\n    [Ragel](http://www.colm.net/open-source/ragel/) state-machine\n    compiler, which is the reference regular-expression syntax used\n    when listing regular expressions in the shaping-behavior\n    documentation, but which itself is not mandatory.\n  - [Emoji implementation](/notes/emoji-implementation.md): Information\n    on the image formats, codepoint visibility, and <abbr title=\"Glyph Substitution table\">GSUB</abbr>/<abbr title=\"Glyph Positioning table\">GPOS</abbr> features\n    used in real-world Emoji fonts distributed by major vendors.\n"
  },
  {
    "path": "notes/ragel-machine-notation.md",
    "content": "# Ragel State Machine operators #\n\nThe following operators are used in the regular expressions cited in\nthe various shaper-engine guides.\n\n```\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n"
  },
  {
    "path": "notes/uniscribe-bug-compatibility.md",
    "content": "# Notes for preserving Uniscribe compatibility #\n\nThis document details behavior that shaping engines may wish to\nimplement in order to maintain strict shaping compatibility with\nMicrosoft's Uniscribe OpenType shaper, including behavior that may be\nregarded as bugs by end users.\n\n**Contents**\n\n  - [Indic standalone-syllable dotted circles](#indic-standalone-syllable-dotted-circles)\n  - [Indic syllable cluster merging](#indic-syllable-cluster-merging)\n  - [Indic fallback Reph reordering](#indic-fallback-reph-reordering)\n  - [Kannada legacy treatment of \"Ra,Halant,ZWJ\"](#kannada-legacy-treatment-of-ra-halant-zwj)\n  - [Khmer kerning](#khmer-kerning)\n  - [Sinhala matra decomposition](#sinhala-matra-decomposition)\n  - [Miscellaneous](#miscellaneous)\n      - [Bengali init feature matching](#bengali-init-feature-matching)\n      - [Old-model post-base Halant reordering](#old-model-post-base-halant-reordering)\n          - [Kannada final double Halants](#kannada-final-double-halants)\n      - [Halants and left matras](#halants-and-left-matras)\n\t  - [Explicit half-forms followed by matras](#explicit-half-forms-followed-by-matras)\n\n\nCompatibility notes in the \"miscellaneous\" category deal with\nbehaviors that are incompletely documented, deal solely with\ndeprecated script tags, or do not violate known conventions. Thus, the\nscenarios with which they deal may not be regarded as bugs.\n\n\n\n## Indic standalone-syllable dotted circles ##\n\nIn Indic syllables that include a `PLACEHOLDER` or `DOTTED_CIRCLE`\ncodepoint, if a dotted-circle glyph is the last consonant of the\nsyllable, Uniscribe ignores the glyph when processing the syllable.\n\nFor example, the dotted-circle glyph is not counted as a consonant\nwhen locating the syllable's base consonant. 
Therefore, the sequence\n<samp>\"Ra,Halant,Dotted_Circle\"</samp> does not trigger Reph formation (which would\nresult in the sequence <samp>\"Reph,Dotted_Circle\"</samp>).\n\n\n## Indic syllable cluster merging ##\n\nOther shaping engines, such as HarfBuzz, track the indivisible\ncomponents of a syllable in \"clusters\". Each individual letter usually\ncorresponds to a cluster; when two letters ligate or form a conjunct,\ntheir clusters are merged. When a codepoint is decomposed, its\ncomponents remain part of the same, original cluster as the\nprecomposed version. Uniscribe appears to follow this pattern as well.\n\nWhen shaping Indic text in most scripts, after shaping the entire\nsyllable, Uniscribe merges all of the clusters of the syllable into a\nsingle, indivisible cluster. \n\nThe exceptions to this behavior occur when Uniscribe is shaping Tamil\nand Sinhala. In those cases, the full-syllable cluster merge is not\nperformed.\n\n> Note: This full-syllable clustering makes it hard for application\n> software to position the cursor within the word. It may also have\n> other implications for software above the shaping engine in the\n> stack.\n\n\n## Indic fallback Reph reordering ##\n\nWhen shaping Indic syllables, any one of several Reph-positioning\nstrategies may be required by the active script. In the event that no\ncorrect position can be determined by the shaping engine for a\nsyllable, Uniscribe's ultimate fallback behavior is to reorder the\nReph to the end of the syllable.\n\nIf the Reph is reordered to the end of the syllable and this final\nposition happens to occur immediately after a <samp>\"Matra,Halant\"</samp> sequence,\nUniscribe leaves the Reph in this position.\n\nOther shaping engines, in this situation, will reorder the Reph to a\nposition immediately before the <samp>\"Matra,Halant\"</samp> sequence. 
This allows\nfor any <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions that match <samp>\"Reph,Matra\"</samp> sequences to be\nactivated, if any such substitution rules are present in the active\nfont. \n\n## Kannada legacy treatment of \"Ra,Halant,ZWJ\" ##\n\nIn the `<knda>` shaping model (which was deprecated in 2005 in favor\nof `<knd2>`), the sequence <samp>\"Ra,Halant,ZWJ\"</samp> was treated as equivalent\nto the sequence <samp>\"Ra,ZWJ,Halant\"</samp>.\n\n## Khmer kerning ##\n\nUniscribe does not apply the `kern` feature to Khmer text, even if the\nactive font includes kerning tables for Khmer codepoints.\n\n\n## Sinhala matra decomposition ##\n\nSinhala text in OpenType presents two possible methods for\ndecomposing multi-part matras. \n\nOne is the canonical Unicode decompositions for the matra codepoints,\nas is used in most other Indic scripts. This decomposition is usually\nperformed early in the shaping process.\n\nThe second is the `pstf` feature of <abbr title=\"Glyph Substitution table\">GSUB</abbr>, which is defined differently\nfor Sinhala. In Sinhala, the `pstf` feature replaces multi-part\ndependent vowels (matras) with the right-side matra component of the\ncanonical decomposition. 
This substitution generally occurs late in\nthe shaping process.\n\nUniscribe supports the `pstf` behavior by handling the decomposition\nof multi-part dependent vowels differently for Sinhala -- in a sense,\ndecomposing each matra into its left-side component followed by a\nduplicate of the original matra, then substituting the duplicated\nmatra with the right-side matra component when the `pstf` feature is\napplied.\n\nShaping engines may opt to decompose multi-part dependent\nvowels into their canonical Unicode decompositions, as is done in\nother scripts, and substitute the decomposed right-side matra\ncomponents at that point.\n \nDoing so will negate the need to apply the `pstf` substitution.\nHowever, fonts that were engineered to support the\nUniscribe-supported behavior might not include <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning\nrules for the right-side matra components, relying instead on the\n`pstf` substitution to provide a suitable replacement.\n\n\n\n## Miscellaneous ##\n\n\n### Bengali `init` feature matching ###\n\nThe `init` feature in Bengali is defined in the OpenType specification\nas applying to word-initial left-side dependent vowels (matras).\nHowever, Uniscribe specifically applies the feature whenever\nthe matra is preceded by any character that falls within the following\nrange in the Unicode `General Category` property:\n\n- `GENERAL_CATEGORY_FORMAT` [Cf]\n- `GENERAL_CATEGORY_UNASSIGNED` [Cn]\n- `GENERAL_CATEGORY_PRIVATE_USE` [Co]\n- `GENERAL_CATEGORY_SURROGATE` [Cs]\n- `GENERAL_CATEGORY_LOWERCASE_LETTER` [Ll]\n- `GENERAL_CATEGORY_MODIFIER_LETTER` [Lm]\n- `GENERAL_CATEGORY_OTHER_LETTER` [Lo]\n- `GENERAL_CATEGORY_TITLECASE_LETTER` [Lt]\n- `GENERAL_CATEGORY_UPPERCASE_LETTER` [Lu]\n- `GENERAL_CATEGORY_SPACING_MARK` [Mc]\n- `GENERAL_CATEGORY_ENCLOSING_MARK` [Me]\n- `GENERAL_CATEGORY_NON_SPACING_MARK` [Mn]\n\n\n### Old-model post-base Halant reordering ###\n\nIn old-model (Indic1) script tags, Uniscribe treats 
some\nscripts differently when reordering the first post-base <samp>\"Halant\"</samp>. This\nHalant-reordering is done in Indic1 scripts in order to prepare the\nsyllable for Indic1's different post-base <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution rules.\n\nFor example, the old-model Indic syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn Malayalam, Uniscribe always reorders the first post-base <samp>\"Halant\"</samp> in\na syllable to the position immediately after the syllable's last consonant.\n\n#### Kannada final double Halants ####\n\nIn old-model Kannada (`<knda>`) runs, Uniscribe is known to reorder\nthe first post-base <samp>\"Halant\"</samp> only when there is not already a <samp>\"Halant\"</samp>\nafter the last consonant.\n\nFor example, the old-model Indic syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC Halant\n\nwould _not_ be reordered. \n\nThis behavior is an exception to the general Indic1 post-base <samp>\"Halant\"</samp>\nreordering operation. It is believed to be script-specific and has\nonly been observed for Kannada text runs. 
However, there may still be\nundiscovered sequences in other Indic1-script text which trigger the\nsame behavior; implementers targeting full compatibility should\nexercise caution.\n\nIf the standard post-base <samp>\"Halant\"</samp> reordering were performed, then the\nlikely result of the <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature-application phase would be a\nsequence of the form <samp>\"BaseC,belowbaseC,Halant\"</samp> which, in turn, might\ntrigger mark-attachment issues for correctly positioning the final\n<samp>\"Halant\"</samp>.\n\nThis Uniscribe behavior is not documented, however; therefore the only\nrecommended workaround for maintaining compatibility is to define a\nspecial-case exception for avoiding the creation of final double\n<samp>\"Halant\"</samp>s in `<knda>` text.\n\n\n### Halants and left matras ###\n\nWhen reordering left-side matras, when a <samp>\"Halant\"</samp> occurs immediately\nafter a left-side matra, Uniscribe does not move the <samp>\"Halant\"</samp> with the matra.\n\nGenerally, marks (including <samp>\"Halant\"</samp>) are tagged for reordering with\nthe same positioning tag as the closest non-mark character that the\nmark has affinity with. \n\nIn post-base position, where a yet-to-be-reordered left-side matra\nwould be found, the closest non-mark character with affinity for the\nmark might be a post-base consonant. Uniscribe appears to make a check\nensuring that the <samp>\"Halant\"</samp> after a left-side matra is not tagged for\nreordering with the matra.\n\nThis check is required for shaping Sinhala, because the `U+0DDA`\nmulti-part matra decomposes into the sequence <samp>\"`U+0DD9`,Halant\"</samp>. 
The\ndecomposed <samp>\"Halant\"</samp> should remain where it is, serving as the right-side\nmatra component.\n\n\n### Explicit half-forms followed by matras ###\n\nAs a general rule, Uniscribe and other shapers insert a dotted-circle\ncharacter before a non-spacing mark character (such as a matra in\nIndic2-model scripts) when that non-spacing mark character is not\nmatched with a base character in a permitted syllable. In such\ncircumstances, the dotted-circle visually serves to communicate to\nreaders that a base character has not been found, and also\nfunctionally serves as a surrogate base on which the mark character\ncan be positioned.\n\nHowever, Uniscribe is known not to insert a dotted-circle before a\nmatra character when it is preceded by two sequential\nexplicit-half-form sequences (meaning two consecutive occurrences of\n<samp>\"_Consonant_,Halant,ZWJ\"</samp>) in Indic2 runs.\n\nTherefore, the sequence:\n\n    `_Consonant_,Halant,ZWJ,_matra_`\n\nwould be transformed to:\n\n    `_Consonant_,Halant,ZWJ,Dotted-Circle,_matra_`\n\nbut the sequence:\n\n    `_Consonant_,Halant,ZWJ,_Consonant_,Halant,ZWJ,_matra_`\n\nwould _not_ be transformed with a dotted-circle insertion.\n\nThis exception is regarded as a likely bug.\n"
  },
  {
    "path": "opentype-shaping-arabic-general.md",
    "content": "# Arabic-style shaping in OpenType #\n\nThis document details the general shaping procedure shared by Arabic, N'Ko,\nSyriac, and Mongolian. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Joining properties](#joining-properties)\n\t  - [Mark classification](#mark-classification)\n\t  - [Character tables](#character-tables)\n  - [The general Arabic-based shaping model](#the-general-arabic-based-shaping-model)\n      - [Stage 1: Transient reordering of modifier combining marks](#stage-1-transient-reordering-of-modifier-combining-marks)\n      - [Stage 2: Compound character composition and decomposition](#stage-2-compound-character-composition-and-decomposition)\n      - [Stage 3: Computing letter joining states](#stage-3-computing-letter-joining-states)\n      - [Stage 4: Applying the `stch` feature](#stage-4-applying-the-stch-feature)\n      - [Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr>](#stage-5-applying-the-language-form-substitution-features-from-gsub)\n      - [Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr>](#stage-6-applying-the-typographic-form-substitution-features-from-gsub)\n      - [Stage 7: Applying the positioning features from <abbr>GPOS</abbr>](#stage-7-applying-the-positioning-features-from-gpos)\n  \n\n\n## General information ##\n\nSeveral scripts can be supported by the general OpenType shaping model\nused for Arabic. These writing systems observe similar rules and\nconventions, even if they are not historically related to\nArabic. Therefore, OpenType defines many of the same <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\nfeatures as supported for the corresponding script tags. 
These scripts include:\n\n  - [Arabic](opentype-shaping-arabic.md)\n  - [N'Ko](opentype-shaping-nko.md)\n  - [Syriac](opentype-shaping-syriac.md)\n  - [Mongolian](opentype-shaping-mongolian.md)\n\nThe information found below is intended to serve as a general guide;\nscript-specific information can be found in the linked document for\neach script.\n\nEach of these writing systems uses a joining script that uses\ninter-word spaces. Therefore, each codepoint in a text run may be\nsubstituted with one of several contextual forms corresponding to\nwhat, if any, characters appear before and after the codepoint. Most,\nbut not all, letter sequences join; shaping engines must track which\npositions trigger joining behavior for each letter. \n\nArabic, N'Ko, and Syriac are written (and, therefore, rendered) from right to\nleft. Mongolian is written vertically, from top to bottom. Shaping\nengines must track the directionality of the text run when scripts of\ndifferent direction are mixed.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for elements of the\nsupported scripts. The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Base** glyph or character is the standard term for a\ncharacter that is capable of taking a diacritical mark. \n\n**Kashida** (or **tatweel**) is the term for a glyph inserted into a\nsequence for the purpose of elongating the baseline stroke of a\nletter. Unicode documents use the term \"tatweel\" most frequently,\nwhile OpenType documents use the term \"kashida\" most\nfrequently. Kashidas are typically inserted in order to justify lines\nof text. 
\n\n\n\n## Glyph classification ##\n\nIn joining (or cursive) scripts, proper shaping of\ntext runs involves identifying the joining behavior of each character,\nthen combining that information with any preceding or subsequent\ncharacters to determine the contextually correct form for display.\n\n### Joining properties ###\n\nCharacters are assigned a `JOINING_TYPE` property in the\nUnicode standard that indicates how they join to adjacent\ncharacters. There are six possible values: \n\n  - `JOINING_TYPE_LEFT` indicates that a character joins with\n    the subsequent character, but does not join with the preceding\n    character. \n\t\n  - `JOINING_TYPE_RIGHT` indicates that a character joins with the\n    preceding character, but does not join with the subsequent character.\t\n\n  - `JOINING_TYPE_DUAL` indicates that a character joins with the\n    preceding character and joins with the subsequent character.\n\t\n  - `JOINING_TYPE_NON_JOINING` indicates that a character does not\n    join with the preceding or with the subsequent character.\n\t\n  - `JOINING_TYPE_TRANSPARENT` indicates that the character does not\n    join with adjacent characters _and_ that the character must be\n    skipped over when the shaping engine is evaluating the joining\n    positions in a sequence of characters. When a\n    `JOINING_TYPE_TRANSPARENT` character is encountered in a sequence,\n    the `JOINING_TYPE` of the preceding character passes\n    through. Diacritical marks are frequently assigned this value. \n\t\n  - `JOINING_TYPE_JOIN_CAUSING` indicates that the character forces\n    the use of joining forms with the preceding and subsequent\n    characters. Kashidas and the Zero Width Joiner (`U+200D`) are both\n    `JOIN_CAUSING` characters.\n  \n\nIn some scripts (such as Arabic and Syriac), letters are also assigned\nto a `JOINING_GROUP` that indicates which fundamental character they\nbehave like with regard to joining behavior. 
Each of the basic letters\nin the script typically belongs to its own `JOINING_GROUP`, while\nsupplemental and accented letters are usually assigned to the\n`JOINING_GROUP` that corresponds to the underlying base letter, with\nno diacritics or other marks. \n\nFor example, the Persian letter \"Peh\" (`U+067E`) is visually\nrepresented as the Arabic letter \"Beh\" (`U+0628`), but with two additional\nbelow-base \"ijam\" marks. Consequently, \"Peh\" is assigned to the `BEH`\n`JOINING_GROUP`.\n\nMongolian and N'Ko, notably, do not make use of joining groups. Every\nletter in these scripts belongs to the _null_ or `NO_JOINING_GROUP`\ngroup.\n\n\n### Mark classification ###\n\nThe Unicode standard defines a _canonical combining class_ for each\ncodepoint that is used whenever a sequence needs to be sorted into\ncanonical order. \n\nThe marks in most scripts belong to the standard combining\nclasses. For example:\n\n:::{table} Example mark-classification table\n\n| Codepoint | Combining class | Glyph                              |\n|:----------|:----------------|:-----------------------------------|\n|`U+064B`   | 27              | &#x064B; Fathatan / Open fathatan  |\n|`U+064C`   | 28              | &#x064C; Dammatan / Open dammatan  |\n|`U+064D`   | 29              | &#x064D; Kasratan / Open Kasratan  |\n|`U+064E`   | 30              | &#x064E; Fatha / Small fatha       |\n|`U+064F`   | 31              | &#x064F; Damma / Small damma       |\n|`U+0650`   | 32              | &#x0650; Kasra / Small kasra       |\n|`U+0651`   | 33              | &#x0651; Shadda                    |\n|`U+0652`   | 34              | &#x0652; Sukun                     |\n|`U+0670`   | 35              | &#x0670; Superscript Alef          |\n|           | 220             | Other below-base combining marks   |\n|           | 230             | Other above-base combining marks   |\n:::\n\n\nThe numeric values of these combining classes are used during Unicode\nnormalization. 
Sequences of marks are sorted by combining class,\nreordering the sequence into increasing numerical order.\n\nIn addition, some Arabic and Syriac marks require special handling\nwhen shaping Arabic text, during the mark-reordering stage. These\nmarks fall into two classes of _Modifier Combining Marks_ (<abbr>MCM</abbr>) that\nmay need to be repositioned closer to the base character, when they\noccur in sequences of multiple marks. \n\nThe sets are:\n  - Below-base (class 220) <abbr title=\"Modifier Combining Mark\">MCM</abbr>s\n  - Above-base (class 230) <abbr title=\"Modifier Combining Mark\">MCM</abbr>s\n  \nThese classifications are used in the [mark-transient-reordering\nstage](#stage-1-transient-reordering-of-modifier-combining-marks).\n\nLists of the marks that belong to each <abbr title=\"Modifier Combining Mark\">MCM</abbr> class are included in the\nscript-specific shaping documents for Arabic and Syriac.\n\t\t\t\n\t\t\t\n### Character tables ###\n\nCharacter tables for all of the scripts, plus important miscellaneous\ncharacters, are available here: \n\n  - [Arabic](character-tables/character-tables-arabic.md#arabic-character-table)\n  - [Syriac](character-tables/character-tables-syriac.md#syriac-character-table)\n  - [N'Ko](character-tables/character-tables-nko.md#nko-character-table)\n  - [Mongolian](character-tables/character-tables-mongolian.md#mongolian-character-table)\n\n\n## The general Arabic-based shaping model ##\n\nProcessing a run of text tagged with any of the scripts supported by\nthe general Arabic shaping model involves seven top-level stages:\n\n1. Transient reordering of modifier combining marks\n2. Compound character composition and decomposition\n3. Computing letter joining states\n4. Applying the `stch` feature\n5. Applying the language-form substitution features from <abbr>GSUB</abbr>\n6. Applying the typographic-form substitution features from <abbr>GSUB</abbr>\n7. 
Applying the positioning features from <abbr>GPOS</abbr>\n\n\n### Stage 1: Transient reordering of modifier combining marks ###\n\n<!--- http://www.unicode.org/reports/tr53/tr53-1.pdf --->\n> Note: the transient reordering of modifier combining marks is\n> necessary only for scripts that can feature the <samp>\"Shadda\"</samp> mark or\n> marks that belong to _Modifier Combining Marks_ (<abbr>MCM</abbr>) classes.\n\nSequences of adjacent marks must be reordered so that they appear in\nthe appropriate visual order before the mark-to-base and mark-to-mark\npositioning features from <abbr title=\"Glyph Positioning table\">GPOS</abbr> can be correctly applied.\n\nIn particular, those marks that have strong affinity to the base\ncharacter must be placed closest to the base.\n\nThis mark-reordering operation is distinct from the standard,\ncross-script mark-reordering performed during Unicode\nnormalization. The standard Unicode mark-reordering algorithm is based\non comparing the _Canonical_Combining_Class_ (<abbr>Ccc</abbr>) properties of mark\ncodepoints, whereas this script-specific reordering utilizes the\n_Modifier_Combining_Mark_ (<abbr>MCM</abbr>) subclasses specified in the\ncharacter tables.\n\nThe algorithm for reordering a sequence of marks is:\n\n  - First, move any <samp>\"Shadda\"</samp> (combining class `33`) characters to the\n    beginning of the mark sequence.\n\t\n  -\tSecond, move any subsequence of combining-class-`230` characters that begins\n       with a `230_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters. The subsequence must be moved\n       as a group.\n\n  - Finally, move any subsequence of combining-class-`220` characters that begins\n       with a `220_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters and before all class-`230`\n       characters. 
The subsequence must be moved as a group.\n\n> Note: Unicode describes this mark-reordering operation, the Arabic\n> Mark Transient Reordering Algorithm (<abbr>AMTRA</abbr>), in Technical Report 53,\n> which describes it in terms that are distinct from standard,\n> <abbr>Ccc</abbr>-based mark reordering.\n>\n> Specifically, <abbr title=\"Arabic Mark Transient Reordering Algorithm\">AMTRA</abbr> is designated as an operation performed during\n> text rendering only, which therefore does not impact other\n> Unicode-compliance issues such as allowable input sequences or text\n> encoding.\n>\n> However, shaping engines may choose to perform the reordering of\n> modifier combining marks in conjunction with their Unicode\n> normalization functionality for increased efficiency.\n\n### Stage 2: Compound character composition and decomposition ###\n\nThe `ccmp` feature allows a font to substitute\n\n  - mark-and-base sequences with a pre-composed glyph including both\n    the mark and the base (as is done with a ligature substitution)\n\t\n  - individual compound glyphs with the equivalent sequence of\n    decomposed glyphs (such as decomposing a letter with ijam into a\n    separate fundamental-letter glyph followed by an ijam-only glyph,\n    to permit more precise positioning)\n \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. 
\n\n\n### Stage 3: Computing letter joining states ###\n\nIn order to correctly apply the initial, medial, and final form\nsubstitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr> during stage 5, the shaping engine must\ntag every letter for possible application of the appropriate feature.\n\n> Note: not all of the rules detailed below apply to every script that\n> is supported by the general Arabic shaping model.\n\nTo determine which feature is appropriate, the shaping engine must\nexamine each word in turn and compute each letter's joining state from\nthe letter's `JOINING_TYPE` and the `JOINING_TYPE` of the\npreceding character (if any).\n\n> Note: Although the supported scripts use inter-word spaces, the\n> `init` feature does _not_ refer to word-initial letters only and the\n> `fina` feature does _not_ refer to word-final letters only.\n>\n> Rather, both of these terms are defined with respect to whether or\n> not the preceding and subsequent letters form joins with the current\n> letter. The letters at word boundaries will, naturally, take on\n> initial and final forms, but initial and final forms of letters also\n> occur regularly within words, when the letter in question is\n> adjacent to a letter that does not form joins.\n\nThis computation starts from the first letter of the word, temporarily\ntagging the letter for `isol` substitution. If the first\nletter is the only letter in the word, the `isol` tag will remain unchanged.\n\nFrom here, the algorithm consumes each character in the string, one at\na time, keeping track of the `JOINING_TYPE` of the previous character. 
\n\nIf the current character is JOINING_TYPE_TRANSPARENT, move on to the next\ncharacter but preserve the currently-tracked JOINING_TYPE at its previous state.\n\nIf the preceding character's JOINING_TYPE is LEFT, DUAL, or\nJOIN_CAUSING:\n  - In `<syrc>` text, if the current character is <samp>\"Alaph\"</samp>, tag the\n    current character for `med2`, then update the tag for the\n    preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n  - If the current character's JOINING_TYPE is RIGHT, DUAL, or\n    JOIN_CAUSING, tag the current character for `fina`, then update\n    the tag for the preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n\nOtherwise, tag the current character for `isol`.\n\nAfter testing the final character of the word, if the text is in `<syrc>` and\nif the last character that is not JOINING_TYPE_TRANSPARENT or\nJOINING_TYPE_NON_JOINING is <samp>\"Alaph\"</samp>, perform an additional test:\n  - If the preceding character is JOINING_TYPE_LEFT, tag the current character\n    for `fina`\n  - If the preceding character's JOINING_GROUP is DALATH_RISH, tag the current\n    character for `fin3`\n  - Otherwise, tag the current character for `fin2`\n\n\nOnce the last character of the word has been processed, proceed to the\nnext word and repeat the algorithm, starting at the beginning of the\nnext word.\n\n> Note: Because the processing of the characters in the algorithm\n> described above is deterministic, shaping engines may choose to\n> implement the joining-state computation as a state machine, in a lookup\n> table, or by any other means desirable.\n\nAt the end of this process, all letters should be tagged for possible\nsubstitution by one of the `isol`, `init`, `medi`, `med2`, `fina`, `fin2`, or\n`fin3` features.\n\n### Stage 4: Applying the `stch` feature ###\n\nThe `stch` 
feature decomposes and stretches special marks that are\nmeant to extend to the full width of words to which they are\nattached. It was defined for use in `<syrc>` text runs for the <samp>\"Syriac\nAbbreviation Mark\"</samp> (`U+070F`) but it can be used with similar marks in\nother scripts.\n\nTo apply the `stch` feature, the shaping engine should first decompose the\n`U+070F` glyph into components, which results in a beginning point glyph,\nmidpoint glyph, and endpoint glyph plus one (or more) extension glyphs: at\nleast one extension between the beginning and midpoint glyphs and at\nleast one extension between the midpoint and endpoint glyphs. \n\nThe shaping engine must then calculate the total length of the word to\nwhich the mark applies. That length, minus the advance widths of the\nbeginning, middle, and endpoint glyphs of the mark, must be divided by\ntwo. \n\nThe result, divided by the advance width of the extension glyph\nand rounded up to the next integer, tells the shaping engine how many\ncopies of the extension glyph must be placed between the midpoint and\neach end of the mark.\n\nFollowing this procedure ensures that the same number of extensions is\nused on each side of the mark so that it remains symmetrical.\n\nFinally, the decomposed mark must be reordered as follows: \n\n  - All of the glyphs in the sequence for the mark, _except_ for\n    the final glyph, are repositioned as a group so that they precede\n    the word to which the mark is attached.\n  - The final glyph in the mark sequence is repositioned to the end of\n    the word.\n\t\n\n### Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr> ###\n\nThe language-substitution phase applies mandatory substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for\nthis stage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented with the Arabic shaping model:\n\n\tlocl\n\tisol\n\tfina\n\tfin2 (only used in Syriac)\n\tfin3 (only used in Syriac)\n\tmedi\n\tmed2 (only used in Syriac)\n\tinit\n\trlig\n\trclt\n\tcalt\n\n> Note: `rlig` and `calt` need to be applied to the word as a whole before\n> continuing to the next feature.\n\t\nSee the individual script pages for further detail on each feature and\nfor script-specific information.\n\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n\n### Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr> ###\n\nThe typographic-substitution phase applies optional substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Arabic shaping model:\n\n\tliga\n\tdlig\n\tcswh\n\tmset\n\t\nSee the individual script pages for further detail on each feature and\nfor script-specific information.\n\n\n### Stage 7: Applying the positioning features from <abbr>GPOS</abbr> ###\n\nThe positioning stage adjusts the positions of mark and base\nglyphs.\n\nThe order in which these features are applied is fixed for\nall scripts implemented in the Arabic shaping model:\n\n\tcurs\n\tkern\n\tmark\n\tmkmk\n\n\nSee the individual script pages for further detail 
on each feature and\nfor script-specific information.\n\n"
  },
  {
    "path": "opentype-shaping-arabic.md",
    "content": "# Arabic script shaping in OpenType #\n\nThis document details the general shaping procedure shared by all\nArabic script styles, and defines the common pieces that style-specific\nimplementations share. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Joining properties](#joining-properties)\n\t  - [Mark classification](#mark-classification)\n\t  - [Character tables](#character-tables)\n  - [The `<arab>` shaping model](#the-arab-shaping-model)\n      - [Stage 1: Transient reordering of modifier combining marks](#stage-1-transient-reordering-of-modifier-combining-marks)\n      - [Stage 2: Compound character composition and decomposition](#stage-2-compound-character-composition-and-decomposition)\n      - [Stage 3: Computing letter joining states](#stage-3-computing-letter-joining-states)\n      - [Stage 4: Applying the `stch` feature](#stage-4-applying-the-stch-feature)\n      - [Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr>](#stage-5-applying-the-language-form-substitution-features-from-gsub)\n      - [Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr>](#stage-6-applying-the-typographic-form-substitution-features-from-gsub)\n      - [Stage 7: Applying the positioning features from <abbr>GPOS</abbr>](#stage-7-applying-the-positioning-features-from-gpos)\n  \n\n\n## General information ##\n\nThe Arabic script is used to write multiple languages, most commonly\nArabic, Persian, Urdu, Pashto, Kurdish, and Azerbaijani. \n\nThe Arabic script encompasses multiple distinct styles, including Naskh, \nNataliq, and Kufi, that share a number of common features and rules,\nbut that differ considerably in their final appearance. 
Due to the\ncommon features found between the styles, a shaping engine can support\nall styles of Arabic with a single shaping model.\n\nIn addition, several other writing systems that observe similar rules\nand conventions can be supported using the same shaping model, even if\nthey are not historically related to Arabic. These scripts include:\n\n  - [N'Ko](opentype-shaping-nko.md)\n  - [Syriac](opentype-shaping-syriac.md)\n  - [Mongolian](opentype-shaping-mongolian.md)\n\nNote that each of these scripts has its own independent\nscript tag defined in OpenType. N'Ko uses `<nko >`, Syriac uses `<syrc>`, and\nMongolian uses `<mong>`. The information found below about the `<arab>`\nscript shaping model can serve as a general guide; script-specific\ninformation can be found in the linked document for each script. \n\nArabic is a joining script that uses inter-word spaces, so each\ncodepoint in a text run may be substituted with one of several\ncontextual forms corresponding to what, if any, characters appear\nbefore and after the codepoint. Most, but not all, letter sequences\njoin; shaping engines must track which positions trigger joining\nbehavior for each letter. \n\nArabic is written (and, therefore, rendered) from right to\nleft. Shaping engines must track the directionality of the text run\nwhen scripts of different direction are mixed.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for elements of the\nArabic script. The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Base** glyph or character is the standard term for an Arabic\ncharacter that is capable of taking a diacritical mark. \n\nMost of the base characters in Arabic are consonants, but each\nlanguage written with the Arabic script may have one or more vowel\nbase letters.\n\nVowels that are not base characters are frequently omitted from the\ntext run entirely. 
Alternatively, such a vowel may appear as a\ndiacritical mark called a **ḥarakah**.\n\n**Ijam** is the standard term for an above- or below-base dot that\ndistinguishes one consonant from another. Ijam are not considered\ndiacritics; they are integral to the consonant of which they are a part.\n\n**Shadda** and **tashdid** are both standard terms for the \"consonant\ndoubling\" diacritical mark.\n\n**Hamza** is the standard term for the glottal stop\nsemi-consonant. The hamza is not regarded as a full letter in most\nlanguages, although it can appear as a standalone letter within\nwords. In some sequences, the hamza attaches to an adjacent letter;\nwhen a hamza-supporting letter is not adjacent, however, the hamza can\nappear on its own.\n\n**Kashida** (or **tatweel**) is the term for a glyph inserted into a\nsequence for the purpose of elongating the baseline stroke of a\nletter. Unicode documents use the term \"tatweel\" most frequently,\nwhile OpenType documents use the term \"kashida\" most\nfrequently. Kashidas are typically inserted in order to justify lines\nof text. \n\n\n## Glyph classification ##\n\nBecause Arabic is a joining (or cursive) script, proper shaping of\ntext runs involves identifying the joining behavior of each character,\nthen combining that information with any preceding or subsequent\ncharacters to determine the contextually correct form for display.\n\n### Joining properties ###\n\nArabic characters are assigned a `JOINING_TYPE` property in the\nUnicode standard that indicates how they join to adjacent\ncharacters. There are six possible values: \n\n  - `JOINING_TYPE_LEFT` indicates that a character joins with\n    the subsequent character, but does not join with the preceding\n    character. 
\n\t\n  - `JOINING_TYPE_RIGHT` indicates that a character joins with the\n    preceding character, but does not join with the subsequent character.\t\n\n  - `JOINING_TYPE_DUAL` indicates that a character joins with the\n    preceding character and joins with the subsequent character.\n\t\n  - `JOINING_TYPE_NON_JOINING` indicates that a character does not\n    join with the preceding or with the subsequent character.\n\t\n  - `JOINING_TYPE_TRANSPARENT` indicates that the character does not\n    join with adjacent characters _and_ that the character must be\n    skipped over when the shaping engine is evaluating the joining\n    positions in a sequence of characters. When a\n    `JOINING_TYPE_TRANSPARENT` character is encountered in a sequence,\n    the `JOINING_TYPE` of the preceding character passes\n    through. Diacritical marks are frequently assigned this value. \n\t\n  - `JOINING_TYPE_JOIN_CAUSING` indicates that the character forces\n    the use of joining forms with the preceding and subsequent\n    characters. Kashidas and the Zero Width Joiner (`U+200D`) are both\n    `JOIN_CAUSING` characters.\n  \n\nArabic letters are also assigned to a `JOINING_GROUP` that indicates\nwhich fundamental character they behave like with regard to joining\nbehavior. Each of the basic letters in the Arabic block tends to\nbelong to its own `JOINING_GROUP`, while letters from the supplemental and\nextended blocks are usually assigned to the `JOINING_GROUP` that\ncorresponds to the character's base letter, with no diacritics or ijam.\n\nFor example, the Persian letter \"Peh\" (`U+067E`) is visually\nrepresented as the Arabic letter \"Beh\" (`U+0628`), but with two additional\nbelow-base ijam. Consequently, \"Peh\" is assigned to the `BEH` `JOINING_GROUP`.\n\n### Mark classification ###\n\nThe Unicode standard defines a _canonical combining class_ for each\ncodepoint that is used whenever a sequence needs to be sorted into\ncanonical order. 
\n\nSeveral of the Arabic marks belong to standard combining\nclasses:\n\n:::{table} Mark-classification table\n\n| Codepoint | Combining class | Glyph                              |\n|:----------|:----------------|:-----------------------------------|\n|`U+064B`   | 27              | &#x064B; Fathatan / Open fathatan  |\n|`U+064C`   | 28              | &#x064C; Dammatan / Open dammatan  |\n|`U+064D`   | 29              | &#x064D; Kasratan / Open Kasratan  |\n|`U+064E`   | 30              | &#x064E; Fatha / Small fatha       |\n|`U+064F`   | 31              | &#x064F; Damma / Small damma       |\n|`U+0650`   | 32              | &#x0650; Kasra / Small kasra       |\n|`U+0651`   | 33              | &#x0651; Shadda                    |\n|`U+0652`   | 34              | &#x0652; Sukun                     |\n|`U+0670`   | 35              | &#x0670; Superscript Alef          |\n|           | 220             | Other below-base combining marks   |\n|           | 230             | Other above-base combining marks   |\n:::\n\n\nThe numeric values of these combining classes are used during Unicode\nnormalization.\n\nA subset of the Arabic marks require special handling when shaping\nArabic text, during the mark-reordering stage. These include two sets\nof _Modifier Combining Marks_ (<abbr>MCM</abbr>) that may need to be repositioned\ncloser to the base character, when they occur in sequences of multiple\nmarks. 
\n\nThe sets are:\n  - Below-base (class 220) <abbr title=\"Modifier Combining Mark\">MCM</abbr>s: \"Hamza below\" (`U+0655`), \"Small low seen\"\n    (`U+06E3`), \"Large round dot below\" (`U+08CF`), \"Small low waw\" (`U+08D3`)\n  - Above-base (class 230) <abbr title=\"Modifier Combining Mark\">MCM</abbr>s: \"Hamza above\" (`U+0654`), \"Mark noon ghunna\"\n    (`U+0658`), \"Small high seen\" (`U+06DC`), \"Small high yeh\" (`U+06E7`), \"Small high\n    noon\" (`U+06E8`), \"Small high Farsi yeh\" (`U+08CA`), \"Small high\n    yeh barree with two dots below\" (`U+08CB`), \"Small high zah\"\n    (`U+08CD`), \"Large round dot above\" (`U+08CE`), \"Small high waw\" (`U+08F3`)\n\nThese classifications are used in the [mark-transient-reordering\nstage](#stage-1-transient-reordering-of-modifier-combining-marks).\n\n\t\n\t\t\t\n### Character tables ###\n\nSeparate character tables are provided for the Arabic, Arabic\nSupplement, Arabic Extended-A, Arabic Extended-B, Arabic Extended-C,\nand Rumi Numeral Symbols blocks, as well as for other miscellaneous\ncharacters that are used in `<arab>` text runs:\n\n  - [Arabic character table](character-tables/character-tables-arabic.md#arabic-character-table)\n  - [Arabic Supplement character table](character-tables/character-tables-arabic.md#arabic-supplement-character-table)\n  - [Arabic Extended-A character table](character-tables/character-tables-arabic.md#arabic-extended-a-character-table)\n  - [Arabic Extended-B character table](character-tables/character-tables-arabic.md#arabic-extended-b-character-table)\n  - [Arabic Extended-C character table](character-tables/character-tables-arabic.md#arabic-extended-c-character-table)\n  - [Rumi Numeral Symbols character table](character-tables/character-tables-arabic.md#rumi-numeral-symbols-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-arabic.md#miscellaneous-character-table)\n\n<!--- Commenting out Arabic Mathematical Alphabetical Symbols block \n      since it 
does not involve text shaping AFAICT. --->\n<!---   - [Arabic Mathematical Alphabetic Symbols character table](character-tables/character-tables-arabic.md#arabic-mathematical-alphabetic-symbols-character-table) --->\n\nUnicode also defines two blocks that implement backward compatibility\nwith retired file-encoding formats:\n\n  - Arabic Presentation Forms-A\n  - Arabic Presentation Forms-B\n  \nUnless a software application is required to support specific stores of\ndocuments that are known to have used these older encodings, however, the\nshaping engine should not be expected to handle any text runs\nincorporating codepoints from these blocks.\n\nThe tables list each codepoint along with its Unicode general\ncategory and its joining type. For letters, the table lists the\ncodepoint's joining group. For diacritical marks, the table lists the\ncodepoint's mark combining class. The codepoint's Unicode name and an example\nglyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph                        |\n|:----------|:-----------------|:-------------|:--------------|:-----------|:-----------------------------|\n|`U+0628`   | Letter           | DUAL         | BEH           | _null_     | &#x0628; Beh                 |\n| | | | | |\n|`U+0655`   | Mark [Mn]        | TRANSPARENT  | _null_        | 220_MCM   | &#x0655; Hamza Below         |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. 
\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Arabic text include the dotted-circle placeholder (`U+25CC`), the\ncombining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)\nand zero-width non-joiner (`U+200C`), the left-to-right text marker\n(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break\nspace (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ncombining mark in isolation. Real-world text documents may also use\nother characters, such as hyphens or dashes, in a similar placeholder\nfashion; shaping engines should cope with this situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (which is not common in Arabic but can occur in\n    other scripts)\n\n\n\nThe combining grapheme joiner (<abbr>CGJ</abbr>) is primarily used to alter the\norder in which adjacent marks are positioned during the\nmark-reordering stage, in order to adhere to the needs of a\nnon-default language orthography.\n\nBy default, OpenType shaping reorders sequences of adjacent marks by\nsorting the sequence on the marks' Canonical_Combining_Class (<abbr>Ccc</abbr>)\nvalues. The presence of a <abbr title=\"Combining Grapheme Joiner\">CGJ</abbr> character within a sequence of marks has\nthe effect of splitting the sequence into two sequences of marks and,\ntherefore, halting any mark-reordering that would have occurred\nbetween the marks on either side of the <abbr title=\"Combining Grapheme Joiner\">CGJ</abbr>.\n\nThe zero-width joiner (<abbr title=\"Zero-Width Joiner\">ZWJ</abbr>) is primarily used to force the usage of the\ncursive connecting form of a letter even when the context of the\nadjoining letters would not trigger the connecting form. \n\nFor example, to show the initial form of a letter in isolation (such\nas for displaying it in a table of forms), the sequence <samp>\"_Letter_,ZWJ\"</samp>\nwould be used. To show the medial form of a letter in isolation, the\nsequence <samp>\"ZWJ,_Letter_,ZWJ\"</samp> would be used.\n\nThe zero-width non-joiner (<abbr>ZWNJ</abbr>) is primarily used to prevent a\ncursive connection between two adjacent characters that would, under\nnormal circumstances, form a join. 
\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\n\nThe right-to-left mark (<abbr>RLM</abbr>) and left-to-right mark (<abbr>LRM</abbr>) are used by\nthe Unicode bidirectionality algorithm (BiDi) to indicate the points\nin a text run at which the writing direction changes. 
Generally\nspeaking, <abbr title=\"Right-to-Left Mark\">RLM</abbr> and <abbr title=\"Left-to-Right Mark\">LRM</abbr> codepoints do not interact with shaping.\n\nThe no-break space is primarily used to display those codepoints that\nare defined as non-spacing (such as vowel or diacritical marks and <samp>\"Hamza\"</samp>) in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder.\n\n\n## The `<arab>` shaping model ##\n\nProcessing a run of `<arab>` text involves seven top-level stages:\n\n1. Transient reordering of modifier combining marks\n2. Compound character composition and decomposition\n3. Computing letter joining states\n4. Applying the `stch` feature\n5. Applying the language-form substitution features from <abbr>GSUB</abbr>\n6. Applying the typographic-form substitution features from <abbr>GSUB</abbr>\n7. Applying the positioning features from <abbr>GPOS</abbr>\n\n\n### Stage 1: Transient reordering of modifier combining marks ###\n\n<!--- http://www.unicode.org/reports/tr53/tr53-1.pdf --->\n\nSequences of adjacent marks must be reordered so that they appear in\nthe appropriate visual order before the mark-to-base and mark-to-mark\npositioning features from <abbr title=\"Glyph Positioning table\">GPOS</abbr> can be correctly applied.\n\nIn particular, those marks that have strong affinity to the base\ncharacter must be placed closest to the base.\n\nThis mark-reordering operation is distinct from the standard,\ncross-script mark-reordering performed during Unicode\nnormalization. 
The standard Unicode mark-reordering algorithm is based\non comparing the _Canonical_Combining_Class_ (<abbr>Ccc</abbr>) properties of mark\ncodepoints, whereas this script-specific reordering utilizes the\n_Modifier_Combining_Mark_ (<abbr>MCM</abbr>) subclasses specified in the\ncharacter tables.\n\nThe algorithm for reordering a sequence of marks is:\n\n  - First, move any <samp>\"Shadda\"</samp> (combining class `33`) characters to the\n    beginning of the mark sequence.\n\t\n  -\tSecond, move any subsequence of combining-class-`230` characters that begins\n       with a `230_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters. The subsequence must be moved\n       as a group.\n\n  - Finally, move any subsequence of combining-class-`220` characters that begins\n       with a `220_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters and before all class-`230`\n       characters. The subsequence must be moved as a group.\n\n> Note: Unicode describes this mark-reordering operation, the Arabic\n> Mark Transient Reordering Algorithm (<abbr>AMTRA</abbr>), in Unicode\n> Technical Report #53, in terms that are distinct\n> from standard, <abbr>Ccc</abbr>-based mark reordering.\n>\n> Specifically, <abbr title=\"Arabic Mark Transient Reordering Algorithm\">AMTRA</abbr> is designated as an operation performed during\n> text rendering only, which therefore does not impact other\n> Unicode-compliance issues such as allowable input sequences or text\n> encoding.\n>\n> However, shaping engines may choose to perform the reordering of\n> modifier combining marks in conjunction with their Unicode\n> normalization functionality for increased efficiency.\n\n### Stage 2: Compound character composition and decomposition ###\n\nThe `ccmp` feature allows a font to substitute\n\n  - mark-and-base sequences with a pre-composed glyph including both\n    the mark and the 
base (as is done with a ligature substitution)\n\t\n  - individual compound glyphs with the equivalent sequence of\n    decomposed glyphs (such as decomposing a letter with ijam into a\n    separate fundamental-letter glyph followed by an ijam-only glyph,\n    to permit more precise positioning)\n \n\n\nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. \n\n\n:::{figure-md}\n![Composition and decomposition](/images/arabic/arabic-ccmp.svg \"Composition and decomposition\"){.shaping-demo .inline-svg .greyscale-svg #arabic-ccmp}\n\nComposition and decomposition\n:::\n\n```{svg-color-toggle-button} arabic-ccmp\n```\n\n\n### Stage 3: Computing letter joining states ###\n\nIn order to correctly apply the initial, medial, and final form\nsubstitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr> during stage 5, the shaping engine must\ntag every letter for possible application of the appropriate feature.\n\n> Note: The following algorithm includes rules for processing `<syrc>`\n> text in addition to `<arab>` text. Implementers concerned only with\n> shaping `<arab>` text can omit the portions for `<syrc>`-specific\n> rules. \n\nTo determine which feature is appropriate, the shaping engine must\nexamine each word in turn and compute each letter's joining state from\nthe letter's `JOINING_TYPE` and the `JOINING_TYPE` of the\npreceding character (if any).\n\n> Note: Although Arabic uses inter-word spaces, the `init` feature\n> does _not_ refer to word-initial letters only and the `fina` feature\n> does _not_ refer to word-final letters only.\n>\n> Rather, both of these terms are defined with respect to whether or\n> not the preceding and subsequent letters form joins with the current\n> letter. 
The letters at word boundaries will, naturally, take on\n> initial and final forms, but initial and final forms of letters also\n> occur regularly within words, when the letter in question is\n> adjacent to a letter that does not form joins.\n\nThis computation starts from the first letter of the word, temporarily\ntagging the letter for `isol` substitution. If the first\nletter is the only letter in the word, the `isol` tag will remain unchanged.\n\nFrom here, the algorithm consumes each character in the string, one at\na time, keeping track of the JOINING_TYPE of the previous character. \n\nIf the current character is JOINING_TYPE_TRANSPARENT, move on to the next\ncharacter but preserve the currently-tracked JOINING_TYPE at its previous state.\n\nIf the preceding character's JOINING_TYPE is LEFT, DUAL, or\nJOIN_CAUSING:\n  - In `<syrc>` text, if the current character is <samp>\"Alaph\"</samp>, tag the\n    current character for `med2`, then update the tag for the\n    preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n  - If the current character's JOINING_TYPE is RIGHT, DUAL, or\n    JOIN_CAUSING, tag the current character for `fina`, then update\n    the tag for the preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n\nOtherwise, tag the current character for `isol`.\n\nAfter testing the final character of the word, if the text is in `<syrc>` and\nif the last character that is not JOINING_TYPE_TRANSPARENT or\nJOINING_TYPE_NON_JOINING is <samp>\"Alaph\"</samp>, perform an additional test:\n  - If the preceding character is JOINING_TYPE_LEFT, tag the current character\n    for `fina`\n  - If the preceding character's JOINING_GROUP is DALATH_RISH, tag the current\n    character for `fin3`\n  - Otherwise, tag the current character for `fin2`\n\n\nOnce the last character of the word has been 
processed, proceed to the\nnext word and repeat the algorithm, starting at the beginning of the\nnext word.\n\n> Note: Because the processing of the characters in the algorithm\n> described above is deterministic, shaping engines may choose to\n> implement the joining-state computation as a state machine, in a lookup\n> table, or by any other means desirable.\n\nAt the end of this process, all letters should be tagged for possible\nsubstitution by one of the `isol`, `init`, `medi`, `med2`, `fina`, `fin2`, or\n`fin3` features.\n\n### Stage 4: Applying the `stch` feature ###\n\nThe `stch` feature decomposes and stretches special marks that are\nmeant to extend to the full width of words to which they are\nattached. It was defined for use in `<syrc>` text runs for the <samp>\"Syriac\nAbbreviation Mark\"</samp> (`U+070F`) but it can be used with similar marks in\nother scripts.\n\nTo apply the `stch` feature, the shaping engine should first decompose the\n`U+070F` glyph into components, which results in beginning-point,\nmidpoint, and endpoint glyphs plus extension glyphs: at\nleast one extension between the beginning and midpoint glyphs and at\nleast one extension between the midpoint and endpoint glyphs. \n\nThe shaping engine must then calculate the total length of the word to\nwhich the mark applies. That length, minus the advance widths of the\nbeginning, middle, and endpoint glyphs of the mark, must be divided by\ntwo. 
\n\nThe result, divided by the advance width of the extension glyph\nand rounded up to the next integer, tells the shaping engine how many\ncopies of the extension glyph must be placed between the midpoint and\neach end of the mark.\n\nFollowing this procedure ensures that the same number of extensions is\nused on each side of the mark so that it remains symmetrical.\n\nFinally, the decomposed mark must be reordered as follows: \n\n  - All of the glyphs in the sequence for the mark, _except_ for\n    the final glyph, are repositioned as a group so that they precede\n    the word to which the mark is attached.\n  - The final glyph in the mark sequence is repositioned to the end of\n    the word.\n\t\n\n### Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr> ###\n\nThe language-substitution phase applies mandatory substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for\nthis stage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Arabic shaping model:\n\n\tlocl\n\tisol\n\tfina\n\tfin2 (not used in <arab>)\n\tfin3 (not used in <arab>)\n\tmedi\n\tmed2 (not used in <arab>)\n\tinit\n\trlig\n\trclt\n\tcalt\n\t\n> Note: `rlig` and `calt` need to be applied to the word as a whole before\n> continuing to the next feature.\n\n#### Stage 5, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. 
However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n:::{figure-md}\n![Localized form substitution](/images/arabic/arabic-locl.svg \"Localized form substitution\"){.shaping-demo .inline-svg .greyscale-svg #arabic-locl}\n\nLocalized form substitution\n:::\n\n```{svg-color-toggle-button} arabic-locl\n```\n\n\n#### Stage 5, step 2: isol ####\n\nThe `isol` feature substitutes the default glyph for a codepoint with\nthe isolated form of the letter.\n\n> Note: It is common for a font to use the isolated form of a letter\n> as the default, in which case the `isol` feature would apply no\n> substitutions. However, this is only a convention, and the active\n> font may use other forms as the default glyphs for any or all\n> codepoints.\n\n:::{figure-md}\n![Isolated form substitution](/images/arabic/arabic-isol.svg \"Isolated form substitution\"){.shaping-demo .inline-svg .greyscale-svg #arabic-isol}\n\nIsolated form substitution\n:::\n\n```{svg-color-toggle-button} arabic-isol\n```\n\n\n\n#### Stage 5, step 3: fina ####\n\nThe `fina` feature substitutes the default glyph for a codepoint with\nthe terminal (or final) form of the letter.\n\n:::{figure-md}\n![Final form substitution](/images/arabic/arabic-fina.svg \"Final form substitution\"){.shaping-demo .inline-svg .greyscale-svg #arabic-fina}\n\nFinal form substitution\n:::\n\n```{svg-color-toggle-button} arabic-fina\n```\n\n\n\n#### Stage 5, step 4: fin2 ####\n\nThis feature is not used in `<arab>` text.\n\n#### Stage 5, step 5: fin3 ####\n\nThis feature is not used in `<arab>` text.\n\n#### Stage 5, step 6: medi ####\n\nThe `medi` feature substitutes the default glyph for a codepoint with\nthe medial form of the letter.\n\n:::{figure-md}\n![Medial form substitution](/images/arabic/arabic-medi.svg \"Medial form substitution\"){.shaping-demo .inline-svg 
.greyscale-svg #arabic-medi}\n\nMedial form substitution\n:::\n\n```{svg-color-toggle-button} arabic-medi\n```\n\n\n\n#### Stage 5, step 7: med2 ####\n\nThis feature is not used in `<arab>` text.\n\n#### Stage 5, step 8: init ####\n\nThe `init` feature substitutes the default glyph for a codepoint with\nthe initial form of the letter.\n\n:::{figure-md}\n![Initial form substitution](/images/arabic/arabic-init.svg \"Initial form substitution\"){.shaping-demo .inline-svg .greyscale-svg #arabic-init}\n\nInitial form substitution\n:::\n\n```{svg-color-toggle-button} arabic-init\n```\n\n\n\n#### Stage 5, step 9: rlig ####\n\nThe `rlig` feature substitutes glyph sequences with mandatory\nligatures. Substitutions made by `rlig` cannot be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Required ligature substitution](/images/arabic/arabic-rlig.svg \"Required ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #arabic-rlig}\n\nRequired ligature substitution\n:::\n\n```{svg-color-toggle-button} arabic-rlig\n```\n\n\n\n#### Stage 5, step 10: rclt ####\n\nThe `rclt` feature substitutes glyphs with contextual alternate\nforms. In general, this involves replacing the default form of a\nconnecting glyph with an alternate that provides a preferable\nconnection to an adjacent glyph.\n\nThe `rclt` feature should be used to perform such substitutions that\nare required by the orthography of the active script and\nlanguage. Substitutions made by `rclt` cannot be disabled by \napplication-level user interfaces.\n\n#### Stage 5, step 11: calt ####\n\nThe `calt` feature substitutes glyphs with contextual alternate\nforms. In general, this involves replacing the default form of a\nconnecting glyph with an alternate that provides a preferable\nconnection to an adjacent glyph.\n\nThe `calt` feature, in contrast to `rclt` above, performs\nsubstitutions that are not mandatory for orthographic\ncorrectness. 
However, unlike `rclt`, the substitutions made by `calt`\ncan be disabled by application-level user interfaces.\n\n:::{figure-md}\n![Contextual alternate substitution](/images/arabic/arabic-calt.svg \"Contextual alternate substitution\"){.shaping-demo .inline-svg .greyscale-svg #arabic-calt}\n\nContextual alternate substitution\n:::\n\n```{svg-color-toggle-button} arabic-calt\n```\n\n\n\n\n### Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr> ###\n\nThe typographic-substitution phase applies optional substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Arabic shaping model:\n\n    liga\n\tdlig\n\tcswh\n\tmset\n\t\n\n#### Stage 6, step 1: liga ####\n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Standard ligature substitution](/images/arabic/arabic-liga.svg \"Standard ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #arabic-liga}\n\nStandard ligature substitution\n:::\n\n```{svg-color-toggle-button} arabic-liga\n```\n\n\n#### Stage 6, step 2: dlig ####\n\nThe `dlig` feature substitutes additional optional ligatures that are\noff by default. Substitutions made by `dlig` may be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Discretionary ligature substitution](/images/arabic/arabic-dlig.svg \"Discretionary ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #arabic-dlig}\n\nDiscretionary ligature substitution\n:::\n\n```{svg-color-toggle-button} arabic-dlig\n```\n\n\n#### Stage 6, step 3: cswh ####\n\nThe `cswh` feature substitutes contextual swash variants of\nglyphs. 
For example, the active font might substitute a longer variant\nof <samp>\"Noon\"</samp> when a certain number of subsequent glyphs do not descend\nbelow the baseline.\n\n\n#### Stage 6, step 4: mset ####\n\nThe `mset` feature performs mark positioning by substituting sequences\nof bases and marks with precomposed base-and-mark glyphs.\n\n> Note: Positioning marks with the `mark` and `mkmk` features of <abbr title=\"Glyph Positioning table\">GPOS</abbr> is\n> preferred, because `mset` can interfere with the OpenType shaping\n> process. For example, substitution rules contained in `mset` may not be able to\n> account for necessary mark-reordering adjustments conducted in the\n> next stage.\n> \n> Nevertheless, when the active font uses `mset` substitutions, the\n> shaping engine must deal with the situation gracefully.\n\n### Stage 7: Applying the positioning features from <abbr>GPOS</abbr> ###\n\nThe positioning stage adjusts the positions of mark and base\nglyphs.\n\nThe order in which these features are applied is fixed for\nall scripts implemented in the Arabic shaping model:\n\n    curs\n\tdist\n\tkern\n\tmark\n\tmkmk\n\n#### Stage 7, step 1: curs ####\n\nThe `curs` feature performs cursive positioning. Each glyph has an\nentry point and exit point; the `curs` feature positions glyphs so\nthat the entry point of the current glyph meets the exit point of the\npreceding glyph.\n\n:::{figure-md}\n![Cursive positioning](/images/arabic/arabic-curs.svg \"Cursive positioning\"){.shaping-demo .inline-svg .greyscale-svg #arabic-curs}\n\nCursive positioning\n:::\n\n```{svg-color-toggle-button} arabic-curs\n```\n\n\n#### Stage 7, step 2: dist ####\n\nThe `dist` feature adjusts the spacing between glyphs. Unlike `kern`,\nadjustments made with `dist` do not require the application or the user\nto enable any software kerning features, if such features are\noptional. 
\n\n:::{figure-md}\n![Distance adjustment](/images/arabic/arabic-dist.svg \"Distance adjustment\"){.shaping-demo .inline-svg .greyscale-svg #arabic-dist}\n\nDistance adjustment\n:::\n\n```{svg-color-toggle-button} arabic-dist\n```\n\n\n#### Stage 7, step 3: kern ####\n\nThe `kern` feature adjusts glyph spacing between pairs of adjacent glyphs.\n\n:::{figure-md}\n![Kerning adjustment](/images/arabic/arabic-kern.svg \"Kerning adjustment\"){.shaping-demo .inline-svg .greyscale-svg #arabic-kern}\n\nKerning adjustment\n:::\n\n```{svg-color-toggle-button} arabic-kern\n```\n\n\n\n#### Stage 7, step 4: mark ####\n\nThe `mark` feature positions marks with respect to base glyphs.\n\n:::{figure-md}\n![Mark positioning](/images/arabic/arabic-mark.svg \"Mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #arabic-mark}\n\nMark positioning\n:::\n\n```{svg-color-toggle-button} arabic-mark\n```\n\n\n#### Stage 7, step 5: mkmk ####\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n\n:::{figure-md}\n![Mark-to-mark positioning](/images/arabic/arabic-mkmk.svg \"Mark-to-mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #arabic-mkmk}\n\nMark-to-mark positioning\n:::\n\n```{svg-color-toggle-button} arabic-mkmk\n```\n\n\n"
  },
  {
    "path": "opentype-shaping-bengali.md",
    "content": "```{include} /_global.md\n```\n\n# Bengali shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Bengali script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Bengali character tables](#bengali-character-tables)\n  - [The `<bng2>` shaping model](#the-bng2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<beng>` shaping model](#the-beng-shaping-model)\n      - [Distinctions from `<bng2>`](#distinctions-from-bng2)\n      - [Advice for handling fonts with `<beng>` features only](#advice-for-handling-fonts-with-beng-features-only)\n      - [Advice for handling text runs composed in `<beng>` format](#advice-for-handling-text-runs-composed-in-beng-format)\n\n\n## General information ##\n\nThe Bengali or Bangla script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the North Indic subgroup, in which\nsequences of adjacent consonants are often represented as conjuncts.\n\nThe Bengali script is used to write multiple languages, most commonly\nBengali, Assamese, and Manipuri. 
In addition, Sanskrit may be written\nin Bengali, so Bengali script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nThere are two extant Bengali script tags defined in OpenType, `<beng>`\nand `<bng2>`. The older script tag, `<beng>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<bng2>`\nshaping model. However, if a font is encountered that supports only\n`<beng>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. In the Bengali\nlanguage, dependent-vowel signs  may also be referred to as _kar_ forms — e.g., \"i-kar\" or\n\"u-kar\".\n\nThe term \"matra\" is also used to refer to the headline above most\nBengali letters. To avoid ambiguity, the term **headline** is\nused in most Unicode and OpenType shaping documents.\n\n**Halant** and **Virama** are both standard terms for the below-base \"vowel-killer\"\nmark. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. In the Bengali\nlanguage, this sign is known as the _hasanta_.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. In the Bengali\nlanguage, this mark is known as the _candrabindu_.\n\nThe term **base consonant** is also critical to Indic shaping. 
The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Bengali text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark. \n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nBengali glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. 
\n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are categorized as `Mark [Mn]`).\n\nBengali uses one subclass of consonant, `CONSONANT_DEAD`. This\nsubclass is used only for the Bengali \"Khanda Ta\" (`U+09CE`). It indicates that\n<samp>\"Khanda Ta\"</samp> should match tests for consonants, such as when [identifying\nsyllables](#stage-1-identifying-syllables-and-other-sequences), but that, unlike\nstandard consonants, it carries no inherent vowel. 
The lack of an\ninherent vowel is important during the [initial\nreordering](#stage-2-initial-reordering) stage.\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the syllable base to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). 
Each corresponds to the visual position of the matra with\nrespect to the base consonant or syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Bengali character tables ###\n\nSeparate character tables are provided for the Bengali and Vedic\nExtensions blocks as well as for other miscellaneous characters that\nare used in `<bng2>` text runs:\n\n  - [Bengali character table](character-tables/character-tables-bengali.md#bengali-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-bengali.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-bengali.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0981`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0981; Candrabindu         |\n| | | | |\n|`U+0995`   | Letter           | CONSONANT         | _null_                     | &#x0995; Ka                  |\n:::\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Bengali text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr title=\"Zero-Width Joiner\">ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence. \n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>.\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>. \n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\nIn addition to general punctuation, runs of Bengali text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block.\n\n\n\n## The `<bng2>` shaping model ##\n\nProcessing a run of `<bng2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. 
\n\nWith regard to these common variations, Bengali's specific shaping\ncharacteristics include:\n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any special final-consonant forms.\n\n  - `REPH_POS_AFTER_SUBJOINED` = <samp>\"Reph\"</samp> is ordered after all subjoined (i.e.,\n     below-base) consonant forms.\n\n  - `REPH_MODE_IMPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant\"</samp> sequence.\n\n  - `BLWF_MODE_PRE_AND_POST` = The below-forms feature is applied both to\n     pre-base consonants and to post-base consonants.\n\n  - `MATRA_POS_TOP` = _null_  = Unlike most other Indic scripts, Bengali\n     does not use any above-base matras. Therefore, this shaping\n     characteristic does not apply.\n\n  - `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are\n     ordered after all post-base consonant forms.\n\n  - `MATRA_POS_BOTTOM` = `POS_AFTER_SUBJOINED` = Below-base matras are\n     ordered after all subjoined (i.e., below-base) consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n> Note: Unlike most other Indic scripts, Bengali does not use\n> above-base matras. Therefore `MATRA_POS_TOP` can be set to _null_.\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Bengali consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Bengali Unicode block enumerates five modifier signs,\n> \"Candrabindu\" (`U+0981`), \"Anusvara\" (`U+0982`), \"Visarga\" \n> (`U+0983`), \"Avagraha\" (`U+09BD`), and \"Vedic Anusvara\"\n> (`U+09FC`). In addition, Sanskrit text written in Bengali may\n> include additional signs from the Vedic Extensions block.\n\nEach syllable contains exactly one vowel sound. 
Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that vowel is the\nsyllable's only vowel sound and serves as the \"base\". \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nGenerally speaking, the base consonant is the final consonant of the\nsyllable and its vowel sound designates the end of the syllable. 
This\nrule is synonymous with the `BASE_POS_LAST` characteristic mentioned\nearlier. \n\nNon-base consonants in a valid syllable will be separated by <samp>\"Halant\"</samp>\nmarks. Pre-base consonants will be followed by <samp>\"Halant\"</samp>, while\npost-base consonants will be preceded by <samp>\"Halant\"</samp>.\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\t\nThe algorithm for correctly identifying the base consonant includes a\ntest to recognize these sequences and not mis-identify the base\nconsonant.\n\nAll consonants in Bengali can potentially occur in pre-base\nposition. The <samp>\"Halant\"</samp> marks on pre-base consonants indicate that they\ncarry no vowel. Instead, they affect syllable pronunciation by\ncombining with the base consonant (e.g., \"_thr_\" or \"_spl_\").\n\nThree consonants in Bengali are allowed to occur after the base\nconsonant or syllable base: \"Ya\", \"Ba\", and \"Ra\". When these consonants occur after the\nbase consonant or syllable base, they take on special forms.\n\nA <samp>\"Ya\"</samp> after the base consonant or syllable base takes on the <samp>\"Yaphala\"</samp> form.\n\n> Note: some fonts may also implement the <samp>\"Yaphala\"</samp> form for a\n> post-base \"Yya\" (`U+09DF`).\n\nA <samp>\"Ba\"</samp> after the base consonant or syllable base takes on the below-base <samp>\"Baphala\"</samp>\nform. A <samp>\"Ba\"</samp> before the base consonant or syllable base will take on the below-base\n<samp>\"Baphala\"</samp> form unless it is the first pre-base consonant in the syllable.\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment; in many circumstances it is replaced by one of two combining\nmark-like forms. \n\n  - A <samp>\"Ra,Halant\"</samp> sequence at the beginning of a syllable is replaced\n    with an above-base mark called <samp>\"Reph\"</samp> (unless the <samp>\"Ra\"</samp> is the only\n    consonant in the syllable). 
This rule is synonymous with the\n    `REPH_MODE_IMPLICIT` characteristic mentioned earlier.\n\n  - A non-initial <samp>\"Ra\"</samp> before the base consonant or syllable base or a <samp>\"Ra\"</samp> after the\n    base consonant or syllable base takes on the below-base form <samp>\"Raphala.\"</samp>\n  \n<samp>\"Reph\"</samp> characters must be reordered after the\nsyllable-identification stage is complete. \n\n> Note: `<bng2>` text contains two Unicode codepoints for \"Ra\":\n> `U+09B0` and `U+09F0`. \n>\n> `U+09B0` is used in Bengali-language, Manipuri-language, and\n> Sanskrit text. `U+09F0` is used in Assamese-language text.\n>\n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Bengali script, may\n> not adhere to the syllable-formation rules described above. 
In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Bengali syllables. \n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. \n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. 
The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify \"Ra\" characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the bases of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? 
(_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? 
(HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. 
This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? 
= optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Ra,Halant\"</samp> glyph sequences, and other consonants that take special\n> treatment in some circumstances. <samp>\"Ba\"</samp>, <samp>\"Ta\"</samp>, and <samp>\"Ya\"</samp> occasionally\n> take on special forms, depending on their position in the syllable.\n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. 
It includes some ordering categories not utilized in\nBengali. \n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to step 2.1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. 
The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\n<!--- >\n> Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. 
These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). \n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the algorithm below. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is:\n\n  - If the syllable starts with <samp>\"Ra,Halant\"</samp> and the syllable contains\n    more than one consonant, exclude the starting <samp>\"Ra\"</samp> from the list of\n    consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. 
\n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If none of the above conditions is true, stop.\n  - The consonant stopped at will be the base consonant.\n\n> Note: The algorithm is designed to work for all Indic\n> scripts. However, Bengali does not utilize pre-base-reordering <samp>\"Ra\"</samp>.\n\nBengali includes one post-base consonant. \n\n  - The sequence <samp>\"Halant,Ya\"</samp> (`U+09CD`,`U+09AF`)  triggers\n    the <samp>\"Yaphala\"</samp> form. <samp>\"Yaphala\"</samp> behaves like a modifier to the\n    pronunciation of the preceding vowel, despite the fact that it is\n    formed from a consonant.\n\n:::{figure-md}\n![Yaphala composition](/images/bengali/bengali-yaphala.svg \"Yaphala composition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-yaphala}\n\nYaphala composition\n:::\n\n```{svg-color-toggle-button} bengali-yaphala\n```\n\n\n> Note: some fonts may also implement the <samp>\"Yaphala\"</samp> post-base form for\n> <samp>\"Halant,Yya\"</samp> (`U+09CD`,`U+09DF`).\n\nBengali includes two below-base consonant forms:\n\n  - <samp>\"Halant,Ra\"</samp> (after the syllable base) and <samp>\"Ra,Halant\"</samp> (in a\n    non-syllable-initial position) take on the <samp>\"Raphala\"</samp> form.\n  - <samp>\"Ba,Halant\"</samp> (before the syllable base) and <samp>\"Halant,Ba\"</samp> (after the\n    syllable base) take on the <samp>\"Baphala\"</samp> form.\n\t\n\n:::{figure-md}\n![Raphala composition](/images/bengali/bengali-raphala.svg \"Raphala composition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-raphala}\n\nRaphala composition\n:::\n\n```{svg-color-toggle-button} bengali-raphala\n```\n\n\n:::{figure-md}\n![Baphala composition](/images/bengali/bengali-baphala.svg \"Baphala 
composition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-baphala}\n\nBaphala composition\n:::\n\n```{svg-color-toggle-button} bengali-baphala\n```\n\n\n> Note: Because Bengali employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base.\n> \n> During the base-consonant search, only the <samp>\"Halant,_consonant_\"</samp> \n> pattern following the syllable base for these below-base forms will\n> be encountered. Step 2.5 below ensures that the <samp>\"_consonant_,Halant\"</samp>\n> pattern preceding the base consonant or syllable base for these below-base forms will\n> also be tagged correctly.\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any two-part dependent vowels (matras) must be decomposed\ninto their left-side and right-side components. Bengali has two\ntwo-part dependent vowels, \"O\" (`U+09CB`) and \"Au\" (`U+09CC`). Each\nhas a canonical decomposition, so this step is unambiguous. \n\n> \"O\" (`U+09CB`) decomposes to \"`U+09C7`,`U+09BE`\"\n>\n> \"Au\" (`U+09CC`) decomposes to \"`U+09C7`,`U+09D7`\"\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. 
However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n:::{figure-md}\n![Two-part matra decomposition](/images/bengali/bengali-matra-decompose.svg \"Two-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-matra-decompose}\n\nTwo-part matra decomposition\n:::\n\n```{svg-color-toggle-button} bengali-matra-decompose\n```\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs, including those that\nresulted from the preceding decomposition step, must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAll right-side dependent-vowel (matra) signs are tagged\n`POS_AFTER_POST`.\n\nAll below-base dependent-vowel (matra) signs are tagged\n`POS_AFTER_SUBJOINED`.\n\nFor simplicity, shaping engines may choose to tag single-part matras\nin an earlier text-processing step, using the information in the\n_Mark-placement subclass_ column of the character tables. It is\ncritical at this step, however, that all decomposed matras are also\ncorrectly tagged before proceeding to the next step.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. 
No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n\n> Note: Bengali includes the consonant \"Yya\" (`U+09DF`), which is\n> canonically equivalent to the sequence <samp>\"Ya,Nukta\"</samp>\n> (`U+09AF`,`U+09BC`). \"Ya\" can also take on the post-base <samp>\"Yaphala\"</samp>\n> form when it occurs in the sequence <samp>\"`SYLLABLE_BASE`,Halant,Ya\"</samp>.\n>\n> Consequently, shaping engines that encounter a <samp>\"Ya,Nukta\"</samp>\n> sequence may wish to recompose that sequence to <samp>\"Yya\"</samp> earlier than\n> other nukta-variant substitutions, as a safeguard\n> against the decomposed <samp>\"Ya\"</samp> unintentionally triggering a <samp>\"Yaphala\"</samp>\n> substitution during <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature application (if the sequence in\n> question happens to match the <samp>\"Yaphala\"</samp> substitution rule as well as\n> the <samp>\"Yya\"</samp> substitution rule).\n> \n> A well-behaved font should be expected to include explicit <samp>\"Yya\"</samp> and\n> <samp>\"Yaphala\"</samp> substitution rules that do not trigger unexpected results,\n> but there is no guarantee that real-world fonts will be well-behaved\n> in this regard.\n\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be\ntagged. Excluding initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s:\n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. \n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. 
For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nBengali includes two below-base consonant forms:\n\n  - <samp>\"Halant,Ra\"</samp> (after the syllable base) and <samp>\"Ra,Halant\"</samp> (in a\n    non-syllable-initial position) take on the <samp>\"Raphala\"</samp> form.\n  - <samp>\"Ba,Halant\"</samp> (before the syllable base) and <samp>\"Halant,Ba\"</samp> (after the\n    syllable base) take on the <samp>\"Baphala\"</samp> form.\n\t\n\n:::{figure-md}\n![Raphala composition](/images/bengali/bengali-raphala-1.svg \"Raphala composition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-raphala-1}\n\nRaphala composition\n:::\n\n```{svg-color-toggle-button} bengali-raphala-1\n```\n\n:::{figure-md}\n![Baphala composition](/images/bengali/bengali-baphala-1.svg \"Baphala composition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-baphala-1}\n\nBaphala composition\n:::\n\n```{svg-color-toggle-button} bengali-baphala-1\n```\n\n\n> Note: Because Bengali employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search in 2.1, any instances of the\n> <samp>\"Halant,_consonant_\"</samp>  pattern following the syllable base for these\n> below-base forms will be encountered. 
The tagging in this step\n> ensures that the <samp>\"_consonant_,Halant\"</samp> pattern preceding the syllable\n> base for these below-base forms will also be tagged correctly.\n\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: An initial <samp>\"Ra,Halant\"</samp> sequence will always become a <samp>\"Reph\"</samp>\n> unless the <samp>\"Ra\"</samp> is the only consonant in the syllable.\n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<bng2>` text runs. This step is included here to\n> maintain compatibility across Indic scripts.\n\n\n\n<!---  Not sure about Yya.... --->\n\t\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. 
In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. 
This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore, it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp> and <samp>\"Ja,Halant,Nya\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant\"</samp> sequences but _not_ match\n            initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `blwf` should match <samp>\"Halant,Ra\"</samp> and <samp>\"Halant,Ba\"</samp> in\n            post-base positions and <samp>\"Ra,Halant\"</samp> and\n            <samp>\"Ba,Halant\"</samp> in non-initial pre-base positions\n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `pstf` should match <samp>\"Halant,Ya\"</samp> in post-base position\n  - `vatu` should match <samp>\"_Consonant_,Halant,Ra\"</samp>\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf (not used in Bengali)\n\tpref (not used in Bengali)\n\tblwf \n\tabvf (not used in Bengali)\n\thalf\n\tpstf\n\tvatu\n\tcjct\n\tcfar (not used in Bengali)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. \n\nThe context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n\n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n\n:::\n\n:::{figure-md}\n![Nukta composition](/images/bengali/bengali-nukt.svg \"Nukta composition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-nukt}\n\nNukta composition\n:::\n\n```{svg-color-toggle-button} bengali-nukt\n```\n\n> Note: Bengali includes the consonant \"Yya\" (`U+09DF`), which is\n> canonically equivalent to the sequence <samp>\"Ya,Nukta\"</samp>\n> (`U+09AF`,`U+09BC`). 
<samp>\"Ya\"</samp> can also take on the post-base <samp>\"Yaphala\"</samp>\n> form when it occurs in the sequence <samp>\"`SYLLABLE_BASE`,Halant,Ya\"</samp>.\n>\n> Consequently, shaping engines that encounter a <samp>\"Ya,Nukta\"</samp>\n> sequence may wish to recompose that sequence to <samp>\"Yya\"</samp> earlier than\n> other nukta-variant substitutions, as a safeguard\n> against the decomposed <samp>\"Ya\"</samp> unintentionally triggering a <samp>\"Yaphala\"</samp>\n> substitution during <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature application (if the sequence in\n> question happens to match the <samp>\"Yaphala\"</samp> substitution rule as well as\n> the <samp>\"Yya\"</samp> substitution rule).\n> \n> A well-behaved font should be expected to include explicit <samp>\"Yya\"</samp> and\n> <samp>\"Yaphala\"</samp> substitution rules that do not trigger unexpected results,\n> but there is no guarantee that real-world fonts will be well-behaved\n> in this regard.\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces two specific sequences with required ligatures. \n\n  - <samp>\"Ka,Halant,Ssa\"</samp> is substituted with the <samp>\"KSsa\"</samp> ligature. \n  - <samp>\"Ja,Halant,Nya\"</samp> is substituted with the <samp>\"JNya\"</samp> ligature. \n  \nThese sequences can occur anywhere in a syllable. The <samp>\"KSsa\"</samp> and\n<samp>\"JNya\"</samp> characters have orthographic status equivalent to full\nconsonants in some languages, and fonts may have `cjct` substitution\nrules designed to match them in subsequences. 
Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\nThe context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n\n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n:::{figure-md}\n![KSsa ligation](/images/bengali/bengali-akhn-kssa.svg \"KSsa ligation\"){.shaping-demo .inline-svg .greyscale-svg #bengali-akhn-kssa}\n\nKSsa ligation\n:::\n\n```{svg-color-toggle-button} bengali-akhn-kssa\n```\n\n:::{figure-md}\n![JNya ligation](/images/bengali/bengali-akhn-jnya.svg \"JNya ligation\"){.shaping-demo .inline-svg .greyscale-svg #bengali-akhn-jnya}\n\nJNya ligation\n:::\n\n```{svg-color-toggle-button} bengali-akhn-jnya\n```\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence, however, must not be flagged for\n    the `rphf` substitution.\n\n\nThe context defined for a `rphf` feature is:\n    \n:::{table} `rphf` feature context\n\n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`   | _none_        |\n\n:::\n\n\n:::{figure-md}\n![Reph composition](/images/bengali/bengali-rphf.svg \"Reph composition with common Ra\"){.shaping-demo .inline-svg .greyscale-svg #bengali-rphf}\n\nReph composition with common \"Ra\"\n:::\n\n```{svg-color-toggle-button} bengali-rphf\n```\n\n:::{figure-md}\n![Reph composition](/images/bengali/bengali-rphf-as.svg \"Reph composition with Assamese Ra\"){.shaping-demo .inline-svg .greyscale-svg #bengali-rphf-as}\n\nReph composition with Assamese \"Ra\"\n:::\n\n```{svg-color-toggle-button} bengali-rphf-as\n```\n\n#### Stage 3, step 5: rkrf ####\n\n> This feature is not used 
in Bengali.\n\n#### Stage 3, step 6: pref ####\n\n> This feature is not used in Bengali.\n\n<!--- 3.5: The `pref` feature replaces pre-base-consonant glyphs with -->\n<!--any special forms. --->\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. Bengali includes two below-base consonant\nforms:\n\n  - <samp>\"Halant,Ra\"</samp> (after the syllable base) and <samp>\"Ra,Halant\"</samp> (in a\n    non-syllable-initial position) take on the <samp>\"Raphala\"</samp> form.\n  - <samp>\"Ba,Halant\"</samp> (before the syllable base) and <samp>\"Halant,Ba\"</samp> (after the\n    syllable base) take on the <samp>\"Baphala\"</samp> form. \n\nBecause Bengali incorporates the `BLWF_MODE_PRE_AND_POST` shaping\ncharacteristic, any pre-base consonants and any post-base consonants\nmay potentially match a `blwf` substitution; therefore, both cases must\nbe flagged for comparison. Note that this is not necessarily the case in other\nIndic scripts that use a different `BLWF_MODE_` shaping\ncharacteristic. 
\n\n\n:::{figure-md}\n![Raphala composition](/images/bengali/bengali-raphala-2.svg \"Raphala composition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-raphala-2}\n\nRaphala composition\n:::\n\n```{svg-color-toggle-button} bengali-raphala-2\n```\n\n:::{figure-md}\n![Baphala composition](/images/bengali/bengali-baphala-2.svg \"Baphala composition\"){.shaping-demo .inline-svg .greyscale-svg #bengali-baphala-2}\n\nBaphala composition\n:::\n\n```{svg-color-toggle-button} bengali-baphala-2\n```\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Bengali.\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n_Consonant_.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are three exceptions to the default behavior, for which\nthe shaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - Non-initial <samp>\"Ra,Halant\"</samp> and <samp>\"Ba,Halant\"</samp> 
sequences, which should\n    have been flagged for the `rkrf` or `blwf` features earlier, must\n    not be flagged for potential `half` substitutions.\n\n  - A sequence matching <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n:::{figure-md}\n![Half-form formation](/images/bengali/bengali-half-ka.svg \"Half-form formation\"){.shaping-demo .inline-svg .greyscale-svg #bengali-half-ka}\n\nHalf-form formation\n:::\n\n```{svg-color-toggle-button} bengali-half-ka\n```\n\n#### Stage 3, step 10: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any special forms.\n\n\n:::{figure-md}\n![Yaphala composition](/images/bengali/bengali-yaphala-1.svg \"Yaphala formation\"){.shaping-demo .inline-svg .greyscale-svg #bengali-yaphala-1}\n\nYaphala composition\n:::\n\n```{svg-color-toggle-button} bengali-yaphala-1\n```\n\n#### Stage 3, step 11: vatu ####\n\nThe `vatu` feature replaces certain sequences with \"Vattu variant\"\nforms. \n\n\"Vattu variants\" are formed from glyphs followed by <samp>\"Raphala\"</samp>\n(the below-base form of \"Ra\"); therefore, this feature must be applied after\nthe `blwf` feature.\n\nThe context defined for a `vatu` feature is:\n    \n:::{table} `vatu` feature context\n\n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| _none_           | `_consonant_`,\"Raphala\" | _none_        |\n\n:::\n\n\n:::{figure-md}\n![Vattu variant ligation](/images/bengali/bengali-vatu.svg \"Vattu variant ligation\"){.shaping-demo .inline-svg .greyscale-svg #bengali-vatu}\n\nVattu variant ligation\n:::\n\n```{svg-color-toggle-button} bengali-vatu\n```\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. 
These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<bng2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between the two consonants.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. 
By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n\n:::{figure-md}\n![Conjunct ligation](/images/bengali/bengali-cjct.svg \"Conjunct ligation\"){.shaping-demo .inline-svg .greyscale-svg #bengali-cjct}\n\nConjunct ligation\n:::\n\n```{svg-color-toggle-button} bengali-cjct\n```\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Bengali.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. 
In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. 
This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\n:::{figure-md}\n![Pre-base matra reordering](/images/bengali/bengali-matra-position.svg \"Pre-base matra reordering\"){.shaping-demo .inline-svg .greyscale-svg #bengali-matra-position}\n\nPre-base matra reordering\n:::\n\n```{svg-color-toggle-button} bengali-matra-position\n```\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. 
However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Bengali incorporates the `REPH_POS_AFTER_SUBJOINED`\nshaping characteristic, this final position is defined to be\nimmediately after the syllable base and any subjoined (below-base\nconsonant or below-base dependent vowel) forms.\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is:\n\n<!---\n  - If the syllable does not have a base consonant (such as a syllable\n    based on an independent vowel or placeholder), then the final\n    <samp>\"Reph\"</samp> position is immediately before the first character tagged\n    with the `POS_BEFORE_POST` position or any later position in the\n    sort order.\n\n    -- If there are no characters tagged with `POS_BEFORE_POST` or\n       later positions, then <samp>\"Reph\"</samp> is positioned at the end of the\n       syllable.\n--->\n\n  - Starting at the first post-<samp>\"Reph\"</samp> consonant, search forward looking\n    for the first explicit <samp>\"Halant\"</samp>, ending the search when the base\n    consonant is encountered. If such an explicit <samp>\"Halant\"</samp> is found,\n    move the <samp>\"Reph\"</samp> to the position immediately after this\n    <samp>\"Halant\"</samp>.\n\t  * If a zero-width joiner (<abbr>ZWJ</abbr>) or a zero-width non-joiner (<abbr>ZWNJ</abbr>)\n        follows this <samp>\"Halant\"</samp>, move the <samp>\"Reph\"</samp> to the position\n        immediately after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>. This will be the final\n        <samp>\"Reph\"</samp> position. 
\n\t  * If no <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> follows this <samp>\"Halant\"</samp>, leave the <samp>\"Reph\"</samp> in\n        its position immediately after the <samp>\"Halant\"</samp>. This will be the\n        final <samp>\"Reph\"</samp> position. \n  - If no such explicit <samp>\"Halant\"</samp> is found in the previous step, find\n    the first post-base consonant that has not formed a ligature with\n    the base consonant. If such a non-ligated post-base consonant is\n    found, move the <samp>\"Reph\"</samp> to the position immediately before the\n    non-ligated post-base consonant. This will be the final <samp>\"Reph\"</samp>\n    position.\n  - If no such non-ligated post-base consonant is found in the\n    previous step, move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. 
\n\t> Note: Because Bengali incorporates the\n    > `REPH_POS_AFTER_SUBJOINED` shaping characteristic, this means\n    > any positioning tag of `POS_BEFORE_POST` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_BEFORE_POST`.\n  - If no other location has been located in the previous steps, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\n\nFinally, if the final position of <samp>\"Reph\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n:::{figure-md}\n![Reph final reordering](/images/bengali/bengali-reph-position.svg){.shaping-demo .inline-svg .greyscale-svg #bengali-reph-position}\n\nReph final reordering\n:::\n\n```{svg-color-toggle-button} bengali-reph-position\n```\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n  \nBengali does not use pre-base-reordering consonants, so this step will\ninvolve no work when processing `<bng2>` text. 
It is included here in order\nto maintain compatibility with the other Indic scripts.\n\n\n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n> Note: Although the specification defines the `init` feature as being\n> used for word-initial positions only, the feature's original\n> definition relies on the linguistic sense of \"word\", and that sense\n> may not be precise enough to cover all of the cases encountered in a\n> contemporary text run.\n>\n> In practice, users may expect the `init` feature to be applied when\n> a sequence has a left-side dependent vowel that is preceded by a\n> punctuation character, a currency symbol, an emoji, or any of\n> several other categories of code point. Shaping engines may need to\n> adapt their matching rules to meet users' expectations for this\n> feature. \n>\n> The Microsoft Uniscribe shaping engine historically tested for a\n> certain range of Unicode `General Category` values, and more recent\n> shaping engines follow suit. For more information on Uniscribe\n> compatibility, see the [Uniscribe-bug-compatibility\n> note](/notes/uniscribe-bug-compatibility.md). \n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. 
In preparation for this stage, glyph sequences should be\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature replaces word-initial glyphs with special\npresentation forms. Generally, these forms involve removing the\nheadline in-stroke from the left side of the glyph.\n\nThe context defined for an `init` feature is:\n    \n:::{table} `init` feature context\n\n| Backtrack    | Matching sequence          | Lookahead           |\n|:-------------|:---------------------------|:--------------------|\n| `WORD_START` | `_matra_`(`LEFT_POSITION`) | `_consonant_`(full) |\n\n:::\n\n:::{figure-md}\n![Application of the init feature](/images/bengali/bengali-init.svg \"Application of the init feature\"){.shaping-demo .inline-svg .greyscale-svg #bengali-init}\n\nApplication of the `init` feature\n:::\n\n```{svg-color-toggle-button} bengali-init\n```\n\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\n:::{figure-md}\n![Application of the pres feature](/images/bengali/bengali-pres.svg \"Application of the pres feature\"){.shaping-demo .inline-svg .greyscale-svg #bengali-pres}\n\nApplication of the `pres` feature\n:::\n\n```{svg-color-toggle-button} bengali-pres\n```\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. 
This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Application of the abvs feature](/images/bengali/bengali-abvs.svg \"Application of the abvs feature\"){.shaping-demo .inline-svg .greyscale-svg #bengali-abvs}\n\nApplication of the `abvs` feature\n:::\n\n```{svg-color-toggle-button} bengali-abvs\n```\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. This usually includes replacing base consonants that\nare adjacent to below-base-consonant forms like <samp>\"Raphala\"</samp> or\n<samp>\"Baphala\"</samp> with contextual ligatures.\n\n:::{figure-md}\n![Application of the blws feature](/images/bengali/bengali-blws.svg \"Application of the blws feature\"){.shaping-demo .inline-svg .greyscale-svg #bengali-blws}\n\nApplication of the `blws` feature\n:::\n\n```{svg-color-toggle-button} bengali-blws\n```\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures.\n\n:::{figure-md}\n![Application of the psts feature](/images/bengali/bengali-psts.svg \"Application of the psts feature\"){.shaping-demo .inline-svg .greyscale-svg #bengali-psts}\n\nApplication of the `psts` feature\n:::\n\n```{svg-color-toggle-button} bengali-psts\n```\n\n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. 
This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic.\n\n:::{figure-md}\n![Application of the haln feature](/images/bengali/bengali-haln.svg \"Application of the haln feature\"){.shaping-demo .inline-svg .greyscale-svg #bengali-haln}\n\nApplication of the `haln` feature\n:::\n\n```{svg-color-toggle-button} bengali-haln\n```\n\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. However, `calt` is not mandatory for correct Bengali shaping\n> and may be disabled in the application by user preference.\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Bengali text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Bengali, this includes <samp>\"Reph\"</samp> in addition to the\ndiacritical marks and Vedic signs. 
\n\n:::{figure-md}\n![Application of the abvm feature](/images/bengali/bengali-abvm.svg \"Application of the abvm feature\"){.shaping-demo .inline-svg .greyscale-svg #bengali-abvm}\n\nApplication of the `abvm` feature\n:::\n\n```{svg-color-toggle-button} bengali-abvm\n```\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. In Bengali, this includes below-base dependent vowels\n(matras) as well as the below-base consonant forms <samp>\"Raphala\"</samp> and\n<samp>\"Baphala\"</samp>.\n\n:::{figure-md}\n![Application of the blwm feature](/images/bengali/bengali-blwm.svg \"Application of the blwm feature\"){.shaping-demo .inline-svg .greyscale-svg #bengali-blwm}\n\nApplication of the `blwm` feature\n:::\n\n```{svg-color-toggle-button} bengali-blwm\n```\n\n\n## The `<beng>` shaping model ##\n\nThe older Bengali script tag, `<beng>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<beng>` and some users may still have documents that were written to\ntake advantage of `<beng>` shaping.\n\n### Distinctions from `<bng2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs used to trigger shaping\nfeatures was altered when migrating from `<beng>` to\n`<bng2>`. \n\nSpecifically, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<beng>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<bng2>` text, as described above in this document, there is no\nsuch reordering. 
The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nThe old Indic shaping model also did not recognize the\n`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, `<beng>`\nwas treated as if it followed the `BLWF_MODE_POST_ONLY`\ncharacteristic. In other words, below-base form substitutions were\nonly applied to consonants after the base consonant or syllable base.\n\nIn addition, for some scripts, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<beng>` text, the left-side matra was always positioned\nat the beginning of the syllable.\n\n\n### Advice for handling fonts with `<beng>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<beng>` shaping model.\n\n### Advice for handling text runs composed in `<beng>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<beng>` script tag and it is known that the font in use supports\nonly the `<bng2>` shaping model.\n\nShaping engines may also choose to apply `blwf` substitutions to\nbelow-base consonants occurring before the base consonant or syllable base when it is\nknown that the font in use supports an applicable substitution lookup.\n\nShaping engines may also choose to position left-side matras according\nto the `<beng>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> 
or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n"
  },
  {
    "path": "opentype-shaping-default.md",
    "content": "# Default script shaping in OpenType #\n\nThis document details the default shaping procedure needed to display\ntext runs in non-complex scripts. It may also be used as a fallback\nmodel for unrecognized scripts.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Normalization](#normalization)\n  - [The default shaping model](#the-default-shaping-model)\n      - [Stage 1: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-1-applying-the-basic-substitution-features-from-gsub)\n\t  - [Stage 2: Applying typographic substitution features from <abbr>GSUB</abbr>](#stage-2-applying-typographic-substitution-features-from-gsub)\n\t  - [Stage 3: Applying the positioning features from <abbr>GPOS</abbr>](#stage-3-applying-the-positioning-features-from-gpos)\n  \n  \n  \n## General information ##\n\nThe default OpenType shaping model is used for scripts that are\nconsidered _non-complex_ from the shaper's perspective. This\ndesignation means that shaping a text run does not involve glyph\nreordering, contextual joining behavior, or the substitution of\ncontext-dependent forms for linguistic or orthographic correctness.\n\nText runs in non-complex scripts may, however, involve ligature\nsubstitution, Unicode normalization, mark positioning, kerning, and\nthe application of other features from the active font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntables.\n\nThe non-complex scripts covered by this model include Latin, Cyrillic,\nGreek, Armenian, Georgian, Ethiopic, Cherokee, Tifinagh, and many others.\n\n\n## Terminology ##\n\nMany of these scripts support diacritics and other **marks**. Unicode may\ncontain **precomposed** mark-and-base codepoints for some or all\ncombinations of marks and base letters in the script. 
For combinations\nwithout a codepoint, the desired form can be achieved by following the\n**base** letter with a **combining mark** codepoint. \n\nThe primary concern for the shaping engine is processing the text run into\nthe correct normalized form, so that the best glyphs from the active\nfont can be selected from among the available precomposed and\ncombining alternatives.\n\nFonts for non-complex scripts might not include a <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> table\nat all. \n\nHowever, <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> may also be used to implement a variety of\nOpenType smart features, including several classes of ligature,\ncontextual alternate, or contextual positioning rules. Because these\nfeatures are not required in order to render the text run\northographically correct, the features are not considered shaping\nfeatures. Nevertheless, the shaping engine may be expected to apply\nthese features in order to simplify the overall text-rendering\narchitecture of the implementation.\n\n## Normalization ##\n\nUnicode defines algorithms for normalizing a sequence of input\ncodepoints into either a canonical composed form or a canonical\ndecomposed form. The purpose of these algorithms and of the defined\nnormalization forms is to determine equivalent representations of input\nsequences regardless of variations in the input sequences.\n\nFor example, a base letter with an attached mark might exist in\nUnicode as a single codepoint, but an input sequence might consist of\nthe base letter codepoint followed by the combining mark\ncodepoint. Unicode normalization can be used to determine that the\n<samp>\"letter, mark\"</samp> sequence is equivalent to the single codepoint. 
This\nsimplifies sorting, searching, string comparison, and many other common\ntasks.\n\nOpenType shaping utilizes Unicode normalization, but OpenType\nshaping has a distinctly different goal: to select the best or most\nappropriate representation of the input codepoint sequence that is\navailable in the active font. A full description of the algorithm is\navailable in the [normalization](opentype-shaping-normalization.md) document. \n\nShaping some complex scripts involves explicit composition or\ndecomposition steps. The default shaping model does not involve any\nsuch steps, but it does proceed with the general assumption that text\nruns have been normalized as part of input sanitization. \n\nFor convenience, shaping engines may choose to implement a single\nnormalization routine for all scripts, default and complex. If\nnormalization is done before the shaping-model–specific processing is\ndone, then there may be no work required in certain shaping steps\n(such as the processing of `ccmp` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>). However,\nthese steps will always be described in the relevant script's shaping\ndocument. \n\n\n## The default shaping model ##\n\nProcessing a run of text in the default shaping model involves three\ntop-level stages:\n\n1. Applying the basic substitution features from <abbr>GSUB</abbr>\n2. Applying typographic substitution features from <abbr>GSUB</abbr>\n3. 
Applying the positioning features from <abbr>GPOS</abbr>\n\nTogether, these stages cover the application of all <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\nfeatures that are required or that have been defined by OpenType as\nbeing on by default.\n\nFor convenience, shaping engines may also choose to apply any optional\nor off-by-default OpenType features that have been activated for the\ntext run (including those that have been\nenabled by the user and those that have been enabled at the\napplication level). However, the order in which such features should\nbe applied and how they should interact with OpenType shaping features\nis beyond the scope of this document.\n\nThe default shaping model does not involve syllable-identification,\nword-identification, or other preprocessing of the input\nsequence. Shaping engines may choose how to segment longer text runs\nfor processing, or may choose to rely on higher-level applications to\nmake segmentation decisions.\n\n\n### Stage 1: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThese substitutions include those features designed to provide\nlinguistic and orthographic correctness.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tlocl\n\tccmp\n\trlig\n\t\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\nThe `ccmp` feature allows a font to substitute mark-and-base sequences\nwith a pre-composed glyph including the mark and the base, or to\nsubstitute a single glyph into an equivalent decomposed sequence of\nglyphs. \n\nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs.\n\n> Note: The `ccmp` feature may perform compositions or decompositions\n> of glyph sequences that do not have a canonical decomposition\n> defined in Unicode. \n\nThe `rlig` feature substitutes glyph sequences with mandatory\nligatures. 
Substitutions made by `rlig` cannot be disabled by\napplication-level user interfaces.\n\n\n### Stage 2: Applying typographic substitution features from <abbr>GSUB</abbr> ###\n\nThe typographic-substitution stage applies all remaining substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for\nthis stage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThese substitutions include those features designed to provide\ntypographic consistency and correctness.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\n\trclt\n\tcalt\n\tclig\n\tliga\n\t\n\nThe `rclt` feature substitutes glyphs with contextual alternate\nforms. In general, the `rclt` feature is used to perform those\nsubstitutions that are required by the orthography of the active\nscript and language. Substitutions made by `rclt` cannot be disabled\nby application-level user interfaces.\n\nThe `calt` feature substitutes glyphs with contextual alternate\nforms. In general, the `calt` feature performs substitutions that are\nnot mandatory for orthographic correctness. Consequently, unlike those\nmade by `rclt`, the substitutions made by `calt` can be disabled by\napplication-level user interfaces.\n\nThe `clig` feature substitutes optional ligatures that are on by\ndefault, but which are activated only in certain\ncontexts. Substitutions made by `clig` may be disabled by\napplication-level user interfaces. \n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n\n### Stage 3: Applying the positioning features from <abbr>GPOS</abbr> ###\n\nThe positioning stage adjusts the positions of mark and base\nglyphs. 
In preparation for this stage, glyph sequences should be\ntagged for possible application of <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table\nin the font.\n\n\n\tcurs\n\tdist\n\tkern\n\tmark\n\tmkmk\n\nThe `curs` feature performs cursive positioning. Each glyph has an\nentry point and exit point; the `curs` feature positions glyphs so\nthat the entry point of the current glyph meets the exit point of the\npreceding glyph.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software kerning features, if\nsuch features are optional.\n\nThe `kern` feature adjusts glyph spacing between pairs of adjacent glyphs.\n\nThe `mark` feature positions marks with respect to base glyphs.\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n\n<!---\ncollect features\noverride features\ndata create\ndata destroy\npreprocess text\npostprocess glyphs\nnormalization mode default\ndecompose\ncompose\nsetup masks\ndisable otl\nreorder marks\nzero width marks by gdef late\nfallback position\n--->\n"
  },
  {
    "path": "opentype-shaping-devanagari.md",
    "content": "```{include} /_global.md\n```\n\n# Devanagari shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Devanagari script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Devanagari character tables](#devanagari-character-tables)\n  - [The `<dev2>` shaping model](#the-dev2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<deva>` shaping model](#the-deva-shaping-model)\n      - [Distinctions from `<dev2>`](#distinctions-from-dev2)\n      - [Advice for handling fonts with `<deva>` features only](#advice-for-handling-fonts-with-deva-features-only)\n      - [Advice for handling text runs composed in `<deva>` format](#advice-for-handling-text-runs-composed-in-deva-format)\n\n\n## General information ##\n\nThe Devanagari script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the North Indic subgroup, in which\nsequences of adjacent consonants are often represented as conjuncts.\n\nThe Devanagari script is used to write multiple languages, most commonly\nHindi, Marathi, Maithili, and Nepali. 
In addition, Sanskrit may be written\nin Devanagari, so Devanagari script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nThere are two extant Devanagari script tags defined in OpenType, `<deva>`\nand `<dev2>`. The older script tag, `<deva>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<dev2>`\nshaping model. However, if a font is encountered that supports only\n`<deva>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. \n\nThe term \"matra\" is also used to refer to the headline above most\nDevanagari letters. To avoid ambiguity, the term **headline** is\nused in most Unicode and OpenType shaping documents.\n\n**Halant** and **Virama** are both standard terms for the below-base \"vowel-killer\"\nsign. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. \n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. \n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. 
The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Devanagari text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark.\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nDevanagari glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. 
The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are categorized as `Mark [Mn]`).\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable, however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. 
Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Devanagari character tables ###\n\nSeparate character tables are provided for the Devanagari, Devanagari\nExtended, and Vedic Extensions blocks, as well as for other\nmiscellaneous characters that are used in `<dev2>` text runs:\n\n  - [Devanagari character table](character-tables/character-tables-devanagari.md#devanagari-character-table)\n  - [Devanagari Extended character table](character-tables/character-tables-devanagari.md#devanagari-extended-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-devanagari.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-devanagari.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0901`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0901; Candrabindu         |\n| | | | |\n|`U+0915`   | Letter           | CONSONANT         | _null_                     | &#x0915; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are designated as _unassigned_ in\nthe _Unicode category_ column.\n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. 
Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Devanagari text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr title=\"Zero-Width Joiner\">ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence. \n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>. \n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\n\n\n\n## The `<dev2>` shaping model ##\n\nProcessing a run of `<dev2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. 
\n\nWith regard to these common variations, Devanagari's specific shaping\ncharacteristics include: \n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any special final-consonant forms.\n\n  - `REPH_POS_BEFORE_POST` = <samp>\"Reph\"</samp> is ordered before all post-base consonant forms.\n\n  - `REPH_MODE_IMPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant\"</samp> sequence.\n\n  - `BLWF_MODE_PRE_AND_POST` = The below-base forms feature is applied both to\n     pre-base consonants and to post-base consonants.\n\n  - `MATRA_POS_TOP` = `POS_AFTER_SUBJOINED` = Above-base matras are\n     ordered after subjoined (i.e., below-base) consonant forms. \n\n  - `MATRA_POS_RIGHT` = `POS_AFTER_SUBJOINED` = Right-side matras are\n     ordered after subjoined (i.e., below-base) consonant forms. \n\n  - `MATRA_POS_BOTTOM` = `POS_AFTER_SUBJOINED` = Below-base matras are\n     ordered after all subjoined (i.e., below-base) consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Devanagari consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Devanagari Unicode block enumerates nine modifier signs,\n> \"Inverted Candrabindu\" (`U+0900`), \"Candrabindu\" (`U+0901`),\n> \"Anusvara\" (`U+0902`), \"Visarga\" (`U+0903`), \"Avagraha\" (`U+093D`),\n> \"Udatta\" (`U+0951`), \"Anudatta\" (`U+0952`), \"Grave Accent\"\n> (`U+0953`) and \"Acute Accent\" (`U+0954`). In addition, Sanskrit text\n> written in Devanagari may include additional signs from the Vedic\n> Extensions block. \n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. 
\n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that independent vowel\nis the syllable's only vowel sound and serves as the \"base\". \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nGenerally speaking, the base consonant is the final consonant of the\nsyllable and its vowel sound designates the end of the syllable. 
This\nrule is synonymous with the `BASE_POS_LAST` characteristic mentioned\nearlier. \n\nValid consonant-based syllables may include one or more additional \nconsonants that precede the base consonant. Each of these\nother, pre-base consonants will be followed by the <samp>\"Halant\"</samp> mark, which\nindicates that they carry no vowel. They affect pronunciation by\ncombining with the base consonant (e.g., \"_str_\", \"_pl_\") but they\ndo not add a vowel sound.\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment; in many circumstances it is replaced by one of two combining\nmark-like forms. \n\n  - A <samp>\"Ra,Halant\"</samp> sequence at the beginning of a syllable\n    is replaced with an above-base mark called <samp>\"Reph\"</samp> (unless the <samp>\"Ra\"</samp>\n    is the only consonant in the syllable). \n    This rule is synonymous with the `REPH_MODE_IMPLICIT`\n    characteristic mentioned earlier.\n\n  - <samp>\"Halant,Ra\"</samp> sequences that occur elsewhere in the syllable may take on the\n    below-base form <samp>\"Rakaar\"</samp> .\n\t\n\t\nIn addition, <samp>\"Rra,Halant\"</samp> sequences that precede the base consonant or syllable base\nmay take on a form known as the <samp>\"eyelash Ra\"</samp> .\n\n> Note: In `<dev2>` text runs, this substitution is canonically\n> implemented as a [half form](#stage-3-step-9-half). See the [`<deva>`\n> shaping](#the-deva-shaping-model) section for a discussion of the\n> \"eyelash Ra\" implementation that was used in the `<deva>` model.\n\n<samp>\"Reph\"</samp> and <samp>\"Rakaar\"</samp> characters must be reordered after the\nsyllable-identification stage is complete. 
\n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Devanagari script, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Devanagari syllables. \n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. 
This\nsimplifies the resulting expressions. \n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. 
Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the basis of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}`.\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Ra,Halant\"</samp> glyph sequences, and other consonants that take special\n> treatment in some circumstances. 
<samp>\"Ra\"</samp> and <samp>\"Rra\"</samp> may\n> take on special forms, depending on their position in the syllable.\n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nDevanagari. \n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. 
`POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to step 2.1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. 
That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\n\n<!--- > Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). \n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. 
Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is:\n\n  - If the syllable starts with <samp>\"Ra,Halant\"</samp> and the syllable contains\n    more than one consonant, exclude the starting <samp>\"Ra\"</samp> from the list of\n    consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If none of the above conditions is true, stop.\n  - The consonant at which the search stops is the base consonant.\n\n> Note: The algorithm is designed to work for all Indic\n> scripts. However, Devanagari does not utilize pre-base-reordering <samp>\"Ra\"</samp>.\n\nDevanagari includes one below-base consonant form:\n\n  - <samp>\"Halant,Ra\"</samp> (occurring after the syllable base) and <samp>\"Ra,Halant\"</samp>\n    (before the syllable base, but in a non-syllable-initial\n    position) will take on the <samp>\"Rakaar\"</samp> form. 
\n\t\n> Note: the sequence <samp>\"Rra,Halant\"</samp> (occurring before the base\n> consonant) will take on the <samp>\"eyelash Ra\"</samp> special form. However, this\n> special form is not a below-base form. Instead, it is canonically\n> defined as belonging to the half-form substitutions, so it is\n> addressed by the `half` feature in stage 3, step 9, and is not\n> addressed in this step.\n\n> Note: Because Devanagari employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search, only the <samp>\"Halant,_consonant_\"</samp> \n> pattern following the syllable base for these below-base forms will\n> be encountered. Step 2.5 below ensures that the <samp>\"_consonant_,Halant\"</samp>\n> pattern preceding the syllable base for these below-base forms will\n> also be tagged correctly.\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any two-part dependent vowels (matras) must be decomposed\ninto their left-side and right-side components. \n\nDevanagari does not have any two-part dependent vowels; this step is\nlisted here because it is part of the general processing scheme for\nshaping Indic scripts.\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. 
However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAbove-base, right-side, and below-base dependent-vowel (matra) signs\nmust be tagged with `POS_AFTER_SUBJOINED`.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`. Excluding initial <samp>\"Ra,Halant\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. \n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. 
For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nDevanagari includes one below-base consonant form.\n\n  - <samp>\"Halant,Ra\"</samp> (occurring after the syllable base) and <samp>\"Ra,Halant\"</samp>\n    (before the syllable base, but in a non-syllable-initial\n    position) will take on the <samp>\"Rakaar\"</samp> form. \n\t\n> Note: the sequence <samp>\"Rra,Halant\"</samp> (occurring before the base\n> consonant) will take on the <samp>\"eyelash Ra\"</samp> special form. However, this\n> special form is not a below-base form. Instead, it is canonically\n> defined as belonging to the half-form substitutions, so it is\n> addressed by the `half` feature in stage 3, step 9, and is not\n> addressed in this step.\n\n> Note: Because Devanagari employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search in 2.1, any instances of the\n> <samp>\"Halant,_consonant_\"</samp>  pattern following the syllable base for these\n> below-base forms will be encountered. 
The tagging in this step\n> ensures that the <samp>\"_consonant_,Halant\"</samp> pattern preceding the syllable\n> base for these below-base forms will also be tagged correctly.\n\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: an initial <samp>\"Ra,Halant\"</samp> sequence will always become a <samp>\"Reph\"</samp>\n> unless the <samp>\"Ra\"</samp> is the only consonant in the syllable.\n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with  `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<dev2>` text runs. This step is included here to\n> maintain compatibility across Indic scripts.\n\n\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. 
In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!--matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. 
--->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore, it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp> and <samp>\"Ja,Halant,Nya\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant\"</samp> sequences but _not_ match\n            initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `rkrf` should match <samp>\"_Consonant_,Halant,Ra\"</samp> sequences\n  - `blwf` should match <samp>\"Halant,Ra\"</samp> in post-base positions and\n           <samp>\"Ra,Halant\"</samp> in non-initial pre-base positions \n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `vatu` should match <samp>\"_Consonant_,Halant,Ra\"</samp> sequences\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should already have been flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf \n\tpref (not used in Devanagari)\n\tblwf \n\tabvf (not used in Devanagari)\n\thalf\n\tpstf (not used in Devanagari)\n\tvatu\n\tcjct\n\tcfar (not used in Devanagari)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. \n\n  - The context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n\n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n:::\n\n\n:::{figure-md}\n![Nukta composition](/images/devanagari/devanagari-nukt.svg \"Nukta composition\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-nukt}\n\nNukta composition\n:::\n\n```{svg-color-toggle-button} devanagari-nukt\n```\n\n\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces two specific sequences with required ligatures. \n\n  - <samp>\"Ka,Halant,Ssa\"</samp> is substituted with the <samp>\"KSsa\"</samp> ligature. 
\n  - <samp>\"Ja,Halant,Nya\"</samp> is substituted with the <samp>\"JNya\"</samp> ligature. \n  \nThese sequences can occur anywhere in a syllable. The <samp>\"KSsa\"</samp> and\n<samp>\"JNya\"</samp> characters have orthographic status equivalent to full\nconsonants in some languages, and fonts may have `cjct` substitution\nrules designed to match them in subsequences. Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\n  - The context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n    \n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n\n:::{figure-md}\n![KSsa ligation](/images/devanagari/devanagari-akhn-kssa.svg \"KSsa ligation\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-akhn-kssa}\n\nKSsa ligation\n:::\n\n```{svg-color-toggle-button} devanagari-akhn-kssa\n```\n\n\n:::{figure-md}\n![JNya ligation](/images/devanagari/devanagari-akhn-jnya.svg \"JNya ligation\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-akhn-jnya}\n\nJNya ligation\n:::\n\n```{svg-color-toggle-button} devanagari-akhn-jnya\n```\n\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence, however, must not be flagged for\n    the `rphf` substitution.\n\t\n\n  - The context defined for a `rphf` feature is:\n\n:::{table} `rphf` feature context\n    \n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`   | _none_        |\n:::\n\n\n:::{figure-md}\n![Reph composition](/images/devanagari/devanagari-rphf.svg \"Reph composition\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-rphf}\n\nReph 
composition\n:::\n\n```{svg-color-toggle-button} devanagari-rphf\n```\n\n\t\n#### Stage 3, step 5: rkrf ####\n\nThe `rkrf` feature replaces <samp>\"_Consonant_,Halant,Ra\"</samp> sequences with the\n<samp>\"Rakaar\"</samp>-ligature form of the consonant glyph.\n\n\n  - The context defined for a `rkrf` feature is:\n\n:::{table} `rkrf` feature context\n    \n| Backtrack           | Matching sequence     | Lookahead     |\n|:--------------------|:----------------------|:--------------|\n| `_consonant_`(full) | `_halant_`,\"Ra\"(full) | _none_        |\n:::\n\n\n:::{figure-md}\n![Rakaar composition](/images/devanagari/devanagari-rkrf.svg \"Rakaar composition\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-rkrf}\n\nRakaar composition\n:::\n\n```{svg-color-toggle-button} devanagari-rkrf\n```\n\n\n#### Stage 3, step 6: pref ####\n\n> This feature is not used in Devanagari.\n\n<!--- 3.5: The `pref` feature replaces pre-base-consonant glyphs with -->\n<!--any special forms. --->\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. Devanagari includes one below-base consonant\nform:\n\n  - <samp>\"Halant,Ra\"</samp> (occurring after the syllable base) or <samp>\"Ra,Halant\"</samp>\n    (before the syllable base, but in a non-syllable-initial position) will\n    take on the <samp>\"Rakaar\"</samp> form.\n\t\nIf the active font contains ligatures for the consonant adjacent to\nthe <samp>\"Halant\"</samp> (i.e., <samp>\"_Consonant_,Halant,Ra\"</samp>), then that ligature is\nnormally applied with the `rkrf` feature in step 3.5. 
The `blwf`\nfeature allows the <samp>\"Ra\"</samp> to be substituted with a standalone <samp>\"Rakaar\"</samp>\nmark, to work with all consonants that do not have a `rkrf` ligature\nin the font.\n\nBecause Devanagari incorporates the `BLWF_MODE_PRE_AND_POST` shaping\ncharacteristic, any pre-base consonants and any post-base consonants\nmay potentially match a `blwf` substitution; therefore, both cases must\nbe flagged for comparison. Note that this is not necessarily the case in other\nIndic scripts that use a different `BLWF_MODE_` shaping\ncharacteristic. \n\n:::{figure-md}\n![Below-base form](/images/devanagari/devanagari-blwf.svg \"Below-base form\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-blwf}\n\nBelow-base form\n:::\n\n```{svg-color-toggle-button} devanagari-blwf\n```\n\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Devanagari.\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n<samp>\"_Consonant_\"</samp>.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are three exceptions to 
the default behavior, for which\nthe shaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - Non-initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged\n    for the `rkrf` or `blwf` features earlier, must not be flagged for\n    potential `half` substitutions.\n\n  - A sequence matching <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n:::{figure-md}\n![Half-form formation](/images/devanagari/devanagari-half.svg \"Half-form formation\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-half}\n\nHalf-form formation\n:::\n\n```{svg-color-toggle-button} devanagari-half\n```\n\n\nIn addition, the sequence <samp>\"Rra,Halant\"</samp> (occurring before the base\nconsonant or syllable base) will take on the <samp>\"eyelash Ra\"</samp> form. Because this\nsubstitution is defined as the canonical half form of <samp>\"Rra\"</samp> in `<dev2>`, the\nshaping engine does not need to implement any special handling to\nsupport it. \n\n:::{figure-md}\n![Eyelash Ra formation](/images/devanagari/devanagari-eyelash-ra.svg \"Eyelash Ra formation\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-eyelash-ra}\n\nEyelash Ra formation\n:::\n\n```{svg-color-toggle-button} devanagari-eyelash-ra\n```\n\n\n#### Stage 3, step 10: pstf ####\n\n> This feature is not used in Devanagari.\n\n\n#### Stage 3, step 11: vatu ####\n\nThe `vatu` feature replaces certain sequences with \"Vattu variant\"\nforms. 
\n\n\"Vattu variants\" are formed from glyphs followed by <samp>\"Rakaar\"</samp>\n(the below-base form of <samp>\"Ra\"</samp>); therefore, this feature must be applied after\nthe `blwf` feature.\n\n:::{figure-md}\n![Vattu ligation](/images/devanagari/devanagari-vatu.svg \"Vattu ligation\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-vatu}\n\nVattu ligation\n:::\n\n```{svg-color-toggle-button} devanagari-vatu\n```\n\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<dev2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between the two consonants.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or 
orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n:::{figure-md}\n![Conjunct ligation](/images/devanagari/devanagari-cjct.svg \"Conjunct ligation\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-cjct}\n\nConjunct ligation\n:::\n\n```{svg-color-toggle-button} devanagari-cjct\n```\n\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Devanagari.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. 
If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. 
This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\n:::{figure-md}\n![Pre-base matra positioning](/images/devanagari/devanagari-matra-position.svg \"Pre-base matra positioning\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-matra-position}\n\nPre-base matra positioning\n:::\n\n```{svg-color-toggle-button} devanagari-matra-position\n```\n\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. 
However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Devanagari incorporates the `REPH_POS_BEFORE_POST`\nshaping characteristic, this final position is defined to be\nimmediately before any independent post-base consonant forms (meaning\nthe first post-base consonant that has not formed a ligature with the\nsyllable base).\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is\n\n<!---\n\n  - Find the first explicit <samp>\"Halant\"</samp> between the first post-Reph\n    consonant and the last main consonant. Move the <samp>\"Reph\"</samp> to the\n    position immediately after this <samp>\"Halant\"</samp>.\n\t- If a zero-width joiner (<abbr title=\"Zero-Width Joiner\">ZWJ</abbr>) or a zero-width non-joiner (<abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>)\n      follows this <samp>\"Halant\"</samp>, move the <samp>\"Reph\"</samp> to the position\n      immediately after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n--->\n\n  - Starting at the first post-<samp>\"Reph\"</samp> consonant, search forward looking\n    for the first explicit <samp>\"Halant\"</samp>, ending the search when the base\n    consonant is encountered. 
If such an explicit <samp>\"Halant\"</samp> is found,\n    move the <samp>\"Reph\"</samp> to the position immediately after this\n    <samp>\"Halant\"</samp>.\n\t  * If a zero-width joiner (<abbr>ZWJ</abbr>) or a zero-width non-joiner (<abbr>ZWNJ</abbr>)\n        follows this <samp>\"Halant\"</samp>, move the <samp>\"Reph\"</samp> to the position\n        immediately after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>. This will be the final\n        <samp>\"Reph\"</samp> position. \n\t  * If no <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> follows this <samp>\"Halant\"</samp>, leave the <samp>\"Reph\"</samp> in\n        its position immediately after the <samp>\"Halant\"</samp>. This will be the\n        final <samp>\"Reph\"</samp> position. \n  - If no such explicit <samp>\"Halant\"</samp> is found in the previous step, find\n    the first post-base consonant that has not formed a ligature with\n    the base consonant. If such a non-ligated post-base consonant is\n    found, move the <samp>\"Reph\"</samp> to the position immediately before the\n    non-ligated post-base consonant. This will be the final <samp>\"Reph\"</samp>\n    position.\n  - If no such non-ligated post-base consonant is found in the\n    previous step, move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. 
\n\t> Note: Because Devanagari incorporates the\n    > `REPH_POS_BEFORE_POST` shaping characteristic, this means\n    > any positioning tag of `POS_POSTBASE_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_POSTBASE_CONSONANT`.\n  - If no other location has been located in the previous steps, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\n\nFinally, if the final position of <samp>\"Reph\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n\n:::{figure-md}\n![Reph positioning](/images/devanagari/devanagari-reph-position.svg \"Reph positioning\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-reph-position}\n\nReph positioning\n:::\n\n```{svg-color-toggle-button} devanagari-reph-position\n```\n\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n  \n  \n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nDevanagari does not use the `init` feature, so this step will\ninvolve no work when processing `<dev2>` text. It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. 
In preparation for this stage, glyph sequences should be\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit (not used in Devanagari)\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature is not used in Devanagari.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\n:::{figure-md}\n![Pre-base substitution](/images/devanagari/devanagari-pres.svg \"Pre-base substitution\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-pres}\n\nPre-base substitution\n:::\n\n```{svg-color-toggle-button} devanagari-pres\n```\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Above-base substitution](/images/devanagari/devanagari-abvs.svg \"Above-base substitution\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-abvs}\n\nAbove-base substitution\n:::\n\n```{svg-color-toggle-button} devanagari-abvs\n```\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. 
This usually includes replacing base consonants or syllable bases that\nare adjacent to the below-base-consonant form <samp>\"Rakaar\"</samp> with contextual\nligatures.\n\n:::{figure-md}\n![Below-base substitution](/images/devanagari/devanagari-blws.svg \"Below-base substitution\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-blws}\n\nBelow-base substitution\n:::\n\n```{svg-color-toggle-button} devanagari-blws\n```\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures. \n\n:::{figure-md}\n![Post-base substitution](/images/devanagari/devanagari-psts.svg \"Post-base substitution\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-psts}\n\nPost-base substitution\n:::\n\n```{svg-color-toggle-button} devanagari-psts\n```\n\n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic. \n\n:::{figure-md}\n![Halant substitution](/images/devanagari/devanagari-haln.svg \"Halant substitution\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-haln}\n\nHalant substitution\n:::\n\n```{svg-color-toggle-button} devanagari-haln\n```\n\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. 
However, `calt` is not mandatory for correct Devanagari shaping\n> and may be disabled in the application by user preference.\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Devanagari text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Devanagari, this includes <samp>\"Reph\"</samp> in addition to\nabove-base dependent vowels (matras), diacritical marks, and Vedic signs. \n\n:::{figure-md}\n![Above-base mark positioning](/images/devanagari/devanagari-abvm.svg \"Above-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-abvm}\n\nAbove-base mark positioning\n:::\n\n```{svg-color-toggle-button} devanagari-abvm\n```\n\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. 
In Devanagari, this includes below-base dependent vowels\n(matras) and diacritical marks as well as the below-base consonant form <samp>\"Rakaar\"</samp>.\n\n:::{figure-md}\n![Below-base mark positioning](/images/devanagari/devanagari-blwm.svg \"Below-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #devanagari-blwm}\n\nBelow-base mark positioning\n:::\n\n```{svg-color-toggle-button} devanagari-blwm\n```\n\n\n\n## The `<deva>` shaping model ##\n\nThe older Devanagari script tag, `<deva>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<deva>` and some users may still have documents that were written to\ntake advantage of `<deva>` shaping.\n\n### Distinctions from `<dev2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs used to trigger shaping\nfeatures was altered when migrating from `<deva>` to\n`<dev2>`. \n\n\nSpecifically, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<deva>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<dev2>` text, as described above in this document, there is no\nsuch reordering. The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nThe old Indic shaping model also did not recognize the\n`BLWF_MODE_PRE_AND_POST` shaping characteristic. 
Instead, `<deva>`\nwas treated as if it followed the `BLWF_MODE_POST_ONLY`\ncharacteristic — with a single exception made for non-syllable-initial\n<samp>\"Ra,Halant\"</samp>.\n\nIn other words, a non-syllable-initial <samp>\"Ra,Halant\"</samp> sequence would\ntrigger a below-base form substitution, but all other below-base form\nsubstitutions were applied only to consonants after the base\nconsonant or syllable base.\n\nIn addition, for some scripts, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<deva>` text, the left-side matra was always positioned\nat the beginning of the syllable.\n\nFinally, in `<deva>` text, the <samp>\"eyelash Ra\"</samp> form was encoded as the\nsequence <samp>\"Ra,Halant,ZWJ\"</samp>. \n\nIn `<dev2>`, the required encoding for <samp>\"eyelash Ra\"</samp> is now\n<samp>\"Rra,Halant\"</samp>, and the substitution is implemented using the `half`\nfeature of <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n\n### Advice for handling fonts with `<deva>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<deva>` shaping model.\n\n### Advice for handling text runs composed in `<deva>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<deva>` script tag and it is known that the font in use supports\nonly the `<dev2>` shaping model.\n\nShaping engines may also choose to apply `blwf` substitutions to\nbelow-base consonants occurring before the base consonant or syllable base when it is\nknown that the font in use supports an applicable 
substitution lookup.\n\nShaping engines may also choose to position left-side matras according\nto the `<deva>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n"
  },
  {
    "path": "opentype-shaping-emoji.md",
    "content": "# Emoji shaping in OpenType #\n\nThis document details the default shaping procedure needed to shape\nemoji sequences.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Normalization](#normalization)\n  - [Bidirectionality](#bidirectionality)\n  - [Sequence identification](#sequence-identification)\n    - [Presentation sequences](#presentation-sequences)\n    - [Modifier sequences](#modifier-sequences)\n    - [Regional Indicator flag sequences](#regional-indicator-flag-sequences)\n    - [Tag flag sequences](#tag-flag-sequences)\n    - [Keycap sequences](#keycap-sequences)\n    - [Zero-Width Joiner sequences](#zwj-sequences)\n      - [<abbr>ZWJ</abbr> hair sequences](#zwj-hair-sequences)\n      - [<abbr>ZWJ</abbr> gendered person sequences](#zwj-gendered-person-sequences)\n      - [<abbr>ZWJ</abbr> multi-person group sequences](#zwj-multi-person-group-sequences)\n      - [<abbr>ZWJ</abbr> role sequences](#zwj-role-sequences)\n      - [<abbr>ZWJ</abbr> color sequences](#zwj-color-sequences)\n      - [<abbr>ZWJ</abbr> directionality sequences](#zwj-directionality-sequences)\n      - [<abbr>ZWJ</abbr> additional sequences](#zwj-additional-sequences)\n    - [Other sequences and ligatures](#other-sequences-and-ligatures)\n  - [Feature interaction in sequences](#feature-interaction-in-sequences)\n  - [Emoji sets](#emoji-sets)\n  - [The default shaping model](#the-default-shaping-model)\n  \n  \n  \n## General information ##\n\nThe emoji OpenType shaping model is used for correctly displaying\nsequences from the Emoji block in Unicode as well as for numerous\nemoji codepoints found in other blocks.\n\nEmoji codepoints originated from a variety of pre-Unicode standards,\nincluding mobile-phone carriers in Japan, from typographic characters\nsets such as Zapf Dingbats and Wingdings, and from various symbols in\ncommon usage.\n\nEmoji shaping follows the default OpenType shaping model used for\nscripts 
that are considered _non-complex_ from the shaper's\nperspective. However, emoji fonts typically use <abbr title=\"Glyph Substitution table\">GSUB</abbr> tables to\nimplement a variety of OpenType smart features, including several\nclasses of ligature, contextual alternates, or variant forms to\nsupport emoji sequences.\n\nIn addition to standalone image glyphs, emoji shaping is also used to\ndisplay flag sequences and \"keycap\" sequences, both of which involve a\ncombination of emoji and non-emoji codepoints in order.\n\nThe default emoji glyph for a given codepoint may be substituted by\nthe addition of selectors, modifiers, or joiners after the emoji\ncodepoint.\n\nMany of these emoji sequences carry important semantic meaning,\nsuch as specifying gender, skin tone, object colors, and\ndirections. Shaping engines should therefore make a best effort to\ncorrectly identify and display these sequences.\n\nFallback presentation is possible for some emoji sequences by\ndisplaying the sequence of default emoji glyphs for the\ncodepoints. For other emoji sequences, however, the most appropriate\nfallback approach is less clearly defined and may vary between\nimplementations.\n\nEmoji glyphs may be stored in any of several color formats, or in any\nof the monochrome Bézier formats typically used for standard text\ncodepoints. Correctly retrieving and displaying the glyph data for\nthe format used by the active font is outside the scope of this\ndocument.\n\n> Note: \"shortcut codes\" for emoji like `:smile:` are text mark-up\n> and are _not_ handled by OpenType shaping. 
The set of shortcut codes\n> supported by any particular application is specific to that\n> application alone.\n>\n> Text-processing stacks typically support a set of shortcut codes\n> that includes Unicode's official `Short_Name` property from the <abbr title=\"Common Locale Data Repository\">CLDR</abbr>\n> database, plus additional short codes, but the shortcut-code mapping\n> is not otherwise linked to Unicode data.\n\nRuns of emoji might be tagged with the `<Zsye>` or `<Zsym>` script\nsubtags, or with the `-em-emoji`, `-em-text`, or `-em-default` locale\nextensions. However, these subtags and extensions are primarily\nintended to control which presentation form is preferred by the\napplication, and must not be relied on for the purpose of identifying\nemoji.\n\n\n\n## Terminology ##\n\nA codepoint is considered an **emoji** only if it has the `Emoji`\nproperty in the Unicode Character Database (<abbr>UCD</abbr>). Although many\ncodepoints that have this property are pictographic in nature, some\ncodepoints that are pictographic do not have the `Emoji` property\n(such as most chess, playing-card, and game-piece symbols), and some\ncodepoints that do have the `Emoji` property show typographic\ncharacters rather than pictographic images.\n\nAll emoji codepoints, as well as several non-emoji codepoints, have\nthe `Extended_Pictographic` property. When a non-emoji codepoint has\nthe `Extended_Pictographic` property, this indicates that future\nrevisions of Unicode may incorporate the codepoint in a valid emoji\nsequence, or may (for a currently-unassigned codepoint) assign an\nemoji character to the codepoint.\n\nThe emoji codepoints also include two distinct sets of alphanumeric\ncharacter codepoints that are used to implement specific substitution\nsequences.\n\nThe **regional indicator** set includes counterparts to the 26 upper-case\nBasic Latin letters (<samp>\"A\"</samp> to <samp>\"Z\"</samp>), which are used to support the\npredefined set of regional flags. 
The regional indicator set is found\nwithin the Enclosed Alphanumeric Supplement block of Unicode.\n\nThe **tag character** set includes codepoints that correspond to the\nprintable characters in the ASCII set, as well as an <samp>\"End\"</samp> control\ntag. The tag characters are used to support a more general mechanism\nfor local and sub-national flags that are not covered by the\npredefined regional-indicator flag set. The tag character set is\nfound within the Tags block of Unicode.\n\n**Presentation style** describes whether an emoji codepoint is\nshown in emoji style (for example, with a full-color bitmap or <abbr title=\"Scalable Vector Graphics\">SVG</abbr>\nglyph) or text style (such as a monochrome, Bézier glyph). Every emoji\ncodepoint defaults to either emoji-style or text-style\npresentation.\n\nAn emoji codepoint might be followed by a **presentation selector**.\nThis selector requests that either emoji-style or text-style presentation be used\nfor the preceding emoji codepoint, potentially overriding that\ncodepoint's default. There are two presentation selectors:\n\n  - `Variation Selector 15` (VS15, `U+FE0E`) requests text\n    presentation style.\n  - `Variation Selector 16` (VS16, `U+FE0F`) requests emoji\n    presentation style.\n\n:::{figure-md}\n![Text presentation-style selector](/images/emoji/text-presentation.png \"Text presentation-style selector\")\n\nText presentation-style selector\n:::\n\n\n:::{figure-md}\n![Emoji presentation-style selector](/images/emoji/emoji-presentation.png \"Emoji presentation-style selector\"){title=\"Testing\"}\n\nEmoji presentation-style selector\n:::\n\n\nAn emoji codepoint might also be followed by an emoji\n**modifier**. This modifier requests an alternate version of the emoji\nglyph. 
Currently, there are five emoji modifiers defined, all of which\nare assigned to a skin-tone designation from the Fitzpatrick scale:\n\n  - `U+1F3FB` \"Light skin tone\"\n  - `U+1F3FC` \"Medium-light skin tone\"\n  - `U+1F3FD` \"Medium skin tone\"\n  - `U+1F3FE` \"Medium-dark skin tone\"\n  - `U+1F3FF` \"Dark skin tone\"\n\n\nEmoji **sequences** consist of one or more emoji codepoints,\noptionally followed by presentation selectors, modifiers, or other\nspecial characters. A font can implement custom ligatures for any\nsequence of emoji. However, Unicode also designates specific sequences\nthat should be supported. These sequences can involve three special\nnon-printing codepoints in addition to the selectors and modifiers\nmentioned above:\n\n  - The Combining Enclosing Keycap (<abbr>CEK</abbr>, `U+20E3`) is used to form\n    **keycap** sequences corresponding to telephone keypad keys.\n\n  - The Cancel Tag (`U+E007F`) is used to form tag-based flag\n    sequences.\n\n  - The Zero-Width Joiner (<abbr>ZWJ</abbr>, `U+200D`) is used to form emoji\n    sequences for multi-person groups, gendered forms, hair-color\n    variants, and directionality.\n\n\n\n## Normalization ##\n\nEmoji sequences are not generally affected by Unicode or OpenType\nnormalization. 
However, Unicode does specify an order to be used when\nrepresenting <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>-using emoji sequences.\n\nThe correct order should be:\n\n\tBase emoji codepoint\n\tEmoji modifier OR Emoji presentation selector\n\tHair subsequence\n\tColor subsequence\n\tGender-sign or object subsequence\n\tDirectionality indicator\n\n\nAlthough this ordering is not designated a Unicode normalization form,\nshaping engine implementers may find it a useful target if attempting\nto correct invalid, mis-ordered emoji <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequences.\n\nShaping engines should also note that the `Emoji` and\n`Extended_Pictographic` properties may require tracking in any Unicode\nnormalization routines.\n\nThe `Emoji` property of a codepoint can be unintentionally lost when\ncertain string transformations are performed. For example, the\nupper-case versions of the Circled Latin Letters have the `Emoji`\nproperty, but the lower-case versions of the Circled Latin Letters do\nnot. Therefore, a case-transformation rule must take care not to\nunintentionally break the desired output by losing the property.\n\nThe `Extended_Pictographic` property of a codepoint should be tracked\nbecause it is set on several non-emoji codepoints that may be updated\nto have the `Emoji` property in a future release of Unicode.\n\n\n\n## Bidirectionality ##\n\nMost emoji sequences are defined to be neutral for the purpose of\nbidirectionality segmenting and handling.\n\nHowever, the Regional Indicator flag sequences are defined to be\nleft-to-right only, overriding any levels of bidirectional embedding.\n\n\n\n## Sequence identification ##\n\nThere are six varieties of emoji sequence defined by Unicode:\n\n1. Presentation sequences\n2. Modifier sequences\n3. Regional Indicator flag sequences\n4. Tag flag sequences\n5. Keycap sequences\n6. 
Zero-width joiner (<abbr>ZWJ</abbr>) sequences\n\n> Note: The <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequence variety incorporates several subsets, but all\n> of the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequences are implemented using the same mechanism.\n\nThe set of sequences includes various mechanisms defined at different\ntimes by either Unicode itself or by legacy encoding standards. In\nsome cases, an older mechanism (such as the Regional Indicator\nmechanism used for national flags) has been superseded by a newer,\nmore flexible mechanism intended to permit emoji vendors to provide\nsupport for a large set of new representations or emoji variants\nwithout requiring Unicode to define new codepoints for every possible\npermutation. Nevertheless, shaping-engine implementers should expect\nto encounter any or all of the defined sequences.\n\nThis set includes the major categories of sequences that shaping\nengines are likely to encounter and that can convey important\ncontextual information to users. Note, however, that fonts may\nimplement additional sequences via ligature substitution or other\nexisting mechanisms.\n\nEach of the six sequence varieties can also be interpreted as a\ndifferent module of overall \"emoji sequence support\" for a\nshaping-engine implementation. For example, support for Regional\nIndicator flag sequences is distinct from support for Keycap\nsequences. For convenience, in this document, the sequence varieties\nare listed in an order that roughly approximates their complexity, but\nthis ordering is not definitive.\n\nSequences should be identified by examining the run and matching\ncharacters, based on their categorization, using regular expressions. 
\n\nThe following general-purpose identification classes can be used to\nmatch emoji sequences in regular expressions.\n\n```markdown\n_emoji_             = `EMOJI`\n_modifier_          = \"U+1F3FB\" | \"U+1F3FC\" | \"U+1F3FD\" | \"U+1F3FE\" | \"U+1F3FF\"\n_presentation_      = `VS15` | `VS16`\n_zwj_               = `ZWJ`\n_cek_               = `CEK`\n_blackflag_         = \"U+1F3F4\"\n_key_               = \"#\" | \"*\" | [\"0\"..\"9\"]\n_color_             = \"U+2B1B\" | \"U+2B1C\" | \"U+1F7E5\" | \"U+1F7E6\" | \"U+1F7E7\" | \"U+1F7E8\" | \"U+1F7E9\" | \"U+1F7EA\" | \"U+1F7EB\"\n_multipersongroup_  = \"U+1F91D\" | \"U+1F46F\" | \"U+1F93C\" | \"U+1F46B\" | \"U+1F46C\" | \"U+1F46D\" | \"U+1F48F\" | \"U+1F491\" | \"U+1F46A\"\n_gendersign_        = \"U+2640\" | \"U+2642\"\n_genderperson_      = \"U+1F468\" | \"U+1F469\" | \"U+1F9D1\"\n_hairstyle_         = \"U+1F9B0\" | \"U+1F9B1\" | \"U+1F9B2\" | \"U+1F9B3\"\n_direction_         = \"U+2B05\" | \"U+27A1\"\n_regionalindicator_ = [\"U+1F1E6\"..\"U+1F1FF\"]\n_tagchar_           = `TAG_CHARACTER`\n_endtag_            = \"U+E007F\"\n```\n<!---\n\n_adult_               = `U+1F468` | `U+1F469`\n_child_               = `U+1F466` | `U+1F467`\n_familymember_              = _adult_ | _child_\n\n_family_    = _adult_{1,2} _child_{0,2}\n\n - this doesn't work; perhaps need to redefine the above as \"genderperson\" and \n   the OLD genderperson as something different. Coherent naming is going \n   to be a challenge on that front.\n--->\n\nThe expressions below use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? 
= optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n### Presentation sequences ###\n\nA presentation sequence is used to request a specific presentation\nstyle (\"text\" or \"emoji\"), potentially overriding the default\npresentation style defined for the codepoint by Unicode.\n\n:::{figure-md}\n![Requesting emoji presentation style](/images/emoji/emoji-pres-sequence.png \"Requesting emoji presentation style\")\n\nRequesting emoji presentation style\n:::\n\n\n:::{figure-md}\n![Requesting text presentation style](/images/emoji/text-pres-sequence.png \"Requesting text presentation style\")\n\nRequesting text presentation style\n:::\n\n\nThe active emoji font, however, might not contain a glyph for the\npresentation style requested in the sequence. In particular, it is not\ncommon for emoji fonts to include text-presentation glyphs for\ncodepoints that default to the emoji-presentation style.\n\nIn these instances, the text rendering stack should select a fallback\nfont that does contain the glyph requested by the presentation\nsequence. Strategies for identifying appropriate fallback fonts are\nbeyond the scope of this document.\n\nA standalone presentation sequence must match:\n\n```markdown\n_emoji_ _presentation_\n```\n\nAlthough standalone presentation sequences can occur, note that\npresentation sequences also occur within longer emoji sequences.\n\n\n\n### Modifier sequences ###\n\nA modifier sequence is used to request an alternate glyph for an emoji\ncodepoint. \n\nCurrently, there are five emoji-modifier codepoints defined by\nUnicode. 
Each corresponds to a different human skin-tone based on the\nFitzpatrick scale.\n\n:::{figure-md}\n![Fitzpatrick 2](/images/emoji/fitzpatrick-2.png \"Fitzpatrick 2\")\n\nModifier for Fitzpatrick scale skin-tone 2\n:::\n\n:::{figure-md}\n![Fitzpatrick 3](/images/emoji/fitzpatrick-3.png \"Fitzpatrick 3\")\n\nModifier for Fitzpatrick scale skin-tone 3\n:::\n\n:::{figure-md}\n![Fitzpatrick 4](/images/emoji/fitzpatrick-4.png \"Fitzpatrick 4\")\n\nModifier for Fitzpatrick scale skin-tone 4\n:::\n\n:::{figure-md}\n![Fitzpatrick 5](/images/emoji/fitzpatrick-5.png \"Fitzpatrick 5\")\n\nModifier for Fitzpatrick scale skin-tone 5\n:::\n\n:::{figure-md}\n![Fitzpatrick 6](/images/emoji/fitzpatrick-6.png \"Fitzpatrick 6\")\n\nModifier for Fitzpatrick scale skin-tone 6\n:::\n\n\nA modifier sequence must match:\n\n```markdown\n_emojimodified_ = _emoji_ _modifier_\n```\n\nFonts are expected to implement modifier sequences for emoji\ncodepoints that depict a single human being, and are expected not to\nimplement modifier sequences for other emoji codepoints.\n\n> Note: Most emoji sequences that depict multiple human beings are\n> modified using the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> mechanisms described later, and not via this\n> mechanism.\n> \n> However, there are a small number of codepoints that depict groups\n> of human beings in a standalone codepoint and can be modified with a\n> single modifier. They are summarized in the table at the [feature\n> interaction in sequences](#feature-interaction-in-sequences)\n> section.\n>\n> Note also that there are emoji codepoints depicting beings that are\n> ambiguous in regard to their humanity, such as `U+1F9DB`,\n> \"Vampire\". 
Shaping engines should not assume that these codepoints\n> are unable to support a modifier.\n\n:::{figure-md}\n![Modifier sequence](/images/emoji/modifier-sequence.png \"Modifier sequence\")\n\nModifier sequence\n:::\n\n\nThe fallback for a modifier sequence is the generic, unmodified\nemoji followed by an emoji representing the skin-tone requested.\n\n:::{figure-md}\n![Modifier sequence fallback](/images/emoji/modifier-sequence-fallback.png \"Modifier sequence fallback\")\n\nModifier sequence fallback\n:::\n\n\nModifier sequences use emoji presentation style by default, and cannot\ninclude a presentation selector. However, an implementation may choose\nto display text-presentation versions of sequences if emoji\npresentation style is not possible in the environment.\n\nAlthough standalone modifier sequences occur, note that modifier\nsequences can also occur within longer emoji sequences.\n\n\n\n### Regional Indicator flag sequences ###\n\nA Regional Indicator flag sequence is used to request a flag\nemoji. All Regional Indicator flag sequences are two codepoints long,\nusing codepoints from the `REGIONAL_INDICATOR` alphabetical set.\n\nA Regional Indicator flag sequence must match:\n\n```markdown\n_regionalindicator_ _regionalindicator_\n```\n\nIn addition, the only two-codepoint sequences that are considered\nvalid Regional Indicator flag sequences are those that correspond to\nthe `unicode_region_subtag` field in the <abbr title=\"Common Locale Data Repository\">CLDR</abbr> database.\n\nThe typical emoji implementation of such a sequence is an image of a\nflag for the region. However, emoji fonts may choose to represent the\nregion through some other visual means (for example, a regional symbol\nor map image). 
Similarly, where there is more than one possible flag\nfor a region, Unicode does not specify any particular visual\nrepresentation.\n\nSome historical region subtags have been designated as deprecated (for\nexample, <samp>\"East Germany\"</samp> and <samp>\"West Germany\"</samp>). Emoji fonts are not\nexpected to support these deprecated subtags. However, if they\nare encountered in a text run and are supported in the active font,\nshaping engines should deal with the situation gracefully, without\noffering guarantees of support.\n\n:::{figure-md}\n![Regional Indicator flag sequence](/images/emoji/regional-indicator-flag-sequence-un.png \"Regional Indicator flag sequence\")\n\nRegional Indicator flag sequence\n:::\n\n\nRegional Indicator flag sequences use emoji presentation by default,\nand cannot include a presentation selector.  However, an\nimplementation may choose to display text-presentation versions of\nsequences if emoji presentation style is not possible in the\nenvironment.\n\n:::{figure-md}\n![Regional Indicator flag sequence fallback](/images/emoji/regional-indicator-flag-sequence-un-fallback.png \"Regional Indicator flag sequence fallback\")\n\nRegional Indicator flag sequence fallback\n:::\n\n\nRegional Indicator flag sequences only occur in standalone form.\n\n> Note: The Regional Indicator flag sequences are defined to always be\n> interpreted left-to-right (<abbr>LTR</abbr>) for the purpose of\n> bidirectionality. This behavior differs from that of other emoji\n> sequences, which are neutral in regard to bidirectionality.\n>\n> For example, a Regional Indicator sequence <samp>\"RI_U, RI_A\"</samp> should result\n> in a flag for Ukraine (<samp>\"UA\"</samp>), even if it occurs within a run of\n> right-to-left text. 
Reversing the sequence to result in a flag for\n> Australia (<samp>\"AU\"</samp>) is incorrect.\n\n\n\n### Tag flag sequences ###\n\nA Tag flag sequence is used to request a flag emoji for any flag not\ndefined by the Regional Indicator flag sequence mechanism.\n\nA Tag flag sequence must match:\n\n```markdown\n_blackflag_ _tagchar_+ _endtag_\n```\n\nThe codepoints in the `TAG_CHARACTER` set come from the \"Tags\" block\nin Unicode. At present, the set of allowable tags is defined as the\nrange `[U+E0020..U+E007E]`, which includes tags for space, upper- and\nlower-case basic Latin alphabetic letters, numerals, and several\nsymbols. However, Unicode also notes that the upper-case alphabetic\ntags are not currently used.\n\nTag sequences must end with <samp>\"Cancel Tag\"</samp> (`U+E007F`).\n\n> Note: Despite the official name \"Cancel Tag\", this codepoint\n> terminates valid tag sequences, rather than negating them.\n\n:::{figure-md}\n![Tag flag sequence](/images/emoji/tag-flag-sequence-wales.png \"Tag flag sequence\")\n\nTag flag sequence\n:::\n\n\nTag flag sequences only occur in standalone form.\n\n\n### Keycap sequences ###\n\nA Keycap sequence is used to request an emoji that depicts a\ntelephone-keypad button.\n\nA Keycap sequence must match:\n\n```markdown\n_key_ _presentation_ _cek_\n```\n\n:::{figure-md}\n![Keycap sequence](/images/emoji/keycap-sequence.png \"Keycap sequence\")\n\nKeycap sequence\n:::\n\n\nKeycap sequences only occur in standalone form.\n\n\n\n### <abbr>ZWJ</abbr> sequences ###\n\nA Zero-Width Joiner (<abbr>ZWJ</abbr>) sequence can be used to request specific\nvariants of an emoji glyph or to request the combined form of a\nsequence of emoji glyphs.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> codepoint itself is invisible, users will expect <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>\nsequences to fall back gracefully as sequences of standalone emoji\nglyphs that convey the original meaning. 
For example, a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>\nmulti-person group sequence would be rendered as a single multi-person\nemoji glyph if one is available in the active font, but would fall\nback to a set of individual-person emoji glyphs.\n\n\n\n\n#### <abbr>ZWJ</abbr> hair sequences ####\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> hair sequence is used to request a specific hairstyle version of\nan emoji codepoint that depicts a single human being.\n\n:::{figure-md}\n![ZWJ hairstyle sequence](/images/emoji/hairstyle-sequence.png \"ZWJ hairstyle sequence\")\n\nZWJ hairstyle sequence\n:::\n\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> hair sequence must match:\n\n```markdown\n(_emoji_ | _emojimodified_) _zwj_ _hairstyle_\n```\n\nCurrently, four hairstyle modifier codepoints are defined:\n\n  - `U+1F9B0` \"Red or ginger hair\"\n  - `U+1F9B1` \"Curly hair\"\n  - `U+1F9B2` \"Bald\"\n  - `U+1F9B3` \"White hair\"\n\nThe set of hairstyle sequences allowed has been chosen to enable\ndepictions of distinct properties not easily represented by the\ndefaults of the fallback glyphs. 
By default, the hairstyle and color\non fallback emoji is expected to be nondescript and dark.\n\nPrior to the adoption of the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>-hair-sequence mechanism, a codepoint\nspecifying \"person with blond hair\" (`U+1F471`) already existed;\ntherefore \"blond\" was not included in the set of supported hairstyle\nversions.\n\n\n#### <abbr>ZWJ</abbr> gendered person sequences ####\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person sequence is used to request a specific-gendered\nversion of an emoji codepoint that depicts a single human being.\n\nEach <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person sequence is composed of an emoji that depicts\na human by default, followed by <samp>\"ZWJ\"</samp>, followed by a gender symbol,\nfollowed by <samp>\"VS16\"</samp>.\n\n:::{figure-md}\n![ZWJ gendered person sequence](/images/emoji/gendered-person-sequence.png \"ZWJ gendered person sequence\")\n\nZWJ gendered person sequence\n:::\n\n\nThe fallback for a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person sequence is a generic \"person\"\nemoji followed by a gender symbol.\n\n:::{figure-md}\n![ZWJ gendered person sequence fallback](/images/emoji/gendered-person-sequence-fallback.png \"ZWJ gendered person sequence fallback\")\n\nZWJ gendered person sequence fallback\n:::\n\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person sequence must match:\n\n```markdown\n(_emoji_ | _emojimodified_) _zwj_ _gendersign_ _VS16_\n```\n\nA small number of emoji codepoints are defined to show a single human\nbeing with a fixed gender. 
These codepoints cannot have their apparent\ngender modified using the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person mechanism.\n\nCurrently, this list of codepoints includes those in the table below:\n\n:::{table} Single-human emoji codepoints that do not support gendered-person modifiers\n\n| Emoji codepoint                | Gender |\n|:-------------------------------|:-------|\n| `U+1F467` Girl                 | Female |\n| `U+1F466` Boy                  | Male   |\n| `U+1F469` Woman                | Female |\n| `U+1F468` Man                  | Male   |\n| `U+1F475` Older woman          | Female |\n| `U+1F474` Older man            | Male   |\n| `U+1F936` Mrs Claus            | Female |\n| `U+1F385` Santa Claus          | Male   |\n| `U+1F478` Princess             | Female |\n| `U+1F934` Prince               | Male   |\n| `U+1F483` Woman dancing        | Female |\n| `U+1F57A` Man dancing          | Male   |\n| `U+1F930` Pregnant woman       | Female |\n| `U+1F931` Breastfeeding        | Female |\n| `U+1F9D5` Woman with headscarf | Female |\n:::\n\n\nHowever, the list may be updated in subsequent revisions of Unicode.\n\nIn addition, emoji codepoints that depict groups of two or more human\nbeings are handled by other mechanisms, such as the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person\ngroup mechanism, and are documented in the corresponding section.\n\n> Note: The <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person sequence is not to be confused with\n> the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> role sequence.\n>\n> In effect, both sequence types can be used to depict a human being\n> performing a task or activity, and can be used to request a specific\n> gender for the human being depicted.\n>\n> However, all of the codepoints covered by the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person\n> sequences are emoji that show a human being by default, whereas the\n> codepoints covered by the <abbr 
title=\"Zero-Width Joiner\">ZWJ</abbr> role sequences begin with a generic\n> human-being emoji and append a symbol or object emoji.\n\n\n\n\n#### <abbr>ZWJ</abbr> multi-person group sequences ####\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person group sequence is used to request a multi-person\nemoji glyph. The fallback for a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person group sequence is a\nsequence of individual-person emoji glyphs.\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person group sequence must match:\n\n```markdown\n(_emoji_ | _emojimodified_) (_zwj_ (_emoji_ | _emojimodified_) _presentation_? ){1,3}\n```\n\nOnly a fixed number of such multi-person group sequences is\ndefined. Some of the sequences make use of specific codepoints (such\nas <samp>\"Heavy Black Heart\"</samp> or <samp>\"Kiss Mark\"</samp>).\n\nThe currently supported configurations for multi-person group emoji\nsequences are:\n  \n  - Couple with heart\n  - Couple in kiss\n  - Couple holding hands \n  - Family \n  - Shaking hands\n\n\nA potential source of confusion for these sequences is that some of\nthem appear to duplicate the content of an existing emoji codepoint,\nbut the existing emoji codepoint is typically not involved in forming\nthe corresponding <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person group sequence.\n\nSpecifically, there are standalone emoji codepoints for \"Kiss\"\n(`U+1F48F`), two people holding hands (in three permutations:\n`U+1F46B`, `U+1F46C`, and `U+1F46D`), \"Family\" (`U+1F46A`), and\n\"Handshake\" (`U+1F91D`). 
The details of each of these codepoints in\nrelation to the corresponding conceptually-similar <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person\ngroup sequence are noted below.\n\nEach of the specific <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person group sequences has a precise\ndefinition.\n\nThe \"Couple with heart\" sequence is composed of\n<samp>\"Person,ZWJ,Heavy_Black_Heart,VS16,ZWJ,Person\"</samp>, and must match:\n```markdown\n(_emoji_ | _emojimodified_) _zwj_ `U+2764` _VS16_ _zwj_ (_emoji_ | _emojimodified_)\n```\n\n:::{figure-md}\n![ZWJ multi-person heart sequence](/images/emoji/multi-person-heart-sequence.png \"ZWJ multi-person heart sequence\")\n\nZWJ multi-person heart sequence\n:::\n\n\n:::{figure-md}\n![ZWJ multi-person heart sequence with modifier](/images/emoji/multi-person-heart-skintone-sequence.png \"ZWJ multi-person heart sequence with modifier\")\n\nZWJ multi-person heart sequence with modifier\n:::\n\n\n\nThe \"Couple in kiss\" sequence is composed of\n<samp>\"Person,ZWJ,Heavy_Black_Heart,VS16,ZWJ,Kiss_Mark,ZWJ,Person\"</samp>, and must match:\n```markdown\n(_emoji_ | _emojimodified_) _zwj_ `U+2764` _VS16_ _zwj_ `U+1F48B` _zwj_ (_emoji_ | _emojimodified_)\n```\n\n:::{figure-md}\n![ZWJ multi-person kiss sequence](/images/emoji/multi-person-kiss-sequence.png \"ZWJ multi-person kiss sequence\")\n\nZWJ multi-person kiss sequence\n:::\n\n\n:::{figure-md}\n![ZWJ multi-person kiss sequence with modifier](/images/emoji/multi-person-kiss-skintone-sequence.png \"ZWJ multi-person kiss sequence with modifier\")\n\nZWJ multi-person kiss sequence with modifier\n:::\n\n\n> Note: the kiss <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequence does not involve the \"Kiss\" codepoint\n> (`U+1F48F`). 
\n\n\nThe \"Couple holding hands\" sequence is composed of\n<samp>\"Person,ZWJ,Handshake,ZWJ,Person\"</samp>, and must match:\n```markdown\n(_emoji_ | _emojimodified_) _zwj_ `U+1F91D` _zwj_ (_emoji_ | _emojimodified_)\n```\n\n:::{figure-md}\n![ZWJ multi-person holding-hands sequence](/images/emoji/multi-person-holding-hands-sequence.png \"ZWJ multi-person holding-hands sequence\")\n\nZWJ multi-person holding-hands sequence\n:::\n\n\n:::{figure-md}\n![ZWJ multi-person holding-hands sequence with modifier](/images/emoji/multi-person-holding-hands-skintone-sequence.png \"ZWJ multi-person holding-hands sequence with modifier\")\n\nZWJ multi-person holding-hands sequence with modifier\n:::\n\n\n> Note: the couple-holding-hands <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequence does not involve any of\n> the \"Man and woman holding hands\" (`U+1F46B`), \"Two men holding\n> hands\" (`U+1F46C`), or \"Two women holding hands\" (`U+1F46D`)\n> codepoints. \n\n\nThe \"Family\" sequence is composed of two-to-four individual <samp>\"Person\"</samp>\nsubsequences, each separated by a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>. Furthermore, the <samp>\"Person\"</samp>\nsubsequences must be sorted so that all adult subsequences precede all\nchild subsequences. 
A <samp>\"Family\"</samp> subsequence must match:\n```markdown\n(_emoji_ | _emojimodified_) (_zwj_ (_emoji_ | _emojimodified_) ){1,3}\n```\n\n> Note: The <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> \"Family\" sequence is defined to support modifiers on\n> each individual human-codepoint component of the sequence, but these\n> modified \"Family\" sequences are not currently included in the\n> Recommended For General Interchange (<abbr>RGI</abbr>) emoji set, due to the\n> number of permutations that would be added to the <abbr title=\"Recommended for General Interchange\">RGI</abbr> set as a result.\n\n:::{figure-md}\n![ZWJ multi-person family man, boy sequence](/images/emoji/multi-person-family-man-boy-sequence.png \"ZWJ multi-person family man, boy sequence\")\n\nZWJ multi-person family sequence \"man, boy\"\n:::\n\n:::{figure-md}\n![ZWJ multi-person family man, girl, girl sequence](/images/emoji/multi-person-family-man-girl-girl-sequence.png \"ZWJ multi-person family man, girl, girl sequence\")\n\nZWJ multi-person family sequence \"man, girl, girl\"\n:::\n\n:::{figure-md}\n![ZWJ multi-person family man, woman, girl sequence](/images/emoji/multi-person-family-man-woman-girl-sequence.png \"ZWJ multi-person family man, woman, girl sequence\")\n\nZWJ multi-person family sequence \"man, woman, girl\"\n:::\n\n:::{figure-md}\n![ZWJ multi-person family woman, woman, girl, boy sequence](/images/emoji/multi-person-family-woman-woman-girl-boy-sequence.png \"ZWJ multi-person family woman, woman, girl, boy sequence\")\n\nZWJ multi-person family sequence \"woman, woman, girl, boy\"\n:::\n\n\n> Note: the family <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequence does not involve the \"Family\"\n> codepoint (`U+1F46A`).\n\nThe \"Shaking hands\" sequence is composed of two <samp>\"Hand\"</samp> subsequences\nseparated by a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>, and must match:\n```markdown\n`U+1FAF1` _modifier_ _zwj_ `U+1FAF2` _modifier_\n```\n\n:::{figure-md}\n![ZWJ 
multi-person shaking-hands sequence](/images/emoji/multi-person-shaking-hands-sequence.png \"ZWJ multi-person shaking-hands sequence\")\n\nZWJ multi-person shaking-hands sequence\n:::\n\n\n> Note: the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> \"Shaking hands\" sequence does not involve the \"Handshake\"\n> codepoint (`U+1F91D`), although the \"Handshake\" codepoint itself can\n> be followed by a single _modifier_ codepoint that, for legacy\n> reasons, serves to alter the skin tone of both of the hands depicted\n> in the handshake.\n>\n> However, the \"Handshake\" codepoint _is_ utilized in the multi-person\n> group sequence for \"Couple holding hands\".\n\n\n#### <abbr>ZWJ</abbr> role sequences ####\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> role (or profession) sequence is used to request an emoji\ndepicting a human being performing a task or job. Role sequences are\ncomposed of a codepoint representing a human, followed by <samp>\"ZWJ\"</samp>,\nfollowed by an emoji depicting an object or symbol that references the\ndesired profession or role.\n\n:::{figure-md}\n![ZWJ role sequence firefighter](/images/emoji/role-sequence-firefighter.png \"ZWJ role sequence 'firefighter'\")\n\nZWJ role sequence 'firefighter'\n:::\n\n\nOptionally, the sequence can be further updated by requesting a\nskin-tone modifier appended to the `_genderperson_` element.\n\n:::{figure-md}\n![ZWJ role sequence firefighter with modifier](/images/emoji/role-sequence-firefighter-skintone-6.png \"ZWJ role sequence 'firefighter' with modifier\")\n\nZWJ role sequence 'firefighter' with modifier\n:::\n\n\nIn some cases, the object or symbol depicted by the standalone emoji\nwill not be shown in the substituted emoji resulting from the\nsequence. 
For example, the \"factory\" codepoint (`U+1F3ED`) depicts a\nbuilding in its standalone emoji, but the \"factory worker\" sequence\ndepicts a human being outfitted for factory work, rather than\ndepicting a combination of the human being and the factory building.\n\nIn addition, some of the supported \"role\" codepoints do not use emoji\npresentation by default; for those codepoints, the emoji will be\nfollowed by a presentation selector.\n\n:::{figure-md}\n![ZWJ role sequence pilot](/images/emoji/role-sequence-pilot.png \"ZWJ role sequence 'pilot'\")\n\nZWJ role sequence 'pilot'\n:::\n\n\n:::{figure-md}\n![ZWJ role sequence pilot with modifier](/images/emoji/role-sequence-pilot-skintone-2.png \"ZWJ role sequence 'pilot' with modifier\")\n\nZWJ role sequence 'pilot' with modifier\n:::\n\n\n\nThe fallback for a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> role sequence is a generic \"person\" emoji\nfollowed by the emoji symbolizing the task or job.\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> role sequence must match:\n\n```markdown\n_genderperson_ _modifier_? 
_zwj_ _emoji_ _presentation_?\n```\n\n> Note: The <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> role sequence is not to be confused with the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>\n> gendered person sequence.\n>\n> In effect, both sequence types can be used to depict a human being\n> performing a task or activity, and can be used to request a specific\n> gender for the human being depicted.\n>\n> However, all of the codepoints covered by the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person\n> sequences are emoji that show a human being by default, whereas the\n> codepoints covered by the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> role sequences begin with a generic\n> human-being emoji and append a symbol or object emoji.\n\n\n\n#### <abbr>ZWJ</abbr> color sequences ####\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> color sequence is used to request a version of an emoji\ncodepoint depicting the base object in a specific color.\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> color sequence must match:\n\n```markdown\n_emoji_ _zwj_ _color_ _presentation_\n```\n\nCurrently, nine codepoints are defined, each of which (in isolation)\ndepicts a large square filled with the color in question.\n\n  - `U+2B1B` - \"Black Large Square\"\n  - `U+2B1C` - \"White Large Square\"\n  - `U+1F7E5` - \"Large Red Square\"\n  - `U+1F7E6` - \"Large Blue Square\"\n  - `U+1F7E7` - \"Large Orange Square\"\n  - `U+1F7E8` - \"Large Yellow Square\"\n  - `U+1F7E9` - \"Large Green Square\"\n  - `U+1F7EA` - \"Large Purple Square\"\n  - `U+1F7EB` - \"Large Brown Square\"\n\n\n:::{figure-md}\n![ZWJ color sequence](/images/emoji/color-sequence.png \"ZWJ color sequence\")\n\nZWJ color sequence\n:::\n\n\nThe fallback for a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> color sequence is the default emoji followed by\nthe default emoji for the color codepoint (that is, the color square).\n\n\n\n#### <abbr>ZWJ</abbr> directionality sequences ####\n\nA <abbr title=\"Zero-Width 
Joiner\">ZWJ</abbr> directionality sequence is used to request a version of an emoji\ncodepoint facing a specific cardinal direction.\n\n\n:::{figure-md}\n![ZWJ directionality sequence](/images/emoji/zwj-directionality-sequence.png \"ZWJ directionality sequence\")\n\nZWJ directionality sequence\n:::\n\n\nA <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> directionality sequence must match:\n\n```markdown\n_emoji_ _zwj_ _direction_ _presentation_\n```\n\n#### <abbr>ZWJ</abbr> additional sequences ####\n\nIn addition to the above <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequence categories, there are 13\nstandalone, but uncategorized, <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequences defined in Unicode.\n\n  - \"Heart on fire\"\n  - \"Mending heart\"\n  - \"Transgender flag\"\n  - \"Rainbow flag\"\n  - \"Pirate flag\"\n  - \"Service dog\"\n  - \"Polar bear\"\n  - \"Eye in speech bubble\"\n  - \"Face exhaling\"\n  - \"Face with spiral eyes\"\n  - \"Face in clouds\"\n  - \"Mx Claus\"\n  - \"Black cat\"\n\n\n:::{figure-md}\n![ZWJ heart-on-fire sequence](/images/emoji/zwj-sequence-heart-on-fire.png \"ZWJ heart-on-fire sequence\")\n\nZWJ heart-on-fire sequence\n:::\n\n\nThese sequences currently match:\n\n```markdown\n_emoji_ _presentation_? _zwj_ _emoji_ _presentation_?\n```\n\nNote that the \"Black cat\" sequence, although it appears on this list\nof additional <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequences, has subsequently been generalized to the\n[ZWJ color sequence mechanism](#zwj-color-sequences).\n\n\n### Other sequences and ligatures ###\n\nEmoji fonts may include support for additional variants and\nsequences. For example, an emoji font might implement support for\n\"keycap\"-style emoji for alphabetical characters in addition to the\nnumbers and symbols defined above.\n\nEmoji fonts may also include many-to-one emoji substitutions that do\nnot fit into any of the above sequence varieties and, instead, behave\nmore like ligatures. 
For example, the sequence <samp>\"Ice Cream, Banana\"</samp>\nmight be substituted with a \"banana split\" emoji.\n\nAny such substitutions are included by the emoji font vendor at their\nown discretion, with the understanding that fallback behavior is\nunpredictable.\n\nIn all such cases, the shaping engine can make a best-effort attempt\nto support the sequences, but is not obligated to provide any\nguarantees as to their correctness.\n\n\n## Feature interaction in sequences ##\n\nAs is noted in the descriptions of <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered-person sequences and\n<abbr title=\"Zero-Width Joiner\">ZWJ</abbr> multi-person group sequences, there is potential for confusion\nwherever standalone emoji codepoints and emoji sequences overlap in\nmeaning.\n\nThis potential for confusion is compounded by the fact that the\nskin-tone modifier mechanism and the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered person mechanism\ninteract differently with standalone emoji codepoints and with emoji\nsequences.\n\nIn particular, for several of the standalone emoji codepoints, a\nsingle skin-tone modifier is permitted, which is defined to modify both\nof the human beings depicted in the emoji. 
For other standalone emoji\ncodepoints, only a single gender designator <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> gendered-person\nsubsequence is allowed to be appended to the codepoint, and the gender\ndesignator is defined to modify both of the human beings depicted in\nthe emoji.\n\nThe permitted combinations are summarized in the following table:\n\n\n:::{table} Defined interactions between skin-tone–modifiers and gender designators\n\n| Type       | Emoji                                   | Skin-tone-modifier | Gender depicted     |\n|:-----------|:----------------------------------------|:-------------------|:--------------------|\n| Standalone | \"Handshake\" `U+1F91D`                   | only one supported |    not supported    |\n| Standalone | \"Woman with bunny ears\" `U+1F46F`       |   not supported    | only one supported  |\n| Standalone | \"Wrestlers\" `U+1F93C`                   |   not supported    | only one supported  |\n| Standalone | \"Man and woman holding hands\" `U+1F46B` | only one supported |    not supported    |\n| Standalone | \"Two men holding hands\" `U+1F46C`       | only one supported |    not supported    |\n| Standalone | \"Two women holding hands\" `U+1F46D`     | only one supported |    not supported    |\n| Standalone | \"Kiss\" `U+1F48F`                        | only one supported |    not supported    |\n| Standalone | \"Couple with heart\" `U+1F491`           | only one supported |    not supported    |\n| Standalone | \"Family\" `U+1F46A`                      |   not supported    |    not supported    |\n| Sequence   | \"Couple with heart\" ZWJ sequence        |     supported      |      supported      |\n| Sequence   | \"Couple in kiss\" ZWJ sequence           |     supported      |      supported      |\n| Sequence   | \"Couple holding hands\" ZWJ sequence     |     supported      |      supported      |\n| Sequence   | \"Family\" ZWJ sequence                   |     supported      |      supported      |\n| Sequence  
 | \"Shaking hands\" ZWJ sequence            |     required       |    not supported    |\n:::\n\n\n\n\n## Emoji sets ##\n\nUnicode defines several lists of emoji codepoints and emoji sequences\nthat constitute the sequences that are expected in general text.\n\nThe \"Basic emoji\" set includes all individual codepoints that can be\nrendered with the emoji presentation style (including those codepoints\nthat do not default to emoji presentation).\n\nThe \"Emoji keycap sequence\" set includes all possible valid Keycap\nsequences.\n\nThe \"<abbr title=\"Recommended for General Interchange\">RGI</abbr> emoji modifier sequence\", \"<abbr title=\"Recommended for General Interchange\">RGI</abbr> emoji flag sequence\", \"<abbr title=\"Recommended for General Interchange\">RGI</abbr>\nemoji tag sequence\", and \"<abbr title=\"Recommended for General Interchange\">RGI</abbr> emoji <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> sequence\" sets each include\nonly a subset of the possible valid sequences for their respective\nvariety of sequence. These sets are designated as \"Recommended for\nGeneral Interchange\" (<abbr>RGI</abbr>) to denote that they are in common usage.\n\nFinally, the \"<abbr title=\"Recommended for General Interchange\">RGI</abbr> emoji set\" includes all of the codepoints and\nsequences included in the preceding sets. Presence in the <abbr title=\"Recommended for General Interchange\">RGI</abbr> emoji\nset can be tracked with the `RGI_Emoji` property in the <abbr title=\"Unicode Character Database\">UCD</abbr>. Fonts are\nnot required to implement the entire <abbr title=\"Recommended for General Interchange\">RGI</abbr> emoji set, nor any of the\nother sets.\n\n\n\n## The default shaping model ##\n\nEmoji should be shaped using the\n[default](opentype-shaping-default.md) shaping model. \n\nProcessing a run of text in the default shaping model involves three\ntop-level stages:\n\n1. Applying the basic substitution features from <abbr>GSUB</abbr>\n2. 
Applying other substitution features from <abbr>GSUB</abbr>\n3. Applying the positioning features from <abbr>GPOS</abbr>\n\nEmoji sequences as described above will generally be implemented in\nthe active font as a <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup feature. However, there is no\nspecific <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> feature that must or must _not_ be\nemployed for this purpose.\n\nConsequently, shaping engines should not assume (for example) that\nemoji sequences will be implemented in any specific feature of <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nA font may also employ contextual features, such as `locl`, that\naffect the emoji glyph shown, or use <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning for some emoji\nglyphs. \n\n\n### Font substitution for presentation forms ###\n\nBefore shaping begins, the rendering engine should analyze the text\nrun and identify presentation forms.\n\nA presentation sequence is used to request a specific presentation\nstyle (\"text\" or \"emoji\") for a codepoint, potentially overriding the\ndefault presentation style that is defined in Unicode for that\ncodepoint.\n\nBecause it is uncommon for a single font to include both an\nemoji-presentation-style glyph and a text-presentation-style glyph for\nthe same codepoint, handling a presentation sequence might\nrequire font substitution.\n\n> Note: Strictly speaking, font substitution is not part of the\n> shaping process, and the handling of missing presentation forms\n> might be most easily performed during segmentation of the text\n> stream into runs. However, shaping-engine implementers should be\n> aware that such presentation-sequence substitutions are allowable\n> and handle them gracefully.\n\n\n### 1. 
Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for this\nstage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tlocl\n\tccmp\n\trlig\n\t\n\nAn emoji font can implement sequence support through any <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature\nlookup.\n\nBasic substitution features are a common choice for emoji fonts and should\nbe applied at this stage. In particular, <abbr title=\"Glyph Substitution table\">GSUB</abbr> features that are\nenabled by default and <abbr title=\"Glyph Substitution table\">GSUB</abbr> features that cannot be disabled by\napplication-level user interfaces are common choices in which the\nactive font may implement emoji substitutions.\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\nIn other, non-emoji text runs, the `ccmp` feature allows a font to\nsubstitute mark-and-base sequences with a pre-composed glyph including\nthe mark and the base, or to substitute a single glyph with an\nequivalent decomposed sequence of glyphs. 
\n\nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs.\n\nThe `rlig` feature substitutes glyph sequences with mandatory\nligatures. Substitutions made by `rlig` cannot be disabled by\napplication-level user interfaces.\n\nThe basic substitution features play a relatively more important role\nin shaping non-emoji text runs; therefore the shaping engine may\napply some of them (such as `locl`) at an earlier stage in the\nshaping process. Emoji shaping should be unaffected by this decision.\n\n\n\n### 2. Applying typographic substitution features from <abbr>GSUB</abbr> ###\n\nThe typographic-substitution phase applies all remaining substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for\nthis stage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThese substitutions include those features designed to provide\ntypographic consistency and correctness.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\nAn emoji font can implement sequence support through any <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature\nlookup. 
This can include any other substitution feature in the <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nfeature table.\n\nSupport for <abbr title=\"Recommended for General Interchange\">RGI</abbr> emoji sequences or other emoji sequences defined as\nvalid in Unicode may be implemented in a feature that is enabled by\ndefault and cannot be disabled by application-level user interfaces,\nsuch as the `rlig` feature (for \"required ligatures\").\n\nHowever, emoji fonts may also include support for emoji sequences in\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> features that can be disabled by application-level user\ninterfaces, such as the `liga` feature (for \"standard ligatures\"). Emoji\nsequences may also be implemented in features that are disabled by\ndefault, such as the `dlig` feature (for \"discretionary ligatures\").\n\nAn emoji font might also implement support for emoji sequences through\nthe use of multiple features. For example, <abbr title=\"Recommended for General Interchange\">RGI</abbr> emoji sequences or\nother emoji sequences defined as valid in Unicode may be implemented\nin `rlig`, with custom sequences implemented in `liga`.\n\n\n\n### 3. Applying the positioning features from <abbr>GPOS</abbr> ###\n\nThe positioning stage adjusts the positions of mark and base\nglyphs. 
In preparation for this stage, glyph sequences should be\ntagged for possible application of <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table\nin the font.\n\nIn general, all emoji glyphs in a given font are expected to be\napproximately equal in height and width, and the usage of <abbr title=\"Glyph Positioning table\">GPOS</abbr>\npositioning for emoji is uncommon.\n\nHowever, some emoji glyphs might be narrower or wider than average by\nthe nature of the image itself (for example, certain national flags\nare narrower or wider than others), and there may be situations in\nwhich the active font alters the default position of an emoji glyph to\nachieve a consistent alignment, spacing, or appearance.\n\nTherefore, shaping engines should make no assumptions about the\npresence or absence of <abbr title=\"Glyph Positioning table\">GPOS</abbr> features for emoji runs, and should apply\nthe features if present.\n\n<!---\n\nFALLBACK ??\n\nmodifiers = skintone\n- fallback recommended to shown skintone patch, even though modifier\n  not normally visible codepoint??\n\nemoji sequences \n- Regional Indicators Symbols: \n  - might produce flags OR as a 'country code'\n  - limited to predefined list; has some deprecated code (eg, east germany)\n  - HB trickiness:\n    https://github.com/harfbuzz/harfbuzz/commit/2b0ced28b685de4edbd22cf5f59be30075984dfb\n\thttps://github.com/harfbuzz/harfbuzz/issues/2265\n\n--->\n"
  },
  {
    "path": "opentype-shaping-gujarati.md",
    "content": "```{include} /_global.md\n```\n\n# Gujarati shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Gujarati script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Gujarati character tables](#gujarati-character-tables)\n  - [The `<gjr2>` shaping model](#the-gjr2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<gujr>` shaping model](#the-gujr-shaping-model)\n      - [Distinctions from `<gjr2>`](#distinctions-from-gjr2)\n      - [Advice for handling fonts with `<gujr>` features only](#advice-for-handling-fonts-with-gujr-features-only)\n      - [Advice for handling text runs composed in `<gujr>` format](#advice-for-handling-text-runs-composed-in-gujr-format)\n\n\n## General information ##\n\nThe Gujarati script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the North Indic subgroup, in which\nsequences of adjacent consonants are often represented as conjuncts.\n\nThe Gujarati script is used to write multiple languages, most commonly\nGujarati, Kutchi, and Avestan. 
In addition, Sanskrit may be written\nin Gujarati, so Gujarati script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nThere are two extant Gujarati script tags defined in OpenType, `<gujr>`\nand `<gjr2>`. The older script tag, `<gujr>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<gjr2>`\nshaping model. However, if a font is encountered that supports only\n`<gujr>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. \n\n**Halant** and **Virama** are both standard terms for the below-base \"vowel-killer\"\nsign. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. \n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. \n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. 
In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Gujarati text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark.\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nGujarati glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. 
These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are also categorized as `Mark [Mn]`).\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). 
Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Gujarati character tables ###\n\nSeparate character tables are provided for the Gujarati and Vedic\nExtensions block as well as for other miscellaneous characters that\nare used in `<gjr2>` text runs:\n\n  - [Gujarati character table](character-tables/character-tables-gujarati.md#gujarati-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-gujarati.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-gujarati.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0A81`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0A81; Candrabindu         |\n| | | | |\n|`U+0A95`   | Letter           | CONSONANT         | _null_                     | &#x0A95; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are designated as _unassigned_ in\nthe _Unicode category_ column.\n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Gujarati text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>.\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\n\n\n\n## The `<gjr2>` shaping model ##\n\nProcessing a run of `<gjr2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. 
\n\nWith regard to these common variations, Gujarati's specific shaping\ncharacteristics include: \n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any special final-consonant forms.\n\n  - `REPH_POS_BEFORE_POST` = <samp>\"Reph\"</samp> is ordered before all post-base consonant forms.\n\n  - `REPH_MODE_IMPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant\"</samp> sequence.\n\n  - `BLWF_MODE_PRE_AND_POST` = The below-forms feature is applied both to\n     pre-base consonants and to post-base consonants.\n\n  - `MATRA_POS_TOP` = `POS_AFTER_SUBJOINED` = Above-base matras are\n     ordered after subjoined (i.e., below-base) consonant forms. \n\n  - `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are\n     ordered after all post-base consonant forms. \n\n  - `MATRA_POS_BOTTOM` = `POS_AFTER_POST` = Below-base matras are\n     ordered after all post-base consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Gujarati consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Gujarati Unicode block enumerates nine modifier signs,\n> \"Candrabindu\" (`U+0A81`), \"Anusvara\" (`U+0A82`), \"Visarga\"\n> (`U+0A83`), \"Sukun\" (`U+0AFA`), \"Shadda\" (`U+0AFB`), \"Maddah\"\n> (`U+0AFC`), \"Three-Dot Nukta Above\" (`U+0AFD`), \"Circle Nukta Above\"\n> (`U+0AFE`), and \"Two-Circle Nukta Above\" (`U+0AFF`). In addition,\n> Sanskrit text written in Gujarati may include additional signs from\n> the Vedic Extensions block. \n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. 
\n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that vowel is the\nsyllable's only vowel sound and, by definition, there is no \"base\"\nconsonant. \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nGenerally speaking, the base consonant is the final consonant of the\nsyllable and its vowel sound designates the end of the syllable. 
This\nrule is synonymous with the `BASE_POS_LAST` characteristic mentioned\nearlier. \n\nValid consonant-based syllables may include one or more additional \nconsonants that precede the base consonant or syllable base. Each of these\nother, pre-base consonants will be followed by the <samp>\"Halant\"</samp> mark, which\nindicates that they carry no vowel. They affect pronunciation by\ncombining with the base consonant (e.g., \"_str_\", \"_pl_\") but they\ndo not add a vowel sound.\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment; in many circumstances it is replaced by one of two combining\nmark-like forms. \n\n  - A <samp>\"Ra,Halant\"</samp> sequence at the beginning of a syllable\n    is replaced with an above-base mark called <samp>\"Reph\"</samp> (unless the <samp>\"Ra\"</samp>\n    is the only consonant in the syllable). \n    This rule is synonymous with the `REPH_MODE_IMPLICIT`\n    characteristic mentioned earlier.\n\n  - <samp>\"Halant,Ra\"</samp> sequences that occur elsewhere in the syllable may take on the\n    below-base form <samp>\"Rakaar\".</samp> \n\t\n<samp>\"Reph\"</samp> and <samp>\"Rakaar\"</samp> characters must be reordered after the\nsyllable-identification stage is complete. \n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. 
Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Gujarati script, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Gujarati syllables. \n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. 
\n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. 
Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the basis of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Ra,Halant\"</samp> glyph sequences, and other consonants that take special\n> treatment in some circumstances. 
<samp>\"Ra\"</samp> may take on one of two special\n> forms, depending on its position in the syllable. \n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nGujarati. \n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. 
`POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. 
That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\nVowel-based syllables, standalone-sequences, and broken text runs will\nnot have base consonants.\n\n<!--- > Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). \n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. 
Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is\n\n  - If the syllable starts with <samp>\"Ra,Halant\"</samp> and the syllable contains\n    more than one consonant, exclude the starting <samp>\"Ra\"</samp> from the list of\n    consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If none of the above conditions is true, stop.\n  - The consonant stopped at will be the base consonant.\n\n> Note: The algorithm is designed to work for all Indic\n> scripts. However, Gujarati does not utilize pre-base-reordering <samp>\"Ra\"</samp>.\n\nGujarati includes one below-base consonant form.\n\n  - <samp>\"Halant,Ra\"</samp> (occurring after the syllable base) and <samp>\"Ra,Halant\"</samp>\n    (before the syllable base, but in a non-syllable-initial\n    position) will take on the <samp>\"Rakaar\"</samp> form. 
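\n\nAs an illustration, the base-consonant search above can be sketched in Python. This is a minimal sketch under stated assumptions, not the method of any particular shaping engine: the category codes (`C`, `RA`, `H`, `ZWJ`, `M`) and the `find_base` helper are hypothetical names, and the only special form tested for is the Gujarati below-base <samp>\"Halant,Ra\"</samp> pattern, detected with a static rule rather than a <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup.\n\n```python\n# Hypothetical category codes, not taken from any real engine:\n# C = consonant, RA = the consonant Ra, H = halant, ZWJ = joiner, M = matra.\nCONSONANTS = {'C', 'RA'}\n\n\ndef find_base(cats):\n    # Return (base_index, tags) for a consonant-based syllable.\n    tags = {}\n    idxs = [i for i, c in enumerate(cats) if c in CONSONANTS]\n    if not idxs:\n        return None, tags\n    # Exclude an initial Ra,Halant (a future Reph) when other consonants exist.\n    if len(idxs) > 1 and cats[:2] == ['RA', 'H']:\n        idxs = idxs[1:]\n    # Walk backwards from the last consonant.\n    for pos in range(len(idxs) - 1, -1, -1):\n        i = idxs[pos]\n        if pos == 0:\n            return i, tags  # first consonant considered: stop\n        if i >= 2 and cats[i - 2:i] == ['H', 'ZWJ']:\n            return i, tags  # preceded by Halant,ZWJ: stop\n        if cats[i] == 'RA' and cats[i - 1] == 'H':\n            # Halant,Ra after the base: below-base (Rakaar) candidate.\n            tags[i] = 'POS_BELOWBASE_CONSONANT'\n            continue\n        return i, tags  # no special form: stop\n```\n\nUnder these assumptions, `find_base(['C', 'H', 'RA'])` selects index 0 as the base and tags index 2 as `POS_BELOWBASE_CONSONANT`, while `find_base(['RA', 'H', 'C'])` skips the initial <samp>\"Ra\"</samp> and selects index 2.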
\n\t\n> Note: Because Gujarati employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search, only the <samp>\"Halant,_consonant_\"</samp> \n> pattern following the syllable base for these below-base forms will\n> be encountered. Stage 2, step 5 below ensures that the <samp>\"_consonant_,Halant\"</samp>\n> pattern preceding the syllable base for these below-base forms will\n> also be tagged correctly.\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any multi-part dependent vowels (matras) must be decomposed\ninto their components. Gujarati has one multi-part dependent vowel,\n\"Candra O\" (`U+0AC9`).\n\n> \"Candra O\" (`U+0AC9`) decomposes to \"`U+0AC5`,`U+0ABE`\"\n\n> Note: \"Candra O\" is categorized in Unicode as being a top-and-right\n> matra, a combination that would normally decompose into one\n> TOP_POSITION mark and one RIGHT_POSITION mark. In \"Candra O\",\n> however, the `U+0AC5` component is intended to be positioned over the\n> `U+0ABE` component, not above the base.\n>\n> Consequently, the two decomposed components should both be tagged\n> for the `POS_AFTER_POST` sorting position, and neither will need to\n> be reordered.\n>\n> In addition, the decomposition is not canonical in\n> Unicode, so performing the decomposition may trigger unknown\n> behavior from other components of the software stack. Consequently,\n> shaping engines may choose to skip it.\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. 
However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n:::{figure-md}\n![Two-part matra decomposition](/images/gujarati/gujarati-matra-decompose.svg \"Two-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-matra-decompose}\n\nTwo-part matra decomposition\n:::\n\n```{svg-color-toggle-button} gujarati-matra-decompose\n```\n\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAbove-base dependent-vowel (matra) signs must be tagged with `POS_AFTER_SUBJOINED`.\n\nRight-side dependent-vowel (matra) signs must be tagged with `POS_AFTER_POST`.\n\nBelow-base dependent-vowel (matra) signs must be tagged with `POS_AFTER_POST`.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`. 
Excluding initial <samp>\"Ra,Halant\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. \n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nGujarati includes one below-base consonant form.\n\n  - <samp>\"Halant,Ra\"</samp> (occurring after the syllable base) and <samp>\"Ra,Halant\"</samp>\n    (before the syllable base, but in a non-syllable-initial\n    position) will take on the <samp>\"Rakaar\"</samp> form. \n\t\n> Note: Because Gujarati employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search in stage 2, step 1, any instances of the\n> <samp>\"Halant,_consonant_\"</samp>  pattern following the syllable base for these\n> below-base forms will be encountered. 
The tagging in this step\n> ensures that the <samp>\"_consonant_,Halant\"</samp> pattern preceding the syllable\n> base for these below-base forms will also be tagged correctly.\n\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: an initial <samp>\"Ra,Halant\"</samp> sequence will always become a <samp>\"Reph\"</samp>\n> unless the <samp>\"Ra\"</samp> is the only consonant in the syllable.\n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the base consonant or syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with  `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<gjr2>` text runs. This step is included here to\n> maintain compatibility across Indic scripts.\n\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. 
In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!-- matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. 
--->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore, it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp> and <samp>\"Ja,Halant,Nya\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant\"</samp> sequences but _not_ match\n            initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `rkrf` should match <samp>\"_Consonant_,Halant,Ra\"</samp> sequences\n  - `blwf` should match <samp>\"Halant,Ra\"</samp> in post-base positions and\n           <samp>\"Ra,Halant\"</samp> in non-initial pre-base positions \n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `vatu` should match <samp>\"_Consonant_,Halant,Ra\"</samp> sequences\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf \n\tpref (not used in Gujarati)\n\tblwf \n\tabvf (not used in Gujarati)\n\thalf\n\tpstf\n\tvatu\n\tcjct\n\tcfar (not used in Gujarati)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph.\n\n  - The context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n\n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n:::\n\n:::{figure-md}\n![nukt feature application](/images/gujarati/gujarati-nukt.svg \"nukt feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-nukt}\n\nnukt feature application\n:::\n\n```{svg-color-toggle-button} gujarati-nukt\n```\n\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces two specific sequences with required ligatures. \n\n  - <samp>\"Ka,Halant,Ssa\"</samp> is substituted with the <samp>\"KSsa\"</samp> ligature. 
\n  - <samp>\"Ja,Halant,Nya\"</samp> is substituted with the <samp>\"JNya\"</samp> ligature. \n  \nThese sequences can occur anywhere in a syllable. The <samp>\"KSsa\"</samp> and\n<samp>\"JNya\"</samp> characters have orthographic status equivalent to full\nconsonants in some languages, and fonts may have `cjct` substitution\nrules designed to match them in subsequences. Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\n  - The context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n    \n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n\n:::{figure-md}\n![akhn KSsa formation](/images/gujarati/gujarati-akhn-kssa.svg \"akhn KSsa formation\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-akhn-kssa}\n\nakhn KSsa formation\n:::\n\n```{svg-color-toggle-button} gujarati-akhn-kssa\n```\n\n\n:::{figure-md}\n![akhn JNya formation](/images/gujarati/gujarati-akhn-jnya.svg \"akhn JNya formation\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-akhn-jnya}\n\nakhn JNya formation\n:::\n\n```{svg-color-toggle-button} gujarati-akhn-jnya\n```\n\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence, however, must not be flagged for\n    the `rphf` substitution.\n\t\n\n  - The context defined for a `rphf` feature is:\n\n:::{table} `rphf` feature context\n    \n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`   | _none_        |\n:::\n\n\n:::{figure-md}\n![Reph formation](/images/gujarati/gujarati-rphf.svg \"Reph formation\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-rphf}\n\nReph 
formation\n:::\n\n```{svg-color-toggle-button} gujarati-rphf\n```\n\n\t\n#### Stage 3, step 5: rkrf ####\n\nThe `rkrf` feature replaces <samp>\"_Consonant_,Halant,Ra\"</samp> sequences with the\n<samp>\"Rakaar\"</samp>-ligature form of the consonant glyph.\n\n  - The context defined for a `rkrf` feature is:\n\n:::{table} `rkrf` feature context\n \n| Backtrack           | Matching sequence     | Lookahead     |\n|:--------------------|:----------------------|:--------------|\n| `_consonant_`(full) | `_halant_`,\"Ra\"(full) | _none_        |\n:::\n\n\n:::{figure-md}\n![Rakaar ligation](/images/gujarati/gujarati-rkrf.svg \"Rakaar ligation\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-rkrf}\n\nRakaar ligation\n:::\n\n```{svg-color-toggle-button} gujarati-rkrf\n```\n\n\n#### Stage 3, step 6: pref ####\n\n> This feature is not used in Gujarati.\n\n<!--- 3.5: The `pref` feature replaces pre-base-consonant glyphs with -->\n<!--any special forms. --->\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. Gujarati includes one below-base consonant\nform:\n\n  - <samp>\"Halant,Ra\"</samp> (occurring after the syllable base) or <samp>\"Ra,Halant\"</samp>\n    (before the syllable base, but in a non-syllable-initial position) will\n    take on the <samp>\"Rakaar\"</samp> form.\n\t\nIf the active font contains ligatures for the consonant adjacent to\nthe <samp>\"Halant\"</samp> (i.e., <samp>\"_Consonant_,Halant,Ra\"</samp>), then that ligature is\nnormally applied with the `rkrf` feature in stage 3, step 5. 
The `blwf`\nfeature allows the <samp>\"Ra\"</samp> to be substituted with a standalone <samp>\"Rakaar\"</samp>\nmark, to work with all consonants that do not have a `rkrf` ligature\nin the font.\n\nBecause Gujarati incorporates the `BLWF_MODE_PRE_AND_POST` shaping\ncharacteristic, any pre-base consonants and any post-base consonants\nmay potentially match a `blwf` substitution; therefore, both cases must\nbe flagged for comparison. Note that this is not necessarily the case in other\nIndic scripts that use a different `BLWF_MODE_` shaping\ncharacteristic. \n\n:::{figure-md}\n![blwf feature application](/images/gujarati/gujarati-blwf.svg \"blwf feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-blwf}\n\nblwf feature application\n:::\n\n```{svg-color-toggle-button} gujarati-blwf\n```\n\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Gujarati.\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n<samp>\"_Consonant_\"</samp>.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are three 
exceptions to the default behavior, for which\nthe shaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - Non-initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged\n    for the `rkrf` or `blwf` features earlier, must not be flagged for\n    potential `half` substitutions.\n\n  - A sequence matching <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n:::{figure-md}\n![half-form feature application](/images/gujarati/gujarati-half.svg \"half-form feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-half}\n\nhalf-form feature application\n:::\n\n```{svg-color-toggle-button} gujarati-half\n```\n\n\n#### Stage 3, step 10: pstf ####\n\n> This feature is not used in Gujarati.\n\n\n#### Stage 3, step 11: vatu ####\n\nThe `vatu` feature replaces certain sequences with \"Vattu variant\"\nforms. \n\n\"Vattu variants\" are formed from glyphs followed by <samp>\"Rakaar\"</samp>\n(the below-base form of <samp>\"Ra\"</samp>); therefore, this feature must be applied after\nthe `blwf` feature.\n\n> Note: for Gujarati, the `vatu` feature performs the same set of\n> substitutions as the `rkrf` feature. The `rkrf` feature is\n> preferred; if a given font implements `rkrf`, it does not\n> necessarily have to implement `vatu`. Nevertheless, shaping engines\n> must support and process both features.\n\n:::{figure-md}\n![vatu feature application](/images/gujarati/gujarati-vatu.svg \"vatu feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-vatu}\n\nvatu feature application\n:::\n\n```{svg-color-toggle-button} gujarati-vatu\n```\n\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. 
These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<gjr2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between the two consonants.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. 
By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n:::{figure-md}\n![cjct feature application](/images/gujarati/gujarati-cjct.svg \"cjct feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-cjct}\n\ncjct feature application\n:::\n\n```{svg-color-toggle-button} gujarati-cjct\n```\n\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Gujarati.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. 
In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. 
This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\n:::{figure-md}\n![Pre-base matra positioning](/images/gujarati/gujarati-matra-position.svg \"Pre-base matra positioning\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-matra-position}\n\nPre-base matra positioning\n:::\n\n```{svg-color-toggle-button} gujarati-matra-position\n```\n\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. 
However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Gujarati incorporates the `REPH_POS_BEFORE_POST`\nshaping characteristic, this final position is immediately before any\nindependent post-base consonant forms (meaning the first post-base\nconsonant that has not formed a ligature with the syllable base).\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is\n\n<!---\n\n  - Find the first explicit <samp>\"Halant\"</samp> between the first post-Reph\n    consonant and the last main consonant. Move the <samp>\"Reph to the\n    position immediately after this <samp>\"Halant\"</samp>.\n\t- If a zero-width joiner (<abbr>ZWJ</abbr>) or a zero-width non-joiner (<abbr>ZWNJ</abbr>)\n      follows this <samp>\"Halant\"</samp>, move the <samp>\"Reph\"</samp> to the position\n      immediately after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n--->\n\n  - Starting at the first post-<samp>\"Reph\"</samp> consonant, search forward looking\n    for the first explicit <samp>\"Halant\"</samp>, ending the search when the base\n    consonant is encountered. If such an explicit <samp>\"Halant\"</samp> is found,\n    move the <samp>\"Reph\"</samp> to the position immediately after this\n    <samp>\"Halant\"</samp>.\n\t  * If a zero-width joiner (<abbr>ZWJ</abbr>) or a zero-width non-joiner (<abbr>ZWNJ</abbr>)\n        follows this <samp>\"Halant\"</samp>, move the <samp>\"Reph\"</samp> to the position\n        immediately after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>. 
This will be the final\n        <samp>\"Reph\"</samp> position. \n\t  * If no <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> follows this <samp>\"Halant\"</samp>, leave the <samp>\"Reph\"</samp> in\n        its position immediately after the <samp>\"Halant\"</samp>. This will be the\n        final <samp>\"Reph\"</samp> position. \n  - If no such explicit <samp>\"Halant\"</samp> is found in the previous step, find\n    the first post-base consonant that has not formed a ligature with\n    the base consonant. If such a non-ligated post-base consonant is\n    found, move the <samp>\"Reph\"</samp> to the position immediately before the\n    non-ligated post-base consonant. This will be the final <samp>\"Reph\"</samp>\n    position.\n  - If no such non-ligated post-base consonant is found in the\n    previous step, move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. 
\n\t> Note: Because Gujarati incorporates the\n    > `REPH_POS_BEFORE_POST` shaping characteristic, this means\n    > any positioning tag of `POS_POSTBASE_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_POSTBASE_CONSONANT`.\n  - If no other location has been located in the previous steps, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\n\nFinally, if the final position of <samp>\"Reph\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n\n\n\n:::{figure-md}\n![Reph positioning](/images/gujarati/gujarati-reph-position.svg \"Reph positioning\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-reph-position}\n\nReph positioning\n:::\n\n```{svg-color-toggle-button} gujarati-reph-position\n```\n\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n  \nGujarati does not use pre-base-reordering consonants, so this step will\ninvolve no work when processing `<gjr2>` text. It is included here in order\nto maintain compatibility with the other Indic scripts.\n  \n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nGujarati does not use the `init` feature, so this step will\ninvolve no work when processing `<gjr2>` text. 
It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n   \n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. In preparation for this stage, glyph sequences should have been\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit (not used in Gujarati)\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature is not used in Gujarati.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\n:::{figure-md}\n![pres feature application](/images/gujarati/gujarati-pres.svg \"pres feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-pres}\n\npres feature application\n:::\n\n```{svg-color-toggle-button} gujarati-pres\n```\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![abvs feature application](/images/gujarati/gujarati-abvs.svg \"abvs feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-abvs}\n\nabvs feature application\n:::\n\n```{svg-color-toggle-button} gujarati-abvs\n```\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. 
This usually includes replacing base consonants or\nsyllable bases that\nare adjacent to the below-base-consonant form <samp>\"Rakaar\"</samp> with contextual\nligatures.\n\n:::{figure-md}\n![blws feature application](/images/gujarati/gujarati-blws.svg \"blws feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-blws}\n\nblws feature application\n:::\n\n```{svg-color-toggle-button} gujarati-blws\n```\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures. \n\n:::{figure-md}\n![psts feature application](/images/gujarati/gujarati-psts.svg \"psts feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-psts}\n\npsts feature application\n:::\n\n```{svg-color-toggle-button} gujarati-psts\n```\n\n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic. \n\n:::{figure-md}\n![haln feature application](/images/gujarati/gujarati-haln.svg \"haln feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-haln}\n\nhaln feature application\n:::\n\n```{svg-color-toggle-button} gujarati-haln\n```\n\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. 
However, `calt` is not mandatory for correct Gujarati shaping\n> and may be disabled in the application by user preference.\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Gujarati text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\n:::{figure-md}\n![Distance feature application](/images/gujarati/gujarati-dist.svg \"Distance feature application\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-dist}\n\nDistance feature application\n:::\n\n```{svg-color-toggle-button} gujarati-dist\n```\n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Gujarati, this includes <samp>\"Reph\"</samp> in addition to\nabove-base dependent vowels (matras), diacritical marks, and Vedic signs. \n\n:::{figure-md}\n![Above-base mark positioning](/images/gujarati/gujarati-abvm.svg \"Above-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-abvm}\n\nAbove-base mark positioning\n:::\n\n```{svg-color-toggle-button} gujarati-abvm\n```\n\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. 
In Gujarati, this includes below-base dependent vowels\n(matras) and diacritical marks as well as the below-base consonant form <samp>\"Rakaar\"</samp>.\n\n:::{figure-md}\n![Below-base mark positioning](/images/gujarati/gujarati-blwm.svg \"Below-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #gujarati-blwm}\n\nBelow-base mark positioning\n:::\n\n```{svg-color-toggle-button} gujarati-blwm\n```\n\n\n\n## The `<gujr>` shaping model ##\n\nThe older Gujarati script tag, `<gujr>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<gujr>` and some users may still have documents that were written to\ntake advantage of `<gujr>` shaping.\n\n### Distinctions from `<gjr2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs (used to trigger shaping\nfeatures) was altered when migrating from `<gujr>` to\n`<gjr2>`.\n\nSpecifically, under the `<gujr>` model, shaping engines were expected to\nreorder post-base <samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<gujr>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<gjr2>` text, as described above in this document, there is no\nsuch reordering. The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nThe old Indic shaping model also did not recognize the\n`BLWF_MODE_PRE_AND_POST` shaping characteristic. 
Instead, `<gujr>`\nwas treated as if it followed the `BLWF_MODE_POST_ONLY`\ncharacteristic. In other words, below-base form substitutions were\nonly applied to consonants after the base consonant or syllable base.\n\nIn addition, for some scripts, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<gujr>` text, the left-side matra was always positioned\nat the beginning of the syllable.\n\n\n\n### Advice for handling fonts with `<gujr>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<gujr>` shaping model.\n\n### Advice for handling text runs composed in `<gujr>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<gujr>` script tag and it is known that the font in use supports\nonly the `<gjr2>` shaping model.\n\nShaping engines may also choose to apply `blwf` substitutions to\nbelow-base consonants occurring before the base consonant or syllable base when it is\nknown that the font in use supports an applicable substitution lookup.\n\nShaping engines may also choose to position left-side matras according\nto the `<gujr>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n\n"
  },
  {
    "path": "opentype-shaping-gurmukhi.md",
    "content": "```{include} /_global.md\n```\n\n# Gurmukhi shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Gurmukhi script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Gurmukhi character tables](#gurmukhi-character-tables)\n  - [The `<gur2>` shaping model](#the-gur2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<guru>` shaping model](#the-guru-shaping-model)\n      - [Distinctions from `<gur2>`](#distinctions-from-gur2)\n      - [Advice for handling fonts with `<guru>` features only](#advice-for-handling-fonts-with-guru-features-only)\n      - [Advice for handling text runs composed in `<guru>` format](#advice-for-handling-text-runs-composed-in-guru-format)\n\n\n## General information ##\n\nThe Gurmukhi script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the North Indic subgroup, in which\nsequences of adjacent consonants are often represented as conjuncts.\n\nThe Gurmukhi script is used to write multiple languages, most commonly\nPunjabi, Sant Bhasha, and Sindhi. 
In addition, Sanskrit may be written\nin Gurmukhi, so Gurmukhi script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nThere are two extant Gurmukhi script tags defined in OpenType, `<guru>`\nand `<gur2>`. The older script tag, `<guru>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<gur2>`\nshaping model. However, if a font is encountered that supports only\n`<guru>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign.\n\nThe term \"matra\" is also used to refer to the headline above most\nGurmukhi letters. To avoid ambiguity, the term **headline** is\nused in most Unicode and OpenType shaping documents.\n\n**Halant** and **Virama** are both standard terms for the below-base \"vowel-killer\"\nsign. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. In the Punjabi\nlanguage, this mark is known as the _adak bindi_.\n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. 
Some of these substitutions create **above-base** or\n**below-base** forms. The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Gurmukhi text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark. \n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nGurmukhi glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. 
Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are categorized as `Mark [Mn]`).\n\nGurmukhi uses one subclass of consonant, `CONSONANT_MEDIAL`.\n\n> Note: Unicode includes a second subclass of consonant,\n> `CONSONANT_PLACEHOLDER`, for two special vowel-carrier letters,\n> \"Iri\" (`U+0A72`) and \"Ura\" (`U+0A73`). For shaping purposes, however,\n> both <samp>\"Iri\"</samp> and <samp>\"Ura\"</samp> are classified as `CONSONANT`.\n\nThe `CONSONANT_MEDIAL` subclass is used for <samp>\"Yakash\"</samp> (`U+0A75`), a\nconsonant used in Sikh religious texts that is believed to be derived\nfrom the character <samp>\"Ya\"</samp> (`U+0A2F`). <samp>\"Yakash\"</samp> is positioned in a mark-like,\nbelow-base form, but it must pass tests for consonants when\nidentifying syllables.\n\nGurmukhi differs from many other Indic scripts in that independent\nvowels are represented by the standard dependent-vowel marks (matras)\nattached to a special vowel-carrier character. However, because each\nindependent vowel has been assigned its own codepoint by Unicode, the\nstandard `VOWEL_INDEPENDENT` and `VOWEL_DEPENDENT` classifications\nfunction normally.\n\nThe vowel carrier <samp>\"Aira\"</samp>, with no dependent-vowel mark, represents the\nindependent form of the inherent vowel, \"A\" (`U+0A05`).  In a sense,\nthis character serves a double function. 
\n\nThe other two vowel carriers, <samp>\"Iri\"</samp> (`U+0A72`) and <samp>\"Ura\"</samp> (`U+0A73`)\ndo not normally occur on their own in Gurmukhi syllables, but they may\nappear as standalone entities, much like marks and other symbols do\nwhen they are referenced or displayed as examples. To support this use\ncase, the <samp>\"Iri\"</samp> and <samp>\"Ura\"</samp> characters have the status of consonants for\nshaping purposes. \n\n<!--- Both subclasses should match tests for consonants, such as when [identifying\nsyllables](#1-identifying-syllables-and-other-sequences), but may\nrequire special treatment in other circumstances. --->\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. 
Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Gurmukhi character tables ###\n\nSeparate character tables are provided for the Gurmukhi and Vedic\nExtensions blocks as well as for other miscellaneous characters that\nare used in `<gur2>` text runs:\n\n  - [Gurmukhi character table](character-tables/character-tables-gurmukhi.md#gurmukhi-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-gurmukhi.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-gurmukhi.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0A01`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0A01; Adak Bindi          |\n| | | | |\n|`U+0A15`   | Letter           | CONSONANT         | _null_                     | &#x0A15; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Gurmukhi text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>. \n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n\n> Note: <samp>\"Reph\"</samp> substitutions are rare in Gurmukhi text. `<gur2>` fonts may\n> not implement the <samp>\"Reph\"</samp> substitution in <abbr title=\"Glyph Substitution table\">GSUB</abbr> at all. 
Nevertheless,\n> shaping engines must test for it in order to provide the\n> functionality if it is implemented.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. 
For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\nIn addition to general punctuation, runs of Gurmukhi text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block.\n\n\n\n## The `<gur2>` shaping model ##\n\nProcessing a run of `<gur2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. 
Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. 
\n\nWith regard to these common variations, Gurmukhi's specific shaping\ncharacteristics include:\n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any special final-consonant forms.\n\n  - `REPH_POS_BEFORE_SUBJOINED` = <samp>\"Reph\"</samp> is ordered before all subjoined (i.e.,\n     below-base) consonant forms.\n\n  - `REPH_MODE_IMPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant\"</samp> sequence.\n\n  - `BLWF_MODE_PRE_AND_POST` = The below-base forms feature is applied both to\n     pre-base consonants and to post-base consonants.\n\n  - `MATRA_POS_TOP` = `POS_AFTER_POST` = Above-base matras are\n     ordered after all post-base consonant forms.\n\n  - `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are\n     ordered after all post-base consonant forms.\n\n  - `MATRA_POS_BOTTOM` = `POS_AFTER_POST` = Below-base matras are\n     ordered after all post-base consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Gurmukhi consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Gurmukhi Unicode block enumerates six modifier signs,\n> \"Adak Bindi\" (`U+0A01`), \"Bindi\" (`U+0A02`), \"Visarga\" \n> (`U+0A03`), \"Udaat\" (`U+0A51`), \"Tippi\" (`U+0A70`), and \"Addak\"\n> (`U+0A71`). In addition, Sanskrit text written in Gurmukhi may\n> include additional signs from the Vedic Extensions block.\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. 
If\nthe syllable begins with an independent vowel, that independent vowel\nis the syllable's only vowel sound and serves as the \"base\". \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nGenerally speaking, the base consonant is the final consonant of the\nsyllable and its vowel sound designates the end of the syllable. This\nrule is synonymous with the `BASE_POS_LAST` characteristic mentioned\nearlier. \n\nValid consonant-based syllables may include one or more additional \nconsonants that precede the base consonant or syllable base. 
Each of these\nother, pre-base consonants will be followed by the <samp>\"Halant\"</samp> mark, which\nindicates that they carry no vowel. They affect pronunciation by\ncombining with the base consonant or syllable base (e.g., \"_str_\", \"_pl_\") but they\ndo not add a vowel sound.\n\nGurmukhi also includes special consonants that can occur after the\nbase consonant or syllable base. These post-base consonants will also be separated from\nthe base consonant or syllable base by a <samp>\"Halant\"</samp> mark; the algorithm for correctly\nidentifying the base consonant includes a test to recognize these sequences\nand not mis-identify the base consonant.\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment; in many circumstances it is replaced by one of two combining\nmark-like forms. \n\n  - A <samp>\"Ra,Halant\"</samp> sequence at the beginning of a syllable may be replaced\n    with an above-base mark called <samp>\"Reph\"</samp> (unless the <samp>\"Ra\"</samp> is the only\n    consonant in the syllable). This rule is synonymous with the\n    `REPH_MODE_IMPLICIT` characteristic mentioned earlier.\n\n  - A <samp>\"Ra,Halant\"</samp> sequence before the base consonant or syllable base or a <samp>\"Halant,Ra\"</samp>\n    sequence after the base consonant or syllable base may be replaced with a\n    below-base mark.\n  \n> Note: <samp>\"Reph\"</samp> substitutions are rare in Gurmukhi text. `<gur2>` fonts may\n> not implement the <samp>\"Reph\"</samp> substitution in <abbr title=\"Glyph Substitution table\">GSUB</abbr> at all. 
Nevertheless,\n> shaping engines must test for it in order to provide the\n> functionality if it is implemented.\n\n<samp>\"Reph\"</samp> characters must be reordered after the syllable-identification\nstage is complete.\n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Gurmukhi script, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Gurmukhi syllables. 
\n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. \n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. 
Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the bases of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? 
(CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? 
(HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Ra,Halant\"</samp> glyph sequences, and other consonants that take special\n> treatment in some circumstances. 
This includes the below-base forms\n> of <samp>\"Ra\"</samp>, <samp>\"Ha\"</samp>, and <samp>\"Va\"</samp>.\n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nGurmukhi. \n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. 
`POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which\nfollows.\n\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. 
That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\n\n<!--- > Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). \n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. 
Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is:\n\n  - If the syllable starts with <samp>\"Ra,Halant\"</samp> and the syllable contains\n    more than one consonant, exclude the starting <samp>\"Ra\"</samp> from the list of\n    consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If none of the above conditions is true, stop.\n  - The consonant at which the search stops is the base consonant.\n\n> Note: The algorithm is designed to work for all Indic\n> scripts. 
However, Gurmukhi does not utilize pre-base-reordering <samp>\"Ra\"</samp>.\n\nGurmukhi includes one post-base form:\n\n  - <samp>\"Halant,Ya\"</samp> takes on a post-base form.\n  \n:::{figure-md}\n![Post-base consonants](/images/gurmukhi/gurmukhi-pstf.svg \"Post-base consonants\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-pstf}\n\nPost-base consonants\n:::\n\n```{svg-color-toggle-button} gurmukhi-pstf\n```\n\n\nGurmukhi includes three below-base consonant forms:\n\n  - <samp>\"Halant,Ra\"</samp> (after the base consonant or syllable base) and <samp>\"Ra,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form.\n  - <samp>\"Halant,Ha\"</samp> (after the base consonant or syllable base) and <samp>\"Ha,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form. \n  - <samp>\"Halant,Va\"</samp> (after the base consonant or syllable base) and <samp>\"Va,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form. \n\nGurmukhi also includes the CONSONANT_MEDIAL subclass, used only for <samp>\"Yakash\"</samp>\n(U+0A75), which is rendered as a below-base form. <samp>\"Yakash\"</samp> should\nbe tagged as `POS_BELOWBASE_CONSONANT`.\n\n> Note: Because Gurmukhi employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search, only the <samp>\"Halant,_consonant_\"</samp> \n> pattern following the syllable base for these below-base forms will\n> be encountered. Stage 2, step 5 below ensures that the <samp>\"_consonant_,Halant\"</samp>\n> pattern preceding the syllable base for these below-base forms will\n> also be tagged correctly.\n\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any multi-part dependent vowels (matras) must be decomposed\ninto their left-side and right-side components. 
Gurmukhi has no\ntwo-part dependent vowels, so this step will involve no work when\nprocessing `<gur2>` text. It is included here in order to maintain\ncompatibility with the other Indic scripts.\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. However, all such decompositions must be\ncompleted before the shaping engine begins step 3, below.\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs, including those that\nresulted from the preceding decomposition step, must be tagged with\n`POS_PREBASE_MATRA` so that they are moved to the beginning of the\nsyllable.\n\nAll above-base dependent-vowel (matra) signs are tagged\n`POS_AFTER_POST`.\n\nAll right-side dependent-vowel (matra) signs are tagged\n`POS_AFTER_POST`.\n\nAll below-base dependent-vowel (matra) signs are tagged\n`POS_AFTER_POST`.\n\nFor simplicity, shaping engines may choose to tag single-part matras\nin an earlier text-processing step, using the information in the\n_Mark-placement subclass_ column of the character tables. It is\ncritical at this step, however, that all decomposed matras are also\ncorrectly tagged before proceeding to the next step.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. 
No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`. Excluding initial <samp>\"Ra,Halant\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. \n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nGurmukhi includes three below-base consonant forms:\n\n  - <samp>\"Halant,Ra\"</samp> (after the base consonant or syllable base) and <samp>\"Ra,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form.\n  - <samp>\"Halant,Ha\"</samp> (after the base consonant or syllable base) and <samp>\"Ha,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form. 
\n  - <samp>\"Halant,Va\"</samp> (after the base consonant or syllable base) and <samp>\"Va,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form. \n\n> Note: Because Gurmukhi employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search in stage 2, step 1, only instances\n> of the <samp>\"Halant,_consonant_\"</samp> pattern following the syllable base for these\n> below-base forms will be encountered. The tagging in this step\n> ensures that the <samp>\"_consonant_,Halant\"</samp> pattern preceding the syllable\n> base for these below-base forms will also be tagged correctly.\n\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: An initial <samp>\"Ra,Halant\"</samp> sequence will always become a <samp>\"Reph\"</samp>\n> unless the <samp>\"Ra\"</samp> is the only consonant in the syllable.\n\n> Note: <samp>\"Reph\"</samp> substitutions are rare in Gurmukhi text. `<gur2>` fonts may\n> not implement the <samp>\"Reph\"</samp> substitution in <abbr title=\"Glyph Substitution table\">GSUB</abbr> at all. Nevertheless,\n> shaping engines must test for it in order to provide the\n> functionality if it is implemented.\n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<gur2>` text runs. This step is included here to\n> maintain compatibility across Indic scripts.\n\n\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. 
\n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. 
Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp> and <samp>\"Ja,Halant,Nya\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant\"</samp> sequences but _not_ match\n            initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `blwf` should match <samp>\"Halant,Ra\"</samp>, <samp>\"Halant,Ha\"</samp>, and <samp>\"Halant,Va\"</samp> in\n            post-base positions and <samp>\"Ra,Halant\"</samp>, <samp>\"Ha,Halant\"</samp>, and\n            <samp>\"Va,Halant\"</samp> in non-initial pre-base positions\n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `pstf` should match initial <samp>\"Halant,Ya\"</samp> in post-base position\n  - `vatu` should match <samp>\"_Consonant_,Halant,Ra\"</samp>,\n           <samp>\"_Consonant_,Halant,Ha\"</samp>, and <samp>\"_Consonant_,Halant,Va\"</samp>\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr 
title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf (not used in Gurmukhi)\n\tpref (not used in Gurmukhi)\n\tblwf \n\tabvf (not used in Gurmukhi)\n\thalf\n\tpstf\n\tvatu\n\tcjct\n\tcfar (not used in Gurmukhi)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. \n\n  - The context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n    \n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n:::\n\n\n:::{figure-md}\n![Nukta composition](/images/gurmukhi/gurmukhi-nukt.svg \"Nukta composition\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-nukt}\n\nNukta composition\n:::\n\n```{svg-color-toggle-button} gurmukhi-nukt\n```\n\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces two specific sequences with required ligatures. 
\n\n  - <samp>\"Ka,Halant,Ssa\"</samp> is substituted with the <samp>\"KSsa\"</samp> ligature. \n  - <samp>\"Ja,Halant,Nya\"</samp> is substituted with the <samp>\"JNya\"</samp> ligature. \n  \nThese sequences can occur anywhere in a syllable. The <samp>\"KSsa\"</samp> and\n<samp>\"JNya\"</samp> characters have orthographic status equivalent to full\nconsonants in some languages, and fonts may have `cjct` substitution\nrules designed to match them in subsequences. Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\n  - The context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n    \n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n\n> Note: Akhand ligatures are rare in Gurmukhi text. Nevertheless,\n> shaping engines must test for the feature in order to provide the\n> functionality if it is implemented.\n\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence, however, must not be flagged for\n    the `rphf` substitution.\n\t\n\n  - The context defined for a `rphf` feature is:\n\n:::{table} `rphf` feature context\n    \n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`   | _none_        |\n:::\n\n\n> Note: <samp>\"Reph\"</samp> usage is rare in Gurmukhi text. 
Nevertheless,\n> shaping engines must test for the feature in order to provide the\n> functionality if it is implemented.\n\n\n#### Stage 3, step 5: rkrf ####\n\n> This feature is not used in Gurmukhi.\n\n#### Stage 3, step 6: pref ####\n\n> This feature is not used in Gurmukhi.\n\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. Gurmukhi includes three below-base consonant\nforms:\n\n  - <samp>\"Halant,Ra\"</samp> (after the base consonant or syllable base) and <samp>\"Ra,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form.\n  - <samp>\"Halant,Ha\"</samp> (after the base consonant or syllable base) and <samp>\"Ha,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form. \n  - <samp>\"Halant,Va\"</samp> (after the base consonant or syllable base) and <samp>\"Va,Halant\"</samp> (in a\n    non-syllable-initial position) take on a below-base form. \n\nBecause Gurmukhi incorporates the `BLWF_MODE_PRE_AND_POST` shaping\ncharacteristic, any pre-base consonants and any post-base consonants\nmay potentially match a `blwf` substitution; therefore, both cases must\nbe flagged for comparison. Note that this is not necessarily the case in other\nIndic scripts that use a different `BLWF_MODE_` shaping\ncharacteristic. 
\n\n\n\n:::{figure-md}\n![Below-base Ra composition](/images/gurmukhi/gurmukhi-blwf-ra.svg \"Below-base Ra composition\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blwf-ra}\n\nBelow-base Ra composition\n:::\n\n```{svg-color-toggle-button} gurmukhi-blwf-ra\n```\n\n\n:::{figure-md}\n![Below-base Va composition](/images/gurmukhi/gurmukhi-blwf-va.svg \"Below-base Va composition\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blwf-va}\n\nBelow-base Va composition\n:::\n\n```{svg-color-toggle-button} gurmukhi-blwf-va\n```\n\n\n:::{figure-md}\n![Below-base Ha composition](/images/gurmukhi/gurmukhi-blwf-ha.svg \"Below-base Ha composition\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blwf-ha}\n\nBelow-base Ha composition\n:::\n\n```{svg-color-toggle-button} gurmukhi-blwf-ha\n```\n\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Gurmukhi.\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n<samp>\"_Consonant_\"</samp>.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are three exceptions 
to the default behavior, for which\nthe shaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - Non-initial <samp>\"Ra,Halant\"</samp>, <samp>\"Ha,Halant\"</samp>, and <samp>\"Va,Halant\"</samp> sequences,\n    which should have been flagged for the `rkrf` or `blwf` features\n    earlier, must not be flagged for potential `half` substitutions.\n\n  - A sequence matching <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n> Note: Half forms are rare in Gurmukhi text. Fonts supporting\n> `<gur2>` may implement the `half` feature using explicit <samp>\"Halant\"</samp>\n> glyphs, as illustrated here.\n\n:::{figure-md}\n![Half-form composition](/images/gurmukhi/gurmukhi-half.svg \"Half-form composition\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-half}\n\nHalf-form composition\n:::\n\n```{svg-color-toggle-button} gurmukhi-half\n```\n\n\n#### Stage 3, step 10: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any special forms.\n\nGurmukhi includes one post-base form:\n\n  - <samp>\"Halant,Ya\"</samp> takes on a post-base form.\n\n:::{figure-md}\n![Post-base Ya composition](/images/gurmukhi/gurmukhi-pstf-1.svg \"Post-base Ya composition\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-pstf-1}\n\nPost-base Ya composition\n:::\n\n```{svg-color-toggle-button} gurmukhi-pstf-1\n```\n\n\n#### Stage 3, step 11: vatu ####\n\nThe `vatu` feature replaces certain sequences with \"Vattu variant\"\nforms. \n\n\"Vattu variants\" are formed from glyphs followed by the below-base\nform of <samp>\"Ra\"</samp>, <samp>\"Ha\"</samp>, or <samp>\"Va\"</samp>; therefore, this feature must be applied after\nthe `blwf` feature.\n\n> Note: vattu variants are rare in Gurmukhi text. 
Nevertheless,\n> shaping engines must test for the feature in order to provide the\n> functionality if it is implemented.\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<gur2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between the two consonants.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. 
By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n> Note: Conjunct forms are rare in Gurmukhi text. Nevertheless,\n> shaping engines must test for the feature in order to provide the\n> functionality if it is implemented.\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Gurmukhi.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. If the --->\n<!--- character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. 
In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. 
This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\n:::{figure-md}\n![Pre-base matra positioning](/images/gurmukhi/gurmukhi-matra-position.svg \"Pre-base matra positioning\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-matra-position}\n\nPre-base matra positioning\n:::\n\n```{svg-color-toggle-button} gurmukhi-matra-position\n```\n\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. 
However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Gurmukhi incorporates the `REPH_POS_BEFORE_SUBJOINED`\nshaping characteristic, this final position is defined to be\nimmediately after the syllable base and before any subjoined\n(below-base consonant or below-base dependent vowel) forms.\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is\n\n  - Starting at the first post-<samp>\"Reph\"</samp> consonant, search forward looking\n    for the first explicit <samp>\"Halant\"</samp>, ending the search when the base\n    consonant is encountered. If such an explicit <samp>\"Halant\"</samp> is found,\n    move the <samp>\"Reph\"</samp> to the position immediately after this\n    <samp>\"Halant\"</samp>.\n\t  * If a zero-width joiner (<abbr>ZWJ</abbr>) or a zero-width non-joiner (<abbr>ZWNJ</abbr>)\n        follows this <samp>\"Halant\"</samp>, move the <samp>\"Reph\"</samp> to the position\n        immediately after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>. This will be the final\n        <samp>\"Reph\"</samp> position. \n\t  * If no <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> follows this <samp>\"Halant\"</samp>, leave the <samp>\"Reph\"</samp> in\n        its position immediately after the <samp>\"Halant\"</samp>. This will be the\n        final <samp>\"Reph\"</samp> position. \n  - If no such explicit <samp>\"Halant\"</samp> is found in the previous step, find\n    the first post-base consonant that has not formed a ligature with\n    the base consonant. 
If such a non-ligated post-base consonant is\n    found, move the <samp>\"Reph\"</samp> to the position immediately before the\n    non-ligated post-base consonant. This will be the final <samp>\"Reph\"</samp>\n    position.\n  - If no such non-ligated post-base consonant is found in the\n    previous step, move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. \n\t> Note: Because Gurmukhi incorporates the\n    > `REPH_POS_BEFORE_SUBJOINED` shaping characteristic, this means\n    > any positioning tag of `POS_BELOWBASE_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_BELOWBASE_CONSONANT`.\n  - If no other location has been located in the previous steps, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\n\nFinally, if the final position of <samp>\"Reph\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n\n<!---\n  - If the syllable does not have a base consonant (such as a syllable\n    based on an independent vowel), then the final <samp>\"Reph\"</samp> position is\n    immediately before the first character tagged with the\n    `POS_BEFORE_POST` position or any later position in the sort\n    order.\n\n    -- If there are no characters tagged with `POS_BEFORE_POST` or\n       later positions, then <samp>\"Reph\"</samp> is positioned at the end of the\n       syllable.\n\nFinally, if the final position of <samp>\"Reph\"</samp> 
occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n--->\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n  \nGurmukhi does not use pre-base-reordering consonants, so this step will\ninvolve no work when processing `<gur2>` text. It is included here in order\nto maintain compatibility with the other Indic scripts.\n\n\n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nGurmukhi does not use the `init` feature, so this step will\ninvolve no work when processing `<gur2>` text. It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. In preparation for this stage, glyph sequences should be\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature replaces word-initial glyphs with special\npresentation forms. 
Generally, these forms involve removing the\nheadline in-stroke from the left side of the glyph.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Above-base substitutions](/images/gurmukhi/gurmukhi-abvs.svg \"Above-base substitutions\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-abvs}\n\nAbove-base substitutions\n:::\n\n```{svg-color-toggle-button} gurmukhi-abvs\n```\n\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. This usually includes replacing base consonants or syllable bases that\nare followed by below-base-consonant forms (like those of <samp>\"Ra\"</samp>, <samp>\"Ha\"</samp>,\n<samp>\"Va\"</samp>, or <samp>\"Yakash\"</samp>) with contextual ligatures.\n\n:::{figure-md}\n![Below-base substitutions](/images/gurmukhi/gurmukhi-blws.svg \"Below-base substitutions\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blws}\n\nBelow-base substitutions\n:::\n\n```{svg-color-toggle-button} gurmukhi-blws\n```\n\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures.\n\nThe `haln` feature replaces word-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic. 
\n\n:::{figure-md}\n![Halant form substitutions](/images/gurmukhi/gurmukhi-haln.svg \"Halant form substitutions\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-haln}\n\nHalant form substitutions\n:::\n\n```{svg-color-toggle-button} gurmukhi-haln\n```\n\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. However, `calt` is not mandatory for correct Gurmukhi shaping\n> and may be disabled in the application by user preference.\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Gurmukhi text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\n:::{figure-md}\n![Application of the dist feature](/images/gurmukhi/gurmukhi-dist.svg \"Application of the dist feature\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-dist}\n\nApplication of the dist feature\n:::\n\n```{svg-color-toggle-button} gurmukhi-dist\n```\n\n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Gurmukhi, this includes <samp>\"Reph\"</samp> in addition to the\ndiacritical marks and Vedic signs. 
\n\n:::{figure-md}\n![Above-base mark positioning](/images/gurmukhi/gurmukhi-abvm.svg \"Above-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-abvm}\n\nAbove-base mark positioning\n:::\n\n```{svg-color-toggle-button} gurmukhi-abvm\n```\n\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. In Gurmukhi, this includes below-base dependent vowels\n(matras) as well as the below-base consonant forms of <samp>\"Ra\"</samp>, <samp>\"Ha\"</samp>, and\n<samp>\"Va\"</samp>.\n\n:::{figure-md}\n![Below-base mark positioning](/images/gurmukhi/gurmukhi-blwm.svg \"Below-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blwm}\n\nBelow-base mark positioning\n:::\n\n```{svg-color-toggle-button} gurmukhi-blwm\n```\n\n\n## The `<guru>` shaping model ##\n\nThe older Gurmukhi script tag, `<guru>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<guru>` and some users may still have documents that were written to\ntake advantage of `<guru>` shaping.\n\n### Distinctions from `<gur2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs used to trigger shaping\nfeatures was altered when migrating from `<guru>` to\n`<gur2>`. \n\nSpecifically, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<guru>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<gur2>` text, as described above in this document, there is no\nsuch reordering. 
The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nThe old Indic shaping model also did not recognize the\n`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, `<guru>`\nwas treated as if it followed the `BLWF_MODE_POST_ONLY`\ncharacteristic. In other words, below-base form substitutions were\nonly applied to consonants after the base consonant or syllable base.\n\nIn addition, for some scripts, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<guru>` text, the left-side matra was always positioned\nat the beginning of the syllable.\n\n\n### Advice for handling fonts with `<guru>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<guru>` shaping model.\n\n### Advice for handling text runs composed in `<guru>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<guru>` script tag and it is known that the font in use supports\nonly the `<gur2>` shaping model.\n\nShaping engines may also choose to apply `blwf` substitutions to\nbelow-base consonants occurring before the base consonant or syllable base when it is\nknown that the font in use supports an applicable substitution lookup.\n\nShaping engines may also choose to position left-side matras according\nto the `<guru>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> 
or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n\n"
  },
  {
    "path": "opentype-shaping-hangul.md",
    "content": "```{include} /_global.md\n```\n\n# Hangul script shaping in OpenType #\n\nThis document details the general shaping procedure shared by all\nHangul script styles, and defines the common pieces that style-specific\nimplementations share. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Jamo type](#jamo-type)\n\t  - [Composing behavior](#composing-behavior)\n\t  - [Character tables](#character-tables)\n  - [The `<hang>` shaping model](#the-hang-shaping-model)\n      - [Stage 1: Identifying syllables](#stage-1-identifying-syllables)\n      - [Stage 2: Determining if the syllable can be composed into a Hangul Syllables codepoint](#stage-2-determining-if-the-syllable-can-be-composed-into-a-hangul-syllables-codepoint)\n      - [Stage 3: Composing the syllable (if composition is possible)](#stage-3-composing-the-syllable-if-composition-is-possible)\n      - [Stage 4: Fully decomposing the syllable (if composition is not possible)](#stage-4-fully-decomposing-the-syllable-if-composition-is-not-possible)\n      - [Stage 5: Shaping the fully decomposed syllable with <abbr>GSUB</abbr> features](#stage-5-shaping-the-fully-decomposed-syllable-with-gsub-features)\n      - [Stage 6: Reordering tone marks](#stage-6-reordering-tone-marks)\n \n\n\n\n## General information ##\n\nThe Hangul script is used to write Korean. It may also be referred to\nas the Choseongul script or Jeongum script, and is in use in both\nNorth Korea and South Korea as well as regions within China. It may\nalso be used to write the Cia-Cia language in Indonesia.\n\nHangul syllables are formed from individual alphabetic letters that\nare arranged into square cells using pre-defined patterns. 
The\nsyllables themselves are monospaced in a run of text, using interword\nspacing and punctuation.\n\nKorean text may, in practice, incorporate Chinese characters (\"Hanja\")\nin addition to Hangul. Hanja characters are not affected by the\nshaping model for Hangul.\n\nModern Korean text is typically written (and, therefore, rendered)\nleft to right. Classical and older texts, however, may be written\nvertically, top to bottom.\n\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for elements of the\nHangul script. The terms used colloquially in any particular language\nor country may vary, however, potentially causing confusion.\n\n**Jamo** characters are the fundamental letters from which syllable\nblocks are constructed.  There are three classes of jamo:\n\n  - **L**eading consonants (choseong)\n  - **V**owels (jungseong)\n  - **T**railing consonants (jongseong)\n\nMost, but not all, of the basic consonant letters can appear either in\nleading or in trailing form. Nevertheless, the leading and trailing forms are\nassigned distinct codepoints in Unicode. In addition, the set of valid\ntrailing consonants includes several compound-consonant pairs that can\nnever occur in leading form.\n\nOld Korean featured a considerable number of additional jamo, which\nare also defined in Unicode.  Many of these Old Korean jamo are\ncompound forms that concatenate two or three basic jamo. \n  \nA **syllable** is formed by arranging a sequence of jamo into its\nappropriate square-cell form. The horizontal and vertical positioning\nof each jamo in the cell depends on the content of the syllable. The\nexact shape and proportions of each jamo will also vary with its final\nposition in the cell. \n\nValid syllables must be either of the form \"**`L`**,**`V`**\" or of the form\n\"**`L`**,**`V`**,**`T`**\". 
That is, each syllable must begin with one leading\nconsonant, must include one vowel in the second position, and may or may\nnot end with one trailing consonant. \n\n:::{figure-md}\n![LV syllable](images/hangul/hangul-lv-syllable.svg \"LV syllable\"){.shaping-demo .inline-svg .greyscale-svg #hangul-lv-syllable}\n\nLV syllable\n:::\n\n```{svg-color-toggle-button} hangul-lv-syllable\n```\n\n\n:::{figure-md}\n![LVT syllable](images/hangul/hangul-lvt-syllable.svg \"LVT syllable\"){.shaping-demo .inline-svg .greyscale-svg #hangul-lvt-syllable}\n\nLVT syllable\n:::\n\n```{svg-color-toggle-button} hangul-lvt-syllable\n```\n\n\nAll possible syllables for Modern Korean are defined in the Hangul\nSyllables block of Unicode. A sequence of individual jamo codepoints\nthat corresponds to a valid Modern Korean syllable can therefore be\n**composed** into a syllable codepoint. \n\nSequences of codepoints that involve Old Korean jamo cannot be\ncomposed into syllable codepoints and are handled separately by the\nshaping engine.\n\nTwo tone marks are common in Old Korean, the **single-dot bangjeom**\nand the **double-dot bangjeom**. Both bangjeom marks are rendered to\nthe left of the syllable to which they are applied.\n\n\n\n## Glyph classification ##\n\nProper shaping of Hangul text runs involves determining when\nsequences of jamo can be composed into syllable codepoints that are included in\nthe active font (in which case they should be replaced by the corresponding\nsyllable glyph) and when they cannot. \n\nThose jamo sequences that cannot be composed into a syllable codepoint\n(or that compose into a syllable codepoint that is missing in the\nactive font) are then rendered by shaping and positioning each\nindividual jamo using <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution rules. 
\n\n\n\n### Jamo type ###\n\nEach Hangul jamo is assigned a `JAMO_TYPE` property that indicates whether\nit is a leading consonant (`L`), a vowel (`V`), or a trailing\nconsonant (`T`).\n\nMost, but not all, of the basic consonant letters can appear either in\nleading or in trailing form. Nevertheless, the leading and trailing\nforms are assigned distinct codepoints in Unicode. In addition, the\nset of valid trailing consonants includes several compound consonant\npairs that can never occur in leading form.\n\nFor example, the basic consonant \"Kiyeok\" (&#x1100;) is encoded as `U+1100`\nin its leading (choseong) form but as `U+11A8` in its trailing\n(jongseong) form. The tense or emphatic form of the consonant,\n\"Ssangkiyeok\" (&#x1101;), is encoded in its leading (choseong) form as\n`U+1101` but in its trailing (jongseong) form as `U+11A9`, and is\nrendered visually as a doubled version of the basic consonant.\n\nIn addition, two compound trailing consonants, \"Kiyeok-sios\" (&#x11aa;\n`U+11AA`) and \"Rieul-kiyeok\" (&#x11b0; `U+11B0`), also incorporate the\nKiyeok basic consonant. But Kiyeok-sios and Rieul-kiyeok are never\nused as leading consonants, therefore they are not encoded in leading\n(choseong) forms.\n\n> Note: compound consonant jamo are not written as sequences of basic\n> jamo. That is, <samp>\"Kiyeok,Kiyeok\"</samp> (&#x1100;&#x1100;) is not equivalent\n> to <samp>\"Ssangkiyeok\"</samp> (&#x1101;). \n\nThe Hangul Jamo block also includes two \"filler\" codepoints. \"Choseong\nFiller\" (`U+115F`) can take the place of a missing choseong (`L`\nconsonant), and \"Jungseong Filler\" (`U+1160`) can take the place of a\nmissing jungseong (`V` vowel). For shaping purposes, the fillers are\nclassified as type `Lf` and type `Vf`, respectively.\n\n\n### Composing behavior ###\n\nModern Korean features 19 leading consonants (`L` forms), 21 vowels\n(`V` forms), and 27 trailing consonants (`T` forms). 
\n\nOld Korean featured a considerable number of additional jamo, which\nare also defined in Unicode.  Some of these Old Korean jamo are\ndistinct basic letters that are no longer used in Modern Korean. Many\nothers are compound forms that concatenate two or even three basic jamo. \n\nThe Hangul Syllables block in Unicode only includes those syllables\nthat contain solely Modern jamo. Consequently, each jamo is assigned a\n`COMPOSING_BEHAVIOR` property to indicate whether it can be composed\ninto a Hangul Syllable codepoint. \n\nAn <samp>\"`L`,`V`,`T`\"</samp> sequence with the `COMPOSING_BEHAVIOR`s\n\"`YES`,`YES`,`YES`\" or an <samp>\"`L`,`V`\"</samp> sequence with the\n`COMPOSING_BEHAVIOR`s \"`YES`,`YES`\" will compose to a codepoint in the Hangul\nSyllables block. A sequence containing any `NO`s will not compose to a\ncodepoint in the Hangul Syllables block.\n\n> Note: the jamo filler codepoints are both designated with the\n> `COMPOSING_BEHAVIOR` of `NO`.\n\n\n<!--- ### Identification by Unicode range ### --->\n\n\n### Character tables ###\n\n\nSeparate character tables are provided for the Hangul Jamo, Hangul\nJamo Extended-A, and Hangul Jamo Extended-B blocks, as well as for other miscellaneous\ncharacters that are used in `<hang>` text runs:\n\n  - [Hangul Jamo character table](character-tables/character-tables-hangul.md#hangul-jamo-character-table)\n  - [Hangul Jamo Extended-A character table](character-tables/character-tables-hangul.md#hangul-jamo-extended-a-character-table)\n  - [Hangul Jamo Extended-B character table](character-tables/character-tables-hangul.md#hangul-jamo-extended-b-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-hangul.md#miscellaneous-character-table)\n\n\nThe Hangul Jamo block contains all of the Modern Korean jamo, the two\njamo fillers, and the most common Old Korean jamo. \n\nThe Hangul Jamo Extended-A block contains additional `L` (choseong)\njamo for Old Korean. 
The Hangul Jamo Extended-B block contains\nadditional `V` (jungseong) and `T` (jongseong) jamo for Old Korean.\n\nThe Hangul Syllables block contains all of the valid permutations of the\nModern Korean jamo. Each syllable codepoint can be classified by\nsyllable type, either `LV` or `LVT`. These types are synonymous with\nthe \"Hangul Syllable Type\" property in Unicode. Due to the size of the\nHangul Syllables block, a full character table is not\nprovided. However, a\n[summary](character-tables/character-tables-hangul.md#hangul-syllables-character-table)\nis included to show the ranges of `LV` and `LVT` syllables.\n\nUnicode also defines a Hangul Compatibility Jamo block that implements\nbackward compatibility with a retired file-encoding format. Unless a\nsoftware application is required to support specific stores of\ndocuments that are known to have used the older encoding, however, the \nshaping engine should not be expected to handle any text runs\nincorporating codepoints from this block.\n\nThe tables list each codepoint along with its Unicode general\ncategory, its jamo type, and its composing behavior. The codepoint's\nUnicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Jamo type | Composing | Glyph                            |\n|:----------|:-----------------|:----------|:----------|:---------------------------------|\n|`U+1109`   | Letter           | L         | YES       | &#x1109; Sios                    |\n| | | | | |\n|`U+1182`   | Letter           | V         | NO        | &#x1182; O-O                     |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nIn addition to general punctuation, runs of Hangul text may use\npunctuation marks from the CJK Symbols And Punctuation block. 
\n\nOf particular note are the single-dot tone mark (single-dot bangjeom)\nand double-dot tone mark (double-dot bangjeom), `U+302E` and\n`U+302F`. These non-spacing marks are common in Old Korean.\n\nOther important characters that may be encountered when shaping runs\nof Hangul text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`), and zero-width non-joiner (`U+200C`).\n\nThe dotted-circle placeholder is frequently used when displaying a\nmark in isolation. Real-world text may also use other characters, such\nas hyphens or dashes, in a similar placeholder fashion; shaping\nengines should cope with this situation gracefully.\n\nThe zero-width space (`U+200B`) or word joiner (`U+2060`) may be used\nbetween two jamo to prevent them from being conjoined into a\nsyllable. The zero-width space allows a line break to happen between\nthe jamo, while the word joiner prevents the jamo from being separated\nby a line break.\n\n\n## The `<hang>` shaping model ##\n\nProcessing a run of `<hang>` text involves six top-level stages:\n\n1. Identifying syllables\n2. Determining if the syllable can be composed into a Hangul Syllables codepoint\n3. Composing the syllable (if composition is possible)\n4. Fully decomposing the syllable (if composition is not possible)\n5. Shaping the fully decomposed syllable with <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\n6. Reordering tone marks\n\n\n### Stage 1: Identifying syllables ###\n\nThe precomposed syllable codepoints in the Hangul Syllable block come in\ntwo forms: `LV` syllables (which represent an `L` jamo and a `V` jamo)\nand `LVT` syllables (which represent an `L` jamo, a `V` jamo, and a `T` jamo).\n\nA syllable consisting of a string of jamo must match either the\nsequence <samp>\"`L`,`V`\"</samp> or the sequence <samp>\"`L`,`V`,`T`\"</samp>.\n\nThe `L`, `V`, and `T` components must be a single jamo each. In Modern\nKorean, all of the jamo must have a `COMPOSING_BEHAVIOR` of `YES`. 
In\nOld Korean, `YES` and `NO` are both acceptable for\n`COMPOSING_BEHAVIOR`. \n\nHowever, real-world input can also include syllables entered as a\nprecomposed `LV` Hangul Syllables codepoint followed by a standalone\n`T` jamo.\n\nSyllables in a text run can therefore be identified with the following\nregular expression:\n\n```\n    Slvt |  <Slv | <L|Lf>+<V|Vf>> + [T]\n```\n\n\n    L\t  L jamo\n\tV\t  V jamo\n\tT\t  T jamo\n\tLf\t  L jamo filler\n\tVf\t  V jamo filler\n\tSlv\t  Precomposed \"LV\" syllable\n\tSlvt\t  Precomposed \"LVT\" syllable\n\t\n\t[ ]\t  optional occurrence of the enclosed expression\n\t<|>\t  one of the options separated by the vertical bar\n\n\nThe expression matches five possible syllable types:\n\n  - `Slvt`\n  - `Slv`\n  - `Slv`,`T`\n  - `L`,`V`\n  - `L`,`V`,`T`\n\nSequences of jamo that do not match the above expression should be\ntreated as runs of standalone jamo letters.\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n\n### Stage 2: Determining if the syllable can be composed into a Hangul Syllables codepoint ###\n\n\n#### Stage 2, step 1: Fully precomposed syllables ####\n\nA precomposed `Slvt` or `Slv` syllable requires no shaping if the active\nfont includes a glyph for the corresponding Hangul Syllables\ncodepoint. If the glyph is present, the shaping engine can render it\nand proceed directly to stage six without further work. If the glyph\nis not present, the shaping engine must proceed to stage four.\n\nThe other syllable types involve jamo, and each syllable must be\nexamined to determine if it composes into a codepoint in the Hangul\nSyllables block.\n\n\n#### Stage 2, step 2: Partially precomposed syllables ####\n\nFor <samp>\"`Slv`,`T`\"</samp> syllables, the `Slv` codepoint must first be\ndecomposed into its constituent jamo. 
Then, the resulting\n<samp>\"`L`,`V`,`T`\"</samp> syllable must be examined in the [next\nstep](#stage-2-step-3-fully-jamo-syllables). \n\nThe decomposition of the `Slv` syllable is canonical, and uses the\nalgorithm defined in [stage four](#stage-4-fully-decomposing-the-syllable-if-composition-is-not-possible).\n\n\n#### Stage 2, step 3: Fully jamo syllables ####\n\nFor <samp>\"`L`,`V`\"</samp> and <samp>\"`L`,`V`,`T`\"</samp> syllables, the `COMPOSING_BEHAVIOR` of\neach jamo must be examined. \n\nIf all jamo in the syllable have `COMPOSING_BEHAVIOR` of `YES`, then\nthe shaping engine should proceed to stage three and attempt to\ncompose the jamo into the corresponding Hangul Syllables codepoint.\n\nIf any of the jamo in the syllable have `COMPOSING_BEHAVIOR` of `NO`,\nthen the shaping engine should proceed to stage five and shape the\nsyllable using <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\n\n### Stage 3: Composing the syllable (if composition is possible) ###\n\nUnicode defines a canonical algorithm for composing jamo into Hangul\nSyllables codepoints. 
The algorithm leverages the strict jamo-ordering\nof the syllables in the block to provide an algebraic method to\ndetermine the codepoint of a syllable using the codepoints of its\nconstituent `L`, `V`, and (if needed) `T` jamo as input.\n\nThe algorithm defines the following constants (the base values are\nhexadecimal codepoints; the counts are decimal):\n\n```\n\tSBase = AC00\n\tLBase = 1100\n\tVBase = 1161\n\tTBase = 11A7\n\tLCount = 19\n\tVCount = 21\n\tTCount = 28\n\tNCount = (VCount * TCount) = 588\n\tSCount = (LCount * NCount) = 11172\n```\n\t\nFor a jamo sequence <samp>\"`L`,`V`\"</samp>, where both `L` and `V` are of\n`COMPOSING_BEHAVIOR` `YES`, the composed syllable codepoint is found\nby computing:\n\n```\n\tLIndex = L - LBase\n\tVIndex = V - VBase\n\tLVIndex = LIndex * NCount + VIndex * TCount\n\tSlv = SBase + LVIndex\n```\n\nSimilarly, for a jamo sequence <samp>\"`L`,`V`,`T`\"</samp>, where `L`, `V`, and `T`\nare all of `COMPOSING_BEHAVIOR` `YES`, the composed syllable codepoint\nis found by computing:\n\n```\n\tLIndex = L - LBase\n\tVIndex = V - VBase\n\tTIndex = T - TBase\n\tLVIndex = LIndex * NCount + VIndex * TCount\n\tSlvt = SBase + LVIndex + TIndex\n```\n\nFor example, the sequence <samp>\"`U+1112`,`U+1161`,`U+11AB`\"</samp> has an\n`LIndex` of 18, a `VIndex` of 0, and a `TIndex` of 4. The composed\ncodepoint is therefore AC00 plus (18 * 588) + (0 * 28) + 4 = 10588\n(hexadecimal 295C), which is `U+D55C`.\n\nAfter the syllable codepoint has been found, the shaping engine must\nverify that the codepoint's glyph exists in the active font. If the\nglyph is present, the shaping engine must substitute the input jamo\nsequence with the glyph. The shaping engine can then proceed to stage\nsix. \n\nIf the needed glyph is missing, the shaping engine should perform\nno substitution and must proceed to stage five with the original `L`,\n`V`, and (if used) `T` jamo. 
\n\n:::{figure-md}\n![Syllable composition](images/hangul/hangul-compose.svg \"Syllable composition\"){.shaping-demo .inline-svg .greyscale-svg #hangul-compose}\n\nSyllable composition\n:::\n\n```{svg-color-toggle-button} hangul-compose\n```\n\n\n\n### Stage 4: Fully decomposing the syllable (if composition is not possible) ###\n\nAn <samp>\"`Slv`,`T`\"</samp> syllable that does not compose into a Hangul Syllables\ncodepoint or that composes into a Hangul Syllables codepoint which is\nmissing in the active font must be fully decomposed into jamo.\n\nSimilarly, a precomposed `Slvt` or `Slv` syllable requires no shaping\nif the active font includes a glyph for the corresponding Hangul\nSyllables codepoint. If the corresponding codepoint is missing in the\nactive font, however, the syllable must be fully decomposed into jamo.\n\nUnicode defines a canonical algorithm for decomposing Hangul Syllables\ncodepoints into constituent jamo. The algorithm leverages the strict\njamo-ordering of the syllables in the block to provide an algebraic method to\ndetermine the codepoints of a syllable's `L`, `V`, and (if needed) `T`\njamo from the syllable's codepoint.\n\nThe algorithm defines the following constants (the base values are\nhexadecimal codepoints; the counts are decimal):\n\n```\n\tSBase = AC00\n\tLBase = 1100\n\tVBase = 1161\n\tTBase = 11A7\n\tLCount = 19\n\tVCount = 21\n\tTCount = 28\n\tNCount = (VCount * TCount) = 588\n\tSCount = (LCount * NCount) = 11172\n```\n\t\nFor a syllable codepoint `S`, the codepoints of the constituent `L`,\n`V`, and `T` jamo are found by computing:\n\n```\n\tSIndex = S - SBase\n\tLIndex = SIndex div NCount\n\tVIndex = (SIndex mod NCount) div TCount\n\tTIndex = SIndex mod TCount\n\tL = LBase + LIndex\n\tV = VBase + VIndex\n\tT = TBase + TIndex if TIndex > 0\n```\n\nIf `TIndex` = 0, then the syllable has no `T` jamo in the\ntrailing-consonant (jongseong) position.\n\nFor example, the syllable `U+D55C` has an `SIndex` of 10588, which\ngives an `LIndex` of 18, a `VIndex` of 0, and a `TIndex` of 4. The\nsyllable therefore decomposes into <samp>\"`U+1112`,`U+1161`,`U+11AB`\"</samp>.\n\nWith the syllable decomposed, the shaping engine can proceed to stage\nfive with the `L`, `V`, and (if used) `T` jamo. 
\n\n:::{figure-md}\n![Syllable decomposition](images/hangul/hangul-decompose.svg \"Syllable decomposition\"){.shaping-demo .inline-svg .greyscale-svg #hangul-decompose}\n\nSyllable decomposition\n:::\n\n```{svg-color-toggle-button} hangul-decompose\n```\n\n\n### Stage 5: Shaping the fully decomposed syllable with <abbr>GSUB</abbr> features ###\n\nWith the syllable fully decomposed into a sequence of jamo, the next\nstage applies mandatory substitution features using rules in the\nfont's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. \n\n\n#### Stage 5, step 1: `ccmp` ####\n\nThe `ccmp` feature allows a font to substitute basic-jamo sequences\nwith a pre-composed glyph including compound jamo. \n \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. \n\n\n#### Stage 5, step 2: `ljmo` ####\n\nThis feature replaces the default (i.e., standalone) forms of leading\nconsonant (choseong) glyphs in a syllable cell with alternate forms\nthat fit into syllable-appropriate positions.\n\nThe appropriate shape of the choseong glyph depends on the shape of\nthe vowel (jungseong) that follows. For example, a tall jungseong forces\nthe usage of a tall choseong form.\n\nIn addition, if the syllable ends in a trailing consonant (jongseong),\nthen shorter forms of both the leading consonant (choseong) and vowel\n(jungseong) glyphs will be used in order to provide sufficient\nvertical space. 
\n\n:::{figure-md}\n![L Jamo feature application](images/hangul/hangul-ljmo.svg \"L Jamo feature application\"){.shaping-demo .inline-svg .greyscale-svg #hangul-ljmo}\n\nL Jamo feature application\n:::\n\n```{svg-color-toggle-button} hangul-ljmo\n```\n\n\n\n#### Stage 5, step 3: `vjmo` ####\n\nThis feature replaces the default (i.e., standalone) forms of vowel\n(jungseong) glyphs in a syllable cell with alternate forms that fit into\nsyllable-appropriate positions.\n\nThe appropriate shape of the jungseong glyph depends on the presence\nor absence of a trailing consonant (jongseong) at the end of the syllable.\n\nIf the syllable ends in a trailing consonant (jongseong), then shorter\nforms of both the leading consonant (choseong) and vowel (jungseong)\nglyphs will be used in order to provide sufficient vertical space.\n\n:::{figure-md}\n![V Jamo feature application](images/hangul/hangul-vjmo.svg \"V Jamo feature application\"){.shaping-demo .inline-svg .greyscale-svg #hangul-vjmo}\n\nV Jamo feature application\n:::\n\n```{svg-color-toggle-button} hangul-vjmo\n```\n\n\n\n#### Stage 5, step 4: `tjmo` ####\n\nThis feature replaces the default (i.e., standalone) forms of trailing\nconsonant (jongseong) glyphs in a syllable cell with alternate forms\nthat fit into syllable-appropriate positions.\n\nBecause jongseong jamo are always preceded by a choseong jamo and a\njungseong jamo, there is less variation in shape that the alternate\nforms can take on. A given font may, however, include several\ncontext-dependent alternates for stylistic or typographic variation.\n\n:::{figure-md}\n![T Jamo feature application](images/hangul/hangul-tjmo.svg \"T Jamo feature application\"){.shaping-demo .inline-svg .greyscale-svg #hangul-tjmo}\n\nT Jamo feature application\n:::\n\n```{svg-color-toggle-button} hangul-tjmo\n```\n\n\n\n### Stage 6. Reordering tone marks ###\n\nAny tone marks should now be reordered. 
In the text run, marks occur immediately after\nthe syllable to which they apply. After reordering, each mark should\nbe placed immediately to the left of the syllable.\n\nThis reordering move is the same regardless of whether the syllable in\nquestion is a precomposed syllable codepoint from the Hangul Syllables\nblock or a jamo-based syllable composed via the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nfeatures. Therefore, the reordering must take place at the end of the\nshaping process.\n\n:::{figure-md}\n![Tone-mark reordering](images/hangul/hangul-tone.svg \"Tone-mark reordering\"){.shaping-demo .inline-svg .greyscale-svg #hangul-tone}\n\nTone-mark reordering\n:::\n\n```{svg-color-toggle-button} hangul-tone\n```\n"
  },
  {
    "path": "opentype-shaping-hebrew.md",
    "content": "```{include} /_global.md\n```\n\n# Hebrew script shaping in OpenType #\n\nThis document details the general shaping procedure shared by all\nHebrew script styles, and defines the common pieces that style-specific\nimplementations share. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n\t  - [Mark classification](#mark-classification)\n\t  - [Character tables](#character-tables)\n  - [The `<hebr>` shaping model](#the-hebr-shaping-model)\n      - [Stage 1: Compound character composition and decomposition](#stage-1-compound-character-composition-and-decomposition)\n      - [Stage 2: Composing any Alphabetic Presentation forms](#stage-2-composing-any-alphabetic-presentation-forms)\n      - [Stage 3: Applying the language-form substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-language-form-substitution-features-from-gsub)\n      - [Stage 4: Applying the typographic-form substitution features from <abbr>GSUB</abbr>](#stage-4-applying-the-typographic-form-substitution-features-from-gsub)\n      - [Stage 5: Applying the positioning features from <abbr>GPOS</abbr>](#stage-5-applying-the-positioning-features-from-gpos)\n  \n\n\n## General information ##\n\nThe Hebrew script is used to write multiple languages, including\nHebrew, Yiddish, and Judezmo. \n\nHebrew is written (and, therefore, rendered) from right to\nleft. Shaping engines must track the directionality of the text run\nwhen scripts of different direction are mixed.\n\nThe Hebrew script tag defined in OpenType is `<hebr>`. Apart from the\nfact that Hebrew uses right-to-left directionality, the shaping\nprocess for `<hebr>` is identical to the default script-shaping\nmodel.\n\n> Note: The Letterlike Symbols block in Unicode includes four\n> codepoints corresponding to mathematical symbols based on Hebrew\n> letters. 
\n>\n> These codepoints are not expected to occur within valid Hebrew text\n> runs. In addition, because these codepoints are defined for usage in \n> mathematical expressions, they are designated as using left-to-right\n> directionality.\n\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for elements of the\nHebrew script. The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Base** glyph or character is the standard term for a Hebrew\ncharacter that is capable of taking a diacritical mark. \n \nMost of the base characters in Hebrew are consonants, although some\nbase characters are used to represent vowels in certain contexts.\n\nVowels that are not represented with base characters are frequently\nomitted from the text run entirely. Alternatively, such vowels may\nappear as marks called **niqqud**. Niqqud are also referred to as\n**points** in the Unicode standard.\n\nPronunciation marks, such as the dot used to distinguish \"Shin\" from\n\"Sin\" are also considered **niqqud**. Niqqud are typically positioned\nabove or below the base character.\n\n**Dagesh** is the term for a particular diacritic that alters the\npronunciation of a consonant. The dagesh is distinctive for being\npositioned inside the consonant glyph. Other Hebrew diacritics are\npositioned either above or below the base character.\n\nHebrew also includes a sizable set of **cantillation marks**, in\naddition to vowel, diacritical, and pronunciation marks. Cantillation\nmarks are also referred to as **tropes**.\n\n\n## Glyph classification ##\n\nBecause `<hebr>` text runs do not involve reordering or syllable\nidentification, Hebrew base characters do not require further\nclassification for script-shaping purposes.\n\nFive Hebrew letters have special word-final forms. Each of these is\nencoded separately in the Hebrew block. They are regarded as\ncontextual variants, not as distinct letters. 
The Hebrew block also\nincludes several digraphs that are used only when writing the Yiddish\nlanguages. \n\nBecause these word-final forms and digraphs are separately encoded,\nfonts do not implement <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions to provide access to them.\n\n\n### Mark classification ###\n\nBecause Hebrew text may include several types of mark (vowel niqqud,\ncantillation marks, pronunciation marks) positioned on a base\ncharacter, sequences of adjacent marks may need to be reordered.\n\nThe Unicode standard defines a _canonical combining class_ for each\ncodepoint that is used whenever a sequence needs to be sorted into\ncanonical order. \n\nHebrew marks all belong to standard combining classes. Most, but not\nall, cantillation marks are assigned to the generic below-base (220)\nor above-base (230) combining classes. Niqqud are assigned to distinct\ncombining classes designed to enforce orthographically correct\nordering:\n\n:::{table} Mark-classification table\n\n| Codepoint | Combining class | Glyph                              |\n|:----------|:----------------|:-----------------------------------|\n| `U+0591`  | 220             | &#x0591; Etnahta                   |\n| `U+0592`  | 230             | &#x0592; Segol                     |\n| `U+05B0`  | 10              | &#x05B0; Sheva                     |\n| `U+05B2`  | 12              | &#x05B2; Hataf Patah               |\n| `U+05B9`  | 19              | &#x05B9; Holam                     |\n| `U+05BF`  | 23              | &#x05BF; Rafe                      |\n:::\n\n\nThe numeric values of these combining classes are used during Unicode\nnormalization.\n\n\n\t\t\t\n\t\t\t\n### Character tables ###\n\nThe Hebrew block in Unicode contains the codepoints required to\nrepresent text in all languages written using Hebrew.\n\nThe Alphabetic Presentation Forms block in Unicode includes 46\nadditional codepoints for Hebrew. 
Included are several precomposed\ncombinations of base characters and marks and the \"Alef Lamed\"\nligature, any of which may occur in `<hebr>` text runs. Glyphs for\nthese presentation forms may be provided by fonts that do not\nimplement the corresponding mark-to-base and ligature features in\nOpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> tables.\n\nThe Alphabetic Presentation Forms block also includes a set of eight\n\"wide\" variants of standard Hebrew characters (`U+FB21` through\n`U+FB28`) and a variant form of \"Ayin\" (`U+FB20`), for backwards\ncompatibility with retired file-encoding standards. New usage of these\ncodepoints is not recommended and they are unlikely to occur in\ncontemporary documents. \n\nConsequently, unless a software application is required to support\nspecific stores of documents that are known to have used these older\nencodings, the shaping engine should not be expected to handle any\ntext runs incorporating these backwards-compatibility variant\ncodepoints.\n\nSeparate character tables are provided for the Hebrew block, the\nHebrew letters included in the Alphabetic Presentation Forms block,\nand for other miscellaneous characters that are used in `<hebr>` text\nruns:\n\n  - [Hebrew character table](character-tables/character-tables-hebrew.md#hebrew-character-table)\n  - [Alphabetic Presentation Forms (Hebrew) character table](character-tables/character-tables-hebrew.md#alphabetic-presentation-forms-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-hebrew.md#miscellaneous-character-table)\n\n\nThe tables list each codepoint along with its Unicode general\ncategory. For marks, the table lists the codepoint's mark combining\nclass. 
The codepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Mark class | Glyph                        |\n|:----------|:-----------------|:-----------|:-----------------------------|\n|`U+05D0`   | Letter           | _0_        | &#x05D0; Alef                |\n| | | | |\n|`U+05C1`   | Mark [Mn]        | 24         | &#x05C1; Point Shin Dot      |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\n\n<!--- Character table example and explanation --->\n\nOther important characters that may be encountered when shaping runs\nof Hebrew text include the dotted-circle placeholder (`U+25CC`), the\ncombining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)\nand zero-width non-joiner (`U+200C`), the left-to-right text marker\n(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break\nspace (`U+00A0`).\n\nThe dotted-circle placeholder is frequently used when displaying a\nvowel or diacritical mark in isolation. 
Real-world text documents may\nalso use other characters, such as hyphens or dashes, in a similar\nplaceholder fashion; shaping engines should cope with this situation\ngracefully.\n\nThe combining grapheme joiner (<abbr>CGJ</abbr>), zero-width joiner (<abbr>ZWJ</abbr>), and\nzero-width non-joiner (<abbr>ZWNJ</abbr>) may be used to alter the\norder in which adjacent marks are positioned during the\nmark-reordering stage, in order to adhere to the needs of a\nnon-default language orthography.\n<!--- combining grapheme joiner explanation --->\n\n\n\n<!--- Zero-Width Non Joiner explanation --->\n\nThe right-to-left mark (<abbr>RLM</abbr>) and left-to-right mark (<abbr>LRM</abbr>) are used by\nthe Unicode bidirectionality algorithm (BiDi) to indicate the points\nin a text run at which the writing direction changes.\n\n\n<!--- How shaping is affected by the <abbr title=\"Left-To-Right\">LTR</abbr> and <abbr title=\"Right-To-Left\">RTL</abbr> markers explanation --->\n\n\nThe no-break space may be used to display those codepoints that\nare defined as non-spacing (such as niqqud or cantillation marks) in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder.\n\n\n## The `<hebr>` shaping model ##\n\nProcessing a run of `<hebr>` text involves five top-level stages:\n\n1. Compound character composition and decomposition\n2. Composing any Alphabetic Presentation forms\n3. Applying the language-form substitution features from <abbr>GSUB</abbr>\n4. Applying the typographic-form substitution features from <abbr>GSUB</abbr>\n5. Applying the positioning features from <abbr>GPOS</abbr>\n\n\n### Stage 1: Compound character composition and decomposition ###\n\nIn this stage, the `ccmp` feature from <abbr title=\"Glyph Substitution table\">GSUB</abbr> is applied and the\nresulting sequence of codepoints should be checked for correct mark\norder. 
\n\n> Note: Shaping engines may have already applied Unicode normalization to\n> compose or decompose codepoints before beginning the shaping\n> process. Due to the Alphabetic Presentation Forms composition in\n> stage two, however, the `ccmp` feature and any necessary mark\n> reordering must be performed here, as Alphabetic Presentation Forms\n> are not handled by Unicode normalization.\n\n\n#### Stage 1, step 1: ccmp\n\nThe `ccmp` feature allows a font to substitute\n\n - mark-and-base sequences with a pre-composed glyph including both\n   the mark and the base (as is done with a ligature substitution)\n - individual compound glyphs with the equivalent sequence of\n   decomposed glyphs\n \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. 
\n\n:::{figure-md}\n![ccmp composition](/images/hebrew/hebrew-ccmp.svg \"ccmp composition\"){.shaping-demo .inline-svg .greyscale-svg #hebrew-ccmp}\n\nccmp composition\n:::\n\n```{svg-color-toggle-button} hebrew-ccmp\n```\n\n\n\n#### Stage 1, step 2: Mark reordering\n\nSequences of adjacent marks must be reordered so that they appear in\ncanonical order before the mark-to-base and mark-to-mark positioning\nfeatures from <abbr title=\"Glyph Positioning table\">GPOS</abbr> can be correctly applied.\n\nFor `<hebr>` text runs, normalizing the sequence of marks using the\nUnicode _canonical combining class_ of each mark should be sufficient.\n\n\n### Stage 2: Composing any Alphabetic Presentation forms ###\n\nIf the active font includes glyphs for precomposed mark-and-base\ncodepoints from the Alphabetic Presentation Forms block, these\nprecomposed glyphs should be preferred over sequences of individual\nbase glyphs and marks positioned with <abbr title=\"Glyph Positioning table\">GPOS</abbr>.\n\nThe codepoints in question are not included in the canonical Unicode\ncompositions, so the shaping engine should substitute them at this\nstage, before proceeding with the shaping process.\n\nThe individual base and mark sequences that should compose to each\nprecomposed Hebrew mark-and-base codepoint in the Alphabetic\nPresentation Forms block are listed in the _Composition_ column of the\n[Alphabetic Presentation Forms character\ntable](character-tables/character-tables-hebrew.md#alphabetic-presentation-forms-character-table). 
\n\nFor example: \n\n:::{table} Example character table for Alphabetic Presentation forms\n\n| Codepoint | Unicode category | Mark class | Composition     | Glyph                                   |\n|:----------|:-----------------|:-----------|:----------------|:----------------------------------------|\n| `U+FB1D`  | Letter           | _0_        |`U+05D9`,`U+05B4`| &#xFB1D; Yod With Hiriq                 |\n| | | | | |\n| `U+FB2B`  | Letter           | _0_        |`U+05E9`,`U+05C2`| &#xFB2B; Shin With Sin Dot              |\n:::\n\n\nTwo of the precomposed glyphs, \"Shin With Dagesh And Shin Dot\"\n(`U+FB2C`) and \"Shin With Dagesh And Sin Dot\" (`U+FB2D`), have\nmultiple possible composing sequences. All of the other precomposed\nglyphs in the block have a single composing sequence.\n\n> Note: the active font may implement these compositions in a `ccmp`\n> lookup in <abbr title=\"Glyph Substitution table\">GSUB</abbr>, in which case this stage will involve no additional work.\n\n:::{figure-md}\n![Alphabetic Presentation forms composition](/images/hebrew/hebrew-apf.svg \"Alphabetic Presentation forms composition\"){.shaping-demo .inline-svg .greyscale-svg #hebrew-apf}\n\nAlphabetic Presentation forms composition\n:::\n\n```{svg-color-toggle-button} hebrew-apf\n```\n\n\n\n### Stage 3: Applying the language-form substitution features from <abbr>GSUB</abbr> ###\n\nThe language-substitution phase applies mandatory substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for\nthis stage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Hebrew shaping model:\n\n\tlocl\n\t\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n\n\n### Stage 4: Applying the typographic-form substitution features from <abbr>GSUB</abbr> ###\n\nThe typographic-substitution phase applies optional substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Hebrew shaping model:\n\n\tliga\n\tdlig\n\t\n\n#### Stage 4, step 1: liga ####\n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Standard ligature substitution](/images/hebrew/hebrew-liga.svg \"Standard ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #hebrew-liga}\n\nStandard ligature substitution\n:::\n\n```{svg-color-toggle-button} hebrew-liga\n```\n\n\n\n#### Stage 4, step 2: dlig ####\n\nThe `dlig` feature substitutes additional optional ligatures that are\noff by default. 
Substitutions made by `dlig` may be enabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Discretionary ligature substitution](/images/hebrew/hebrew-dlig.svg \"Discretionary ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #hebrew-dlig}\n\nDiscretionary ligature substitution\n:::\n\n```{svg-color-toggle-button} hebrew-dlig\n```\n\n\n\n### Stage 5: Applying the positioning features from <abbr>GPOS</abbr> ###\n\nThe positioning stage adjusts the positions of mark and base\nglyphs.\n\nThe order in which these features are applied is fixed for\nthe Hebrew shaping model:\n\n\tkern\n\tmark\n\n\n#### Stage 5, step 1: `kern` ####\n\nThe `kern` feature adjusts glyph spacing between pairs of adjacent glyphs.\n\n\n:::{figure-md}\n![Kerning application](/images/hebrew/hebrew-kern.svg \"Kerning application\"){.shaping-demo .inline-svg .greyscale-svg #hebrew-kern}\n\nKerning application\n:::\n\n```{svg-color-toggle-button} hebrew-kern\n```\n\n#### Stage 5, step 2: `mark` ####\n\nThe `mark` feature positions marks with respect to base glyphs.\n\n:::{figure-md}\n![Mark positioning](/images/hebrew/hebrew-mark.svg \"Mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #hebrew-mark}\n\nMark positioning\n:::\n\n```{svg-color-toggle-button} hebrew-mark\n```\n"
  },
  {
    "path": "opentype-shaping-indic-general.md",
    "content": "# Indic script shaping in OpenType #\n\nThis document outlines the general shaping procedure shared by all\nIndic scripts, and defines the common pieces that script-specific\nimplementations share. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes](#shaping-classes)\n\t  - [Mark-placement subclasses](#mark-placement-subclasses)\n      - [Character tables](#character-tables)\n  - [The Indic2 shaping model](#the-indic2-shaping-model)\n      - [Sort ordering](#sort-ordering)\n      - [Script shaping characteristics](#script-shaping-characteristics)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The old Indic shaping model](#the-old-indic-shaping-model)\n\n\n## General information ##\n\nThe Indic family of scripts includes writing systems\nderived from the Brahmi script in ancient India. Although the scripts\nvary considerably in appearance, their shared ancestry means that they\nalso share a number of important features and rules. \n\nThis makes it possible (though, of course, not mandatory) for a\nshaping engine to implement a single shaping model that covers all of\nthe scripts. 
\n\nThe largest (by number of readers) scripts in the Indic family are:\n\n  - [Devanagari](opentype-shaping-devanagari.md)\n  - [Bengali](opentype-shaping-bengali.md)\n  - [Gujarati](opentype-shaping-gujarati.md)\n  - [Gurmukhi](opentype-shaping-gurmukhi.md)\n  - [Kannada](opentype-shaping-kannada.md)\n  - [Malayalam](opentype-shaping-malayalam.md)\n  - [Oriya](opentype-shaping-oriya.md)\n  - [Tamil](opentype-shaping-tamil.md)\n  - [Telugu](opentype-shaping-telugu.md)\n  - [Sinhala](opentype-shaping-sinhala.md)\n\nText runs in Indic scripts may also include characters from the Vedic\nExtensions block in Unicode. This is a set of marks and punctuation\nneeded to accurately transcribe ancient documents in Sanskrit.\n\nText runs in Indic scripts also make use of joiner, non-joiner, and\nplaceholder characters from other Unicode blocks, in order to specify\ncertain alternate shaping options.\n\nThere are two sets of Indic script tags defined in OpenType. Several\nfrom the older set (`<deva>`, `<beng>`, `<gujr>`, `<guru>`, `<knda>`,\n`<mlym>`, `<orya>`, `<taml>`, and `<telu>`) were deprecated and\nreplaced in 2005.\n\nThe new set of replacement tags for these scripts (`<dev2>`, `<bng2>`,\n`<gjr2>`, `<gur2>`, `<knd2>`, `<mlm2>`, `<ory2>`, `<tml2>`, and\n`<tel2>`) was devised to overcome shortcomings found in the original model. \nTherefore, new fonts should be engineered to work with the updated\nshaping model. However, if a font is encountered that supports only\nan older script tag, the shaping engine should deal with it gracefully.\n\nThe `<sinh>` tag, unlike the other Indic script tags,\nwas not deprecated in 2005 and is still used for Sinhala text.\n\n> Note: There are several other scripts derived from the Brahmi script\n> that are often treated separately and not bundled into the \"Indic\"\n> category by shaping engines. 
This is because these other scripts\n> evolved to have significantly distinct rules for syllable\n> construction, reordering, and shaping.\n>\n> The scripts include Buginese, Balinese, Javanese,\n> [Khmer](opentype-shaping-khmer.md),\n> [Lao](opentype-shaping-thai-lao.md),\n> [Myanmar](opentype-shaping-myanmar.md),\n> [Thai](opentype-shaping-thai-lao.md), and\n> [Tibetan](opentype-shaping-tibetan.md). \n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. \n\nThe term \"matra\" is also used to refer to the headline above letters\nin scripts like Devanagari, Bengali, and Gurmukhi. To avoid ambiguity,\nthe term **headline** is used in most Unicode and OpenType shaping\ndocuments.\n\n**Halant** and **Virama** are both standard terms for the below-base\n\"vowel-killer\" sign. Unicode documents use the term \"virama\" most\nfrequently, while OpenType documents use the term \"halant\" most\nfrequently.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the\ndiacritical mark indicating that the preceding vowel should be\nnasalized. \n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. 
The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n\n## Glyph classification ##\n\nShaping Indic text depends on the shaping engine correctly classifying\neach glyph in the run. The classifications must distinguish between\nconsonants, vowels (independent and dependent), numerals, punctuation,\nand various types of diacritical mark. \n\nFor most codepoints, the `General Category` property defined in the\nUnicode standard is correct, but it is not sufficient to fully capture\nthe expected shaping behavior (such as glyph reordering). Therefore,\nIndic glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. 
\n\nLess common mark classes include `TONE_MARKER`, `CANTILLATION`,\n`GEMINATION_MARK`, `PURE_KILLER`,  and `SYLLABLE_MODIFIER`. An\nexplanation of each class is included in the shaping documentation of\neach script in which the class occurs.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (some of which are also categorized as `Mark [Mn]`).\n\nThere are several subclasses of consonants that arise on occasion, such as\n`CONSONANT_DEAD`, `CONSONANT_MEDIAL`, `CONSONANT_PLACEHOLDER`,\n`CONSONANT_WITH_STACKER`, and `CONSONANT_PRE_REPHA`. \n\nThese subclasses indicate that the letter should match simple\ntests for consonants (as in the regular expressions used during\nsyllable identification), but the subclass may factor into\nscript-specific rules encountered in later shaping stages.\n\nFor example, `CONSONANT_DEAD` indicates that, unlike standard\nconsonants, the dead consonant carries no inherent vowel. This lack of\nan inherent vowel means that the letter is likely not accompanied by a\n`VIRAMA`; failure to recognize this distinction could trick a naive\nparser into mis-identifying the letter as the base consonant of a\nsyllable during the base-consonant-identification step. \n\nNot every script features an instance of each consonant subclass. 
A\nfull explanation of each subclass's behavior is explained in the\nrelevant stage of each script's shaping documentation.\n\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType\nfeatures that might affect how the respective glyphs are drawn, such\nas `tnum`, which specifies the usage of tabular-width numerals, and\n`sups`, which replaces the default glyphs with superscript variants.\n\n### Mark-placement subclasses ###\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. \n\nThe actual attachment position of these glyphs is determined by the\nlookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr> table. However, the reordering rules for\nIndic scripts require that the shaping engine be able to identify\nmarks by their general position. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). 
Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Character tables ###\n\nCharacter tables for all of the scripts, plus the Vedic Extensions and\nimportant miscellaneous characters, are available here:\n\n  - [Devanagari](character-tables/character-tables-devanagari.md) (Including Devanagari Extended)\n  - [Bengali](character-tables/character-tables-bengali.md)\n  - [Gujarati](character-tables/character-tables-gujarati.md)\n  - [Gurmukhi](character-tables/character-tables-gurmukhi.md)\n  - [Kannada](character-tables/character-tables-kannada.md)\n  - [Malayalam](character-tables/character-tables-malayalam.md)\n  - [Oriya](character-tables/character-tables-oriya.md)\n  - [Tamil](character-tables/character-tables-tamil.md)\n  - [Telugu](character-tables/character-tables-telugu.md)\n  - [Sinhala](character-tables/character-tables-sinhala.md)\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Indic-script text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. 
As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence. \n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. 
To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>.\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp> in those scripts that use an implicit sequence to request a\n<samp>\"Reph\"</samp> form.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n\n> Note: this particular usage of <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> may not apply to scripts that\n> feature an explicit <samp>\"Reph\"</samp> codepoint or an explicit sequence for\n> requesting <samp>\"Reph\"</samp>. See the script-specific shaping documents for\n> full details.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\nIn addition to general punctuation, runs of text in several of the\nsupported scripts often use the danda (`U+0964`) and double danda\n(`U+0965`) punctuation marks from the Devanagari block.\n\n\t\n## The Indic2 shaping model ##\n\nProcessing a run of text in any of the modern Indic script tags\ninvolves six top-level stages: \n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nThe initial reordering and final reordering stages each involve a set\nof script-specific rules that dictate how characters are reordered\nfrom their sequence in the input stream into the correct ordering for\nshaping rules to apply.\n\nSpecifically, certain consonants in each script are repositioned from\ntheir logical position (that is, their position in the input\nstream). The most common example is <samp>\"Ra\"</samp>, which is frequently\nconverted into a combining mark-like form. \n\nThe resulting mark must be correctly positioned by attaching it to the\ncorrect base character using the active font's `mark` lookup from\n<abbr title=\"Glyph Positioning table\">GPOS</abbr>. Therefore, the mark form of the <samp>\"Ra\"</samp> must be moved so that it\nis adjacent to the correct base character. Which character in a\nsyllable is the correct base character differs from script to script,\nand may involve several context-sensitive tests.\n\nSimilarly, certain other consonants in each script also take on\ndistinct forms that require reordering so that `mark` positioning\nand other lookups function correctly. 
Dependent vowels (matras) may\nalso need to be reordered so that they are adjacent to the correct\nconsonant. These functions, too, involve script-specific rule sets.\n\nBecause of the script-specific rules involved, it is mandatory that\nthe basic substitution features in stage three be applied in the order\nspecified. \n\nThe remaining substitution features in stage five and the positioning\nfeatures in stage six, however, do not have a mandatory order.\n\n### Sort ordering ###\n\nA single, canonical sequence of ordering positions exists that\ncaptures all of the possible positions in an Indic syllable. \n\nNot every position is used in every script and not every syllable will\ncontain a character in every position. Whenever characters in a\nsyllable are reordered during the shaping process, they are sorted\ninto this sequence:\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\nNot every position is used in every script; the sequence merely\ndescribes all of the possible positions at which a character in an\nIndic syllable can exist. 
Using the same sequence for all scripts\ncould reduce an implementation's code size and complexity.\n\nThe basic positions (left to right) are <samp>\"Reph\"</samp> (`POS_RA_TO_BECOME_REPH`), dependent\nvowels (matras) and consonants positioned before the base\nconsonant or syllable base (`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base\nconsonant or syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base consonant or syllable base\n(`POS_POSTBASE_CONSONANT`), syllable-final consonants (`POS_FINAL_CONSONANT`),\nand syllable-modifying or Vedic signs (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the base consonant or syllable base. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\npositions matter only in that they are unambiguous. \n\nThe description of the general shaping process that follows will note\nwhen a character needs to be marked for reordering into some of these\npositions. The specifics for each script provide additional details,\nespecially for ordering positions that are only used in that script.\n\n\n### Script shaping characteristics ###\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. 
These include:\n\n  - The rules that determine the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - How <samp>\"Reph\"</samp> is encoded or requested in a syllable.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    after the base consonant or syllable base, or to consonants before the base\n    consonant and those after the base consonant or syllable base.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, the ordering for left-side, right-side, \n    above-base, and below-base matras follow different rules. The\n    rules employed vary between scripts, except for left-side matras,\n    where all Indic scripts follow the same rule. \n\nIn the lists that follow, the options for each characteristic are\nmutually exclusive, and they are exhaustive for the set of Indic\nscripts [listed](#general-information) at the beginning of this\ndocument (Devanagari, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam,\nOriya, Tamil, Telugu, and Sinhala).\n\nImplementers who wish to cover additional scripts using the same\nmethod would first need to determine whether any additional options\nare relevant for each characteristic.\n\n#### Base consonant ####\n\nLocating the base consonant of a syllable generally requires parsing\nthe syllable to catch and exclude certain special-treatment consonants\n(such as <samp>\"Ra\"</samp>s that will form <samp>\"Reph\"</samp>s or consonants that take on\nbelow-base forms). However, each script has a general base-consonant\nposition that determines the appropriate search method. The base\nconsonant may be, generally:\n\n  - The first consonant. This is designated `BASE_POS_FIRST`. This is\n    the simplest base-consonant rule. After eliminating any initial\n    <samp>\"Repha\"</samp>s from consideration, the first consonant is always the\n    base consonant, without exception.\n  \n  - The last consonant, not counting any special forms. 
This is\n    designated `BASE_POS_LAST`. This is the most complicated\n    base-consonant rule, because the type and variety of special forms\n    vary considerably between scripts. \n\t\n\tThe `BASE_POS_LAST` search algorithm (described in each script's\n    shaping document) accounts for these special forms in every\n    script. The abundance of special forms in certain scripts may\n    routinely cause the search algorithm to identify a base consonant\n    that is not logically last in the syllable. This is expected\n    behavior.\n\t\n\tThis base-consonant position is used in Devanagari, Bengali,\n\tGujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu.\n  \n  - The last consonant that is not preceded by a <samp>\"ZWJ\"</samp> (zero width\n    joiner) character. \n\t\n\tThis position is only used in Sinhala, and is designated\n    `BASE_POS_LAST_SINHALA`.\n\nThe scripts currently described in the \"Indic\" script group  and their\ncorresponding base-consonant rules are summarized in the following\ntable:\n\n:::{table} Base-consonant rules by script\n\n| Script     | Base-consonant rule    |\n|:-----------|:-----------------------|\n| Devanagari | `BASE_POS_LAST`        |\n| Bengali    | `BASE_POS_LAST`        |\n| Gujarati   | `BASE_POS_LAST`        |\n| Gurmukhi   | `BASE_POS_LAST`        |\n| Kannada    | `BASE_POS_LAST`        |\n| Malayalam  | `BASE_POS_LAST`        |\n| Oriya      | `BASE_POS_LAST`        |\n| Tamil      | `BASE_POS_LAST`        |\n| Telugu     | `BASE_POS_LAST`        |\n| Sinhala    | `BASE_POS_LAST_SINHALA`|\n:::\n\n\n> Note: None of the specific scripts currently included in the \"Indic\"\n> script group as it is enumerated in this document make use of the\n> `BASE_POS_FIRST` base-consonant rule. However, the `BASE_POS_FIRST`\n> rule is employed by several Brahmi-derived scripts also used in the\n> region, including both [Myanmar](opentype-shaping-myanmar.md) and\n> [Khmer](opentype-shaping-khmer.md). 
\n>\n> Because these scripts share many other characteristics and\n> conventions with the Indic group described by this document,\n> `BASE_POS_FIRST` is included here for comparison. \n\n> Note: The `BASE_POS_LAST` search algorithm is used for Kannada and\n> Telugu, although the unique properties of the Kannada and Telugu\n> orthographies usually result in the search terminating at the first\n> non-<samp>\"Reph\"</samp> consonant in a syllable. Namely, all consonants in\n> Kannada and Telugu have a post-base form. \n>\n> This is the expected behavior for Kannada and Telugu, and still\n> differs from the `BASE_POS_FIRST` rule as used in the Brahmi-derived \n> scripts mentioned above. See those individual script pages for\n> further detail.\n\n\n#### Reph position ####\n\n<samp>\"Reph\"</samp> may be positioned:\n\n  - at the beginning of the syllable, in the ordering position\n    `POS_RA_TO_BECOME_REPH`.\n\t\n  - immediately before the first subjoined (below-base) consonant, in\n    the ordering position `POS_BEFORE_SUBJOINED`.\n\t\n  - immediately after the base consonant or syllable base, in the ordering position `POS_AFTER_MAIN`.\n\t\n  - immediately after the last subjoined (below-base) consonant, in\n    the ordering position `POS_AFTER_SUBJOINED`.\n\n  - immediately before the last post-base consonant, in the ordering\n    position `POS_BEFORE_POST`.\n\t\n  - immediately after the last post-base consonant, in the ordering\n    position `POS_AFTER_POST`.\n\nThe scripts currently described in the \"Indic\" script group  and their\ncorresponding Reph-position rules are summarized in the following\ntable:\n\n:::{table} Reph-position rules by script\n\n| Script     | Reph-position rule         |\n|:-----------|:---------------------------|\n| Devanagari | `REPH_POS_BEFORE_POST`     |\n| Bengali    | `REPH_POS_AFTER_SUBJOINED` |\n| Gujarati   | `REPH_POS_BEFORE_POST`     |\n| Gurmukhi   | `REPH_POS_BEFORE_SUBJOINED`|\n| Kannada    | `REPH_POS_AFTER_POST`      |\n| 
Malayalam  | `REPH_POS_AFTER_MAIN`      |\n| Oriya      | `REPH_POS_AFTER_MAIN`      |\n| Tamil      | `REPH_POS_AFTER_POST`      |\n| Telugu     | `REPH_POS_AFTER_POST`      |\n| Sinhala    | `REPH_POS_AFTER_POST`      |\n:::\n\n\n\n#### Reph encoding ####\n\n<samp>\"Reph\"</samp> may be:\n\n  - requested explicitly, using the sequence <samp>\"Ra,Halant,ZWJ\"</samp>. This is\n    designated `REPH_MODE_EXPLICIT`.\n  \n  - Formed implicitly by the sequence <samp>\"Ra,Halant\"</samp> when used in certain positions\n    in a syllable. This is designated `REPH_MODE_IMPLICIT`. Because a\n    <samp>\"Ra,Halant\"</samp> does _not_ form a <samp>\"Reph\"</samp> in _every_ position in a\n    syllable, script-specific tests are required.\n\n  - encoded as a separate codepoint. This codepoint is generally\n    called <samp>\"Repha\"</samp>, which distinguishes it from the <samp>\"Reph\"</samp>s formed by\n    other sequences. A <samp>\"Repha\"</samp> may need reordering based on script\n    specific rules, in which case `REPH_MODE_LOGICAL_REPHA` is\n    used. 
Alternatively, the script may not reorder <samp>\"Repha\"</samp>s at all,\n    in which case `REPH_MODE_VISUAL_REPHA` is used.\n\n\nThe scripts currently described in the \"Indic\" script group  and their\ncorresponding Reph-encoding rules are summarized in the following\ntable:\n\n:::{table} Reph-encoding rules by script\n\n| Script     | Reph-encoding rule         |\n|:-----------|:---------------------------|\n| Devanagari | `REPH_MODE_IMPLICIT`       |\n| Bengali    | `REPH_MODE_IMPLICIT`       |\n| Gujarati   | `REPH_MODE_IMPLICIT`       |\n| Gurmukhi   | `REPH_MODE_IMPLICIT`       |\n| Kannada    | `REPH_MODE_IMPLICIT`       |\n| Malayalam  | `REPH_MODE_LOGICAL_REPHA`  |\n| Oriya      | `REPH_MODE_IMPLICIT`       |\n| Tamil      | `REPH_MODE_IMPLICIT`       |\n| Telugu     | `REPH_MODE_EXPLICIT`       |\n| Sinhala    | `REPH_MODE_EXPLICIT`       |\n:::\n\n\n> Note: None of the specific scripts currently included in the \"Indic\"\n> group as it is enumerated in this document make use of the\n> `REPH_MODE_VISUAL_REPHA` encoding. However, `REPH_MODE_VISUAL_REPHA`\n> is used in the [Khmer](opentype-shaping-khmer.md) script. \n>\n> Because Khmer shares many other characteristics and\n> conventions with the Indic group described by this document,\n> `REPH_MODE_VISUAL_REPHA` is included here for comparison. \n\n\n\n#### Below-base forms ####\n\nBelow-base consonant forms (the `blwf` feature) may be applied:\n\n  - Only to consonants after the base consonant or syllable base. This is designated\n    `BLWF_MODE_POST_ONLY`.\n\t\n  - To consonants occurring before or after the base consonant or syllable base. 
This is\n    designated `BLWF_MODE_PRE_AND_POST`.\n\n\nThe scripts currently described in the \"Indic\" script group  and their\ncorresponding below-base–forms rules are summarized in the following\ntable:\n\n:::{table} Below-base–forms rules by script\n\n| Script     | Below-base–forms rule    |\n|:-----------|:-------------------------|\n| Devanagari | `BLWF_MODE_PRE_AND_POST` |\n| Bengali    | `BLWF_MODE_PRE_AND_POST` |\n| Gujarati   | `BLWF_MODE_PRE_AND_POST` |\n| Gurmukhi   | `BLWF_MODE_PRE_AND_POST` |\n| Kannada    | `BLWF_MODE_POST_ONLY`    |\n| Malayalam  | `BLWF_MODE_PRE_AND_POST` |\n| Oriya      | `BLWF_MODE_PRE_AND_POST` |\n| Tamil      | `BLWF_MODE_PRE_AND_POST` |\n| Telugu     | `BLWF_MODE_POST_ONLY`    |\n| Sinhala    | `BLWF_MODE_PRE_AND_POST` |\n:::\n\n\n#### Left-side matras ####\n\nAll Indic scripts position left-side matras in the same\nmanner, in the ordering position `POS_PREBASE_MATRA`.\n\n#### Right-side matras ####\n\nRight-side matras may be positioned:\n\n  - immediately before the first subjoined (below-base) consonant, in\n    the ordering position `POS_BEFORE_SUBJOINED`.\n\t\n  - immediately after the last subjoined (below-base) consonant, in\n    the ordering position `POS_AFTER_SUBJOINED`.\n\n  - immediately after the last post-base consonant, in the ordering\n    position `POS_AFTER_POST`.\n\n\nThe scripts currently described in the \"Indic\" script group  and their\ncorresponding right-side–matra positions are summarized in the following\ntable:\n\n:::{table} Right-side–matra positions by script\n\n| Script     | Right-side–matra position |\n|:-----------|:--------------------------|\n| Devanagari | `POS_AFTER_SUBJOINED`     |\n| Bengali    | `POS_AFTER_POST`          |\n| Gujarati   | `POS_AFTER_POST`          |\n| Gurmukhi   | `POS_AFTER_POST`          |\n| Kannada    | _varies_                  |\n| Malayalam  | `POS_AFTER_POST`          |\n| Oriya      | `POS_AFTER_POST`          |\n| Tamil      | `POS_AFTER_POST`          |\n| 
Telugu     | _varies_                  |\n| Sinhala    | `POS_AFTER_SUBJOINED`     |\n:::\n\n\n\n> Note: In most scripts, all right-side matras are positioned in the\n> same sort-order position. The Kannada and Telugu scripts, however,\n> feature more complex positioning rules for right-side matras, in\n> which different right-side matras must be sorted into different\n> positions. See the script-specific shaping documents for full\n> details.\n\n\n#### Above-base matras ####\n\nAbove-base matras may be positioned:\n\n  - immediately before the first subjoined (below-base) consonant, in\n    the ordering position `POS_BEFORE_SUBJOINED`.\n\t\n  - immediately after the base consonant or syllable base, in the ordering position `POS_AFTER_MAIN`.\n\t\n  - immediately after the last subjoined (below-base) consonant, in\n    the ordering position `POS_AFTER_SUBJOINED`.\n\n  - immediately after the last post-base consonant, in the ordering\n    position `POS_AFTER_POST`.\n\n\nThe scripts currently described in the \"Indic\" script group  and their\ncorresponding above-base–matra positions are summarized in the following\ntable:\n\n:::{table} Above-base–matra positions by script\n\n| Script     | Above-base–matra position |\n|:-----------|:--------------------------|\n| Devanagari | `POS_AFTER_SUBJOINED`     |\n| Bengali    | _null_                    |\n| Gujarati   | `POS_AFTER_SUBJOINED`     |\n| Gurmukhi   | `POS_AFTER_POST`          |\n| Kannada    | `POS_BEFORE_SUBJOINED`    |\n| Malayalam  | _null_                    |\n| Oriya      | `POS_AFTER_MAIN`          |\n| Tamil      | `POS_AFTER_SUBJOINED`     |\n| Telugu     | `POS_BEFORE_SUBJOINED`    |\n| Sinhala    | `POS_AFTER_SUBJOINED`     |\n:::\n\n\n\n#### Below-base matras ####\n\nBelow-base matras may be positioned:\n\n  - immediately before the first subjoined (below-base) consonant, in\n    the ordering position `POS_BEFORE_SUBJOINED`.\n\t\n  - immediately after the last subjoined (below-base) consonant, in\n   
 the ordering position `POS_AFTER_SUBJOINED`.\n\n  - immediately after the last post-base consonant, in the ordering\n    position `POS_AFTER_POST`.\n\n\nThe scripts currently described in the \"Indic\" script group  and their\ncorresponding below-base–matra positions are summarized in the following\ntable:\n\n:::{table} Below-base–matra positions by script\n\n| Script     | Below-base–matra position |\n|:-----------|:--------------------------|\n| Devanagari | `POS_AFTER_SUBJOINED`     |\n| Bengali    | `POS_AFTER_SUBJOINED`     |\n| Gujarati   | `POS_AFTER_POST`          |\n| Gurmukhi   | `POS_AFTER_POST`          |\n| Kannada    | `POS_BEFORE_SUBJOINED`    |\n| Malayalam  | `POS_AFTER_POST`          |\n| Oriya      | `POS_AFTER_SUBJOINED`     |\n| Tamil      | `POS_AFTER_POST`          |\n| Telugu     | `POS_BEFORE_SUBJOINED`    |\n| Sinhala    | `POS_AFTER_SUBJOINED`     |\n:::\n\n\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in an Indic script consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\nThe Nukta, Halant/Virama, and Anudatta marks can affect syllable\nidentification. All other signs are regarded as syllable modifier\nsigns, including those from the Vedic Extensions block.\n\nGenerally speaking, each syllable contains exactly one vowel\nsound. Valid syllables may begin with either a consonant or an\nindependent vowel.\n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nValid consonant-based syllables may include one or more additional \nconsonants that precede the base consonant. Each of these\nother, pre-base consonants will be followed by the <samp>\"Halant\"</samp> mark, which\nindicates that they carry no vowel. 
They affect pronunciation by\ncombining with the base consonant (e.g., \"_str_\", \"_pl_\") but they\ndo not add a vowel sound.\n\nSome Indic scripts also include special consonants that can occur after the\nbase consonant or syllable base. These post-base consonants and final consonants will\nalso be separated from the base consonant or syllable base by a <samp>\"Halant\"</samp> mark; the\nalgorithm for correctly identifying the base consonant includes a test\nto recognize these sequences and not mis-identify the base consonant.\n\nIn Indic scripts, the consonant <samp>\"Ra\"</samp> receives special treatment; in\nmany circumstances it is replaced by one of two combining mark-like forms. \n\n  - A <samp>\"Ra,Halant\"</samp> or <samp>\"Ra,Halant,ZWJ\"</samp> sequence at the beginning of a\n    syllable may be replaced with an above-base mark called <samp>\"Reph\"</samp>\n    (although script-specific rules may negate this replacement if\n    the <samp>\"Ra\"</samp> is the only consonant in the syllable). \n\n  - <samp>\"Halant,Ra\"</samp> sequences that occur elsewhere in the syllable may\n    take on a below-base form (called <samp>\"Rakaar\"</samp> in Devanagari and most\n    other scripts, and called <samp>\"Raphala\"</samp> in Bengali).\n\nIn addition, some scripts reorder post-base <samp>\"Ra\"</samp>s to a pre-base\nposition. These re-ordering <samp>\"Ra\"</samp>s may take on a different form, but\nthey are letter-like rather than mark-like forms.\n\n<samp>\"Reph\"</samp>, <samp>\"Rakaar\"</samp>, <samp>\"Raphala\"</samp>, and reordering <samp>\"Ra\"</samp> characters must be\nreordered after the syllable-identification stage is complete.
\n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in Indic scripts, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their shaping class, using regular expressions. \n\nThe following general-purpose regular expressions can be\nused to match Indic syllables. \n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following listing. This\nsimplifies the resulting expressions.
\n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER`| `NUMBER` \n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. 
Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the basis of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables.\n> 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables.\n> 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? 
= optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\n> Note: \"standalone\" syllables can be used to display examples of\n> letters, marks, and other characters without requiring full\n> syllables or words.\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\nThis may mean moving dependent-vowel (matra) glyphs, <samp>\"Ra,Halant\"</samp>\nsequences, and other consonants that take special \ntreatment in some circumstances.\n\nThese reordering moves are mandatory. The final-reordering stage\nmay make additional moves, depending on the text and on the features\nimplemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. 
Not every position will be utilized in every script.\n\nAdditional information about the ordering positions is available in\nthe [sort ordering](#sort-ordering) section of this document.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nThe algorithm used to find the base consonant varies according to the\nbase-consonant shaping characteristic of the script.\n\nFor `BASE_POS_FIRST` scripts, the first consonant of the syllable is\nthe base consonant.\n\n> Note: None of the specific scripts currently included in the \"Indic\"\n> group as it is enumerated in this document make use of the\n> `BASE_POS_FIRST` base-consonant rule. However, the `BASE_POS_FIRST`\n> rule is employed by several Brahmi-derived scripts also used in the\n> region, including both [Myanmar](opentype-shaping-myanmar.md) and\n> [Khmer](opentype-shaping-khmer.md). \n>\n> Because these scripts share many other characteristics and\n> conventions with the Indic group described by this document,\n> `BASE_POS_FIRST` is included here for comparison. \n\n\nFor `BASE_POS_LAST` scripts, the base consonant is the last consonant\nin the syllable, excluding all consonants that will take on special\npost-base, final, or below-base forms, and excluding all pre-base\nreordering <samp>\"Ra\"</samp>s. For a detailed explanation of the search algorithm\nemployed, see the page for each specific script.\n\nFor Sinhala, which uses `BASE_POS_LAST_SINHALA`, the base consonant is\nthe last consonant that is not preceded by a zero-width joiner\n(<samp>\"ZWJ\"</samp>).\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). 
\n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any two-part or three-part dependent vowels (matras) must be decomposed\ninto their component parts.\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n\n#### Stage 2, step 3: Tag decomposed matras ####\n\nThird, all dependent-vowel (matra) signs must be tagged with\ntheir final position. \n\nSingle-part matras can be tagged with the appropriate sort-ordering\nposition based on the ordering position of the script's specific\nscript-shaping characteristics. \n\nIn most cases, all matras of the same Mark-positioning subclass (such\nas `LEFT_POSITION`) in a particular script are tagged with the same\nfinal position (such as `POS_PREBASE_MATRA`). \n\nSome scripts, however, include matras that must be tagged according to\nmore involved rule sets.\n
In the set of Indic scripts described here,\nthis includes [Kannada](opentype-shaping-kannada.md) and\n[Telugu](opentype-shaping-telugu.md). See the individual\nscript-shaping document of each script to find a complete description\nof the applicable matra-tagging rules.\n\n> Note: The shaping engine may, as an alternative, choose to perform\n> this tagging earlier, such as during an initial Unicode-normalization\n> stage. \n>\n> Matras that resulted from the preceding decomposition step, however,\n> may not have been tagged when they were decomposed. If not, they must\n> be tagged for reordering before proceeding to the next step.\n\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, each consonant that occurs before the syllable base must be\ntagged, excluding initial <samp>\"Ra,Halant\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`.
\n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> (in `REPH_MODE_IMPLICIT` scripts) or\n<samp>\"Ra,Halant,ZWJ\"</samp> (in `REPH_MODE_EXPLICIT` scripts) sequences that will\nbecome <samp>\"Reph\"</samp>s must be tagged with `POS_RA_TO_BECOME_REPH`.\n\n#### Stage 2, step 7: Post-base consonants ####\n\nSeventh, any non-base consonants that occur after a dependent vowel\n(matra) sign must be tagged with `POS_POSTBASE_CONSONANT`. Such\nconsonants will either be followed by a <samp>\"Halant\"</samp> glyph or will be in\nthe `CONSONANT_DEAD` shaping class. \n\t\n  <!--- Double check: should this be <samp>\"_Consonant_,Halant\"</samp> instead of\n        <samp>\"Halant,_Consonant_\"</samp>? --->\n\t\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. 
\n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. 
This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!--matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. --->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp> and <samp>\"Ja,Halant,Nya\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant\"</samp> sequences but _not_ match\n            initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `blwf` should match <samp>\"Halant,Ra\"</samp>, <samp>\"Halant,Ha\"</samp>, and <samp>\"Halant,Va\"</samp> in\n            post-base positions and <samp>\"Ra,Halant\"</samp>, <samp>\"Ha,Halant\"</samp>, and\n            <samp>\"Va,Halant\"</samp> in non-initial pre-base positions\n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `pstf` should match initial <samp>\"Halant,Ya\"</samp> in post-base position\n  - `vatu` should match <samp>\"_Consonant_,Halant,Ra\"</samp>,\n           <samp>\"_Consonant_,Halant,Ha\"</samp>, and <samp>\"_Consonant_,Halant,Va\"</samp>\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr 
title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf \n\tpref \n\tblwf \n\tabvf \n\thalf\n\tpstf\n\tvatu\n\tcjct\n\tcfar\n\n\nNot every feature is used in every script. See the individual script\npages for further script-specific information.\n\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. 
If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. 
This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"_Consonant_,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable.\n> 
Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. The correct final position depends on the script's\nReph-position shaping characteristic, and is conditional upon the\npresence or absence of certain characters (such as post-base\nconsonants or <samp>\"matra,Halant\"</samp> sequences) in the syllable. \n\nThe full algorithm for determining the final Reph position has seven steps.\n\n(a) If the script uses Reph-position rule `REPH_POS_AFTER_POST`, jump\nimmediately to step (e). Otherwise, proceed to step (b).\n\n(b) Find the first explicit <samp>\"Halant\"</samp> between the syllable base\nconsonant and the first post-Reph consonant. If there is a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>\nfollowing this <samp>\"Halant\"</samp>, move the <samp>\"Reph\"</samp> to a position immediately\nafter the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>, then proceed to step (mH). Otherwise, move the\n<samp>\"Reph\"</samp> to a position immediately after the <samp>\"Halant\"</samp>, then proceed to\nstep (mH). If no such explicit <samp>\"Halant\"</samp> is found, proceed to step\n(c).\n\n(c) If the script uses Reph-position rule `REPH_POS_AFTER_MAIN`, find\nthe first consonant not ligated with the syllable base, and that is\nnot a potential pre-base reordering <samp>\"Ra\"</samp>. If such a consonant is\nfound, move the <samp>\"Reph\"</samp> to a position immediately after the\nconsonant, then proceed to step (mH). If no such consonant is found,\nproceed to step (d). If the script uses a different Reph-position\nrule, proceed to step (d).\n\n(d) If the script uses Reph-position rule `REPH_POS_BEFORE_POST`, find\nthe first post-base consonant not ligated with the syllable base. 
If\nsuch a consonant is found, move the <samp>\"Reph\"</samp> to a position immediately\nbefore the consonant, then proceed to step (mH). If no such consonant\nis found, proceed to step (e). If the script uses a different\nReph-position rule, proceed to step (e).\n\n(e) Find the first\npost-base matra, syllable modifier sign or Vedic sign that has a\nreordering class after the intended Reph position in the syllable sort\norder (as listed in [stage 2](#stage-2-initial-reordering)). If such a\nmatra or sign is found, move the <samp>\"Reph\"</samp> to a position immediately\nbefore it, then proceed to step (mH). If no such\nmatra or sign is found, proceed to step (f).\n\n(f) Move the <samp>\"Reph\"</samp> to the end of the syllable. \n\n(mH) Finally, if the <samp>\"Reph\"</samp> position arrived at in the preceding steps\nis immediately after a <samp>\"matra,Halant\"</samp> sequence, move the <samp>\"Reph\"</samp> so\nthat it is before the <samp>\"Halant\"</samp>. \n\n\nTaking the Reph-position–rule conditionals in the above algorithm into\naccount, the position-finding steps that may be executed in each\nscript are summarized in the following table:\n\n:::{table} Summary of final–reph-positioning rules by script\n\n| Script     | Reph-position rule        | a | b | c | d | e | f | mH |\n|:-----------|:--------------------------|:--|:--|:--|:--|:--|:--|:---|\n| Devanagari |`REPH_POS_BEFORE_POST`     |   | • |   | • | • | • | •  |\n| Bengali    |`REPH_POS_AFTER_SUBJOINED` |   | • |   |   |   | • | •  |\n| Gujarati   |`REPH_POS_BEFORE_POST`     |   | • |   | • | • | • | •  |\n| Gurmukhi   |`REPH_POS_BEFORE_SUBJOINED`|   | • |   |   |   | • | •  |\n| Kannada    |`REPH_POS_AFTER_POST`      |   |   |   |   | • | • | •  |\n| Malayalam  |`REPH_POS_AFTER_MAIN`      |   | • | • |   | • | • | •  |\n| Oriya      |`REPH_POS_AFTER_MAIN`      |   | • | • |   | • | • | •  |\n| Tamil      |`REPH_POS_AFTER_POST`      |   |   |   |   | • | • | •  |\n| Telugu     
|`REPH_POS_AFTER_POST`      |   |   |   |   | • | • | •  |\n| Sinhala    |`REPH_POS_AFTER_MAIN`      |   | • | • |   | • | • | •  |\n:::\n\n\n#### Stage 4, step 4: Pre-base reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n  \n\n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n> Note: The `init` feature for word-initial dependent vowels (matras)\n> is defined only for Bengali and should not be expected in fonts for\n> any other scripts. Therefore, this step will involve no work when\n> processing non-`<bng2>` text. \n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. 
In preparation for this stage, glyph sequences should have been\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n## The old Indic shaping model ##\n\n\nThe older Indic script tags (`<deva>`, `<beng>`, `<gujr>`, `<guru>`, `<knda>`,\n`<mlym>`, `<orya>`, `<taml>`, and `<telu>`) have been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\nthese tags and some users may still have documents that were written to\ntake advantage of the original shaping rules.\n\n### Distinctions from the Indic2 model ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs (used to trigger shaping\nfeatures) was altered when migrating from the old to the new shaping model. 
\n\nSpecifically, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe old-model Indic syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn Indic2 text, as described above in this document, there is no\nsuch reordering. The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nThe old Indic shaping model also did not recognize the\n`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, all scripts\nwere treated as if they followed the `BLWF_MODE_POST_ONLY`\ncharacteristic. In other words, below-base form substitutions were\nonly applied to consonants after the base consonant or syllable base.\n\nIn addition, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<deva>`, `<beng>`, `<gujr>`, `<guru>`, `<knda>`,\n`<orya>`, and `<telu>` text, the left-side matra was always positioned\nat the beginning of the syllable. 
For `<mlym>` and `<taml>` text, the\nleft-side matra was positioned immediately before the base consonant or syllable base.\n\n\n### Advice for handling fonts with old Indic features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the old shaping model.\n\n### Advice for handling text runs composed in the old Indic format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\none of the old Indic script tags and it is known that the font in use supports\nonly the Indic2 shaping model.\n\nShaping engines may also choose to apply `blwf` substitutions to\nbelow-base consonants occurring before the base consonant when it is\nknown that the font in use supports an applicable substitution lookup.\n\nShaping engines may also choose to position left-side matras according\nto the old-model Indic ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n"
  },
  {
    "path": "opentype-shaping-kannada.md",
    "content": "```{include} /_global.md\n```\n\n# Kannada shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Kannada script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Kannada character tables](#kannada-character-tables)\n  - [The `<knd2>` shaping model](#the-knd2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<knda>` shaping model](#the-knda-shaping-model)\n      - [Distinctions from `<knd2>`](#distinctions-from-knd2)\n      - [Advice for handling fonts with `<knda>` features only](#advice-for-handling-fonts-with-knda-features-only)\n      - [Advice for handling text runs composed in `<knda>` format](#advice-for-handling-text-runs-composed-in-knda-format)\n\n\n## General information ##\n\nThe Kannada script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the South Indic subgroup, in which\nsequences of adjacent consonants are often represented as below-base forms.\n\nThe Kannada script is used to write multiple languages, most commonly\nKannada, plus several minority languages. 
In addition, Sanskrit may be written\nin Kannada, so Kannada script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nThere are two extant Kannada script tags defined in OpenType, `<knda>`\nand `<knd2>`. The older script tag, `<knda>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<knd2>`\nshaping model. However, if a font is encountered that supports only\n`<knda>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. In the Kannada\nlanguage, dependent-vowel signs may also be referred to as _swara_ forms.\n\nThe term \"matra\" is also used to refer to the headline in other Indic\nscripts, and may be used to describe the distinctive cap stroke above most\nKannada letters by comparison. To avoid ambiguity, the term **headline** is\nused in most Unicode and OpenType shaping documents.\n\n**Halant** and **Virama** are both standard terms for the above-base \"vowel-killer\"\nsign. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. In the Kannada\nlanguage, this sign is known as the _hrasva_.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. In the Kannada\nlanguage, this mark is known as the _candrabindu_.\n\nThe term **base consonant** is also critical to Indic shaping. 
The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Kannada text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark.\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nKannada glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. 
\n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (some of which are also categorized as `Mark [Mn]`).\n\nKannada uses one subclass of consonant, `CONSONANT_WITH_STACKER`. This\nsubclass supports two consonants, <samp>\"Jihvamuliya\"</samp> (`U+0CF1`) and\n<samp>\"Upadhmaniya\"</samp> (`U+0CF2`), that are used only for Sanskrit text\nruns. These consonants may form stacked ligatures with subsequent\nconsonants without an intervening <samp>\"Halant\"</samp>. Such ligature formation,\nif desired, must be implemented in the font.\n\nThe letters classified as `CONSONANT_WITH_STACKER` should be treated\nas consonants when [identifying\nsyllables](#stage-1-identifying-syllables-and-other-sequences). 
No\nadditional behavior is required.\n\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). 
Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Kannada character tables ###\n\nSeparate character tables are provided for the Kannada and Vedic\nExtensions blocks as well as for other miscellaneous characters that\nare used in `<knd2>` text runs:\n\n  - [Kannada character table](character-tables/character-tables-kannada.md#kannada-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-kannada.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-kannada.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0C81`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0C81; Candrabindu         |\n| | | | |\n|`U+0C95`   | Letter           | CONSONANT         | _null_                     | &#x0C95; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Kannada text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence. \n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead. \n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>.\n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>. \n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\nIn addition to general punctuation, runs of Kannada text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block.\n\n\n\n## The `<knd2>` shaping model ##\n\nProcessing a run of `<knd2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. 
\n\nWith regard to these common variations, Kannada's specific shaping\ncharacteristics include:\n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any consonants with post-base forms.\n\t \n\t - Kannada differs somewhat from other `BASE_POS_LAST` scripts in\n       that all consonants can use post-base forms. Therefore, the\n       general base-consonant search algorithm should identify the first\n       non-<samp>\"Reph\"</samp> consonant as the base. This is the expected\n       behavior, as it allows the same search algorithm to be used\n       with all `BASE_POS_LAST` scripts.\n\n  - `REPH_POS_AFTER_POST` = <samp>\"Reph\"</samp> is ordered after the last post-base\n     consonant form.\n\n  - `REPH_MODE_IMPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant\"</samp> sequence.\n\n  - `BLWF_MODE_POST_ONLY` = The below-forms feature is applied only to\n     post-base consonants.\n\n  - `MATRA_POS_TOP` = `POS_BEFORE_SUBJOINED`  = Above-base matras are\n    ordered before any subjoined (i.e., below-base) consonant forms.\n\n  - `MATRA_POS_RIGHT` = Kannada includes right-side matras that follow two\n     different reordering rules. \n\t \n\t - Matras <samp>\"Sign Vocalic R\"</samp> (`U+0CC3`), <samp>\"Sign Vocalic Rr\"</samp>\n       (`U+0CC4`), <samp>\"Sign Ee\"</samp> (`U+0CC7`), <samp>\"Sign Ai\"</samp> (`U+0CC8`), <samp>\"Sign\n       O\"</samp> (`U+0CCA`), <samp>\"Sign Oo\"</samp> (`U+0CCB`), <samp>\"Length Mark\"</samp> (`U+0CD5`),\n       and <samp>\"Ai Length Mark\"</samp> (`U+0CD6`) use `POS_AFTER_SUBJOINED` =\n       These right-side matras are ordered after all subjoined (i.e.,\n       below-base) consonant forms. 
\n\t   \n\t - Matras <samp>\"Sign Aa\"</samp> (`U+0CBE`), <samp>\"Sign Ii\"</samp> (`U+0CC0`), <samp>\"Sign U\"</samp>\n       (`U+0CC1`), and <samp>\"Sign Uu\"</samp> (`U+0CC2`) use\n       `POS_BEFORE_SUBJOINED` = These right-side matras are ordered before\n       all subjoined (i.e., below-base) consonant forms.\n\n  - `MATRA_POS_BOTTOM` = `POS_BEFORE_SUBJOINED` = Below-base matras are\n     ordered before any subjoined (i.e., below-base) consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Kannada consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Kannada Unicode block enumerates four modifier signs,\n> \"Candrabindu\" (`U+0C81`), \"Anusvara\" (`U+0C82`), \"Visarga\" \n> (`U+0C83`), and \"Avagraha\" (`U+0CBD`). In addition, Sanskrit text\n> written in Kannada may include additional signs from the Vedic\n> Extensions block. \n>\n> Note also that the \"Spacing Candrabindu\" (`U+0C80`) is a letter, not\n> a modifier sign.\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that independent vowel\nis the syllable's only vowel sound and serves as the \"base\". \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nKannada uses the `BASE_POS_LAST` characteristic mentioned\nearlier. 
However, because all consonants in the script can potentially\ntake on post-base consonant forms, the outcome of the shaping\ncharacteristic may be counterintuitive.\n\nGenerally speaking, the base consonant is the first consonant of the\nsyllable, which is rendered in full form, and any subsequent\nconsonants are rendered in special post-base forms. \n\nEach of these post-base consonants will be preceded by the <samp>\"Halant\"</samp> mark, which\nindicates that they carry no vowel. They affect pronunciation by\ncombining with the base consonant (e.g., \"_str_\", \"_pl_\") but they\ndo not add a vowel sound.\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment; in many circumstances it is replaced by a combining\nmark-like form. \n\n  - A <samp>\"Ra,Halant\"</samp> sequence at the beginning of a syllable is replaced\n    with a right-side mark called <samp>\"Reph\"</samp> (unless the <samp>\"Ra\"</samp> is the only\n    consonant in the syllable). This rule is synonymous with the\n    `REPH_MODE_IMPLICIT` characteristic mentioned earlier.\n  \n<samp>\"Reph\"</samp> characters must be reordered after the syllable-identification\nstage is complete.\n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. 
Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Kannada script, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Kannada syllables. \n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. 
\n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. 
Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the basis of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-word syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Ra,Halant\"</samp> glyph sequences, and other consonants that take special\n> treatment in some circumstances. \n>\n> These reordering moves are mandatory. 
The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nKannada. \n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. 
`POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. 
That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\n\n<!--- > Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). \n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. 
Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is:\n\n  - If the syllable starts with <samp>\"Ra,Halant\"</samp> and the syllable contains\n    more than one consonant, exclude the starting <samp>\"Ra\"</samp> from the list of\n    consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If none of the above conditions is true, stop.\n  - The consonant stopped at will be the base consonant.\n\n\n> Note: The algorithm is designed to work for all Indic\n> scripts. However, Kannada does not utilize pre-base-reordering <samp>\"Ra\"</samp>.\n>\n> Also, it is important to note that all consonants in Kannada have a\n> post-base form; therefore, the backwards-search step will\n> automatically move past them until it reaches either a <samp>\"Ra,Halant\"</samp>\n> sequence or the first consonant. 
However, this condition is not the\n> same as the shaping characteristic `BASE_POS_FIRST`, which does not\n> use the above search algorithm at all.\n\n> Note: Because Kannada employs the `BLWF_MODE_POST_ONLY` shaping\n> characteristic, consonants with below-base special forms will occur\n> only after the base consonant or syllable base. \n> \n> During the base-consonant search, therefore, all of these below-base\n> form sequences will be encountered and tagged correctly as\n> <samp>\"Halant,_consonant_\"</samp> patterns. Stage 2, step 5 below exists to ensure that\n> the <samp>\"_consonant_,Halant\"</samp> pattern preceding the base consonant or syllable base\n> for below-base forms in other Indic scripts will also be tagged correctly.\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any multi-part dependent vowels (matras) must be decomposed\ninto their independent components. Kannada has five\nmulti-part dependent vowels, \"Ii\" (`U+0CC0`), \"Ee\" (`U+0CC7`), \"Ai\"\n(`U+0CC8`), \"O\" (`U+0CCA`), and \"Oo\" (`U+0CCB`). Each\nhas a canonical decomposition, so this step is unambiguous. \n\n> \"Ii\" (`U+0CC0`) decomposes to \"`U+0CBF`,`U+0CD5`\"\n>\n> \"Ee\" (`U+0CC7`) decomposes to \"`U+0CC6`,`U+0CD5`\"\n>\n> \"Ai\" (`U+0CC8`) decomposes to \"`U+0CC6`,`U+0CD6`\"\n>\n> \"O\" (`U+0CCA`) decomposes to \"`U+0CC6`,`U+0CC2`\"\n>\n> \"Oo\" (`U+0CCB`) decomposes to \"`U+0CCA`,`U+0CD5`\"\n>\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n> Note: The decomposition of \"Oo\" (`U+0CCB`) is atypical; Unicode\n> specifies that the codepoint decomposes to \"O\" (`U+0CCA`) followed\n> by `U+0CD5`; the \"O\" codepoint is then decomposed to\n> \"`U+0CC6`,`U+0CC2`\". 
Shaping engines must take care not to miss this\n> second decomposition.\n\n\n:::{figure-md}\n![Multi-part matra decomposition](/images/kannada/kannada-matra-decomposition.svg \"Multi-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-matra-decomposition}\n\nMulti-part matra decomposition\n:::\n\n```{svg-color-toggle-button} kannada-matra-decomposition\n```\n\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all dependent-vowel (matra) signs, including those that\nresulted from the preceding decomposition step, must be tagged to be\nmoved to the correct position in the syllable.\n\nLeft-side matras should be tagged with `POS_PREBASE_MATRA`.\n\nAbove-base matras should be tagged with `POS_BEFORE_SUBJOINED`.\n\nRight-side matras should be tagged according to two rules.\n\n  - Matras <samp>\"Sign Vocalic R\"</samp> (`U+0CC3`), <samp>\"Sign Vocalic Rr\"</samp>\n       (`U+0CC4`), <samp>\"Sign Ee\"</samp> (`U+0CC7`), <samp>\"Sign Ai\"</samp> (`U+0CC8`), <samp>\"Sign\n       O\"</samp> (`U+0CCA`), <samp>\"Sign Oo\"</samp> (`U+0CCB`), <samp>\"Length Mark\"</samp> (`U+0CD5`),\n       and <samp>\"Ai Length Mark\"</samp> (`U+0CD6`) should be tagged with\n       `POS_AFTER_SUBJOINED`.\n\t   \n  - Matras <samp>\"Sign Aa\"</samp> (`U+0CBE`), <samp>\"Sign Ii\"</samp> (`U+0CC0`), <samp>\"Sign U\"</samp>\n       (`U+0CC1`), and <samp>\"Sign Uu\"</samp> (`U+0CC2`) should be tagged with\n       `POS_BEFORE_SUBJOINED`.\n\n> Note: the right-side matras <samp>\"Sign Ee\"</samp> (`U+0CC7`), <samp>\"Sign Ai\"</samp>\n> (`U+0CC8`), <samp>\"Sign O\"</samp> (`U+0CCA`), and <samp>\"Sign Oo\"</samp> (`U+0CCB`) are\n> multi-part matras and were decomposed into independent components\n> during stage 2, step 2. 
They are listed here only to ensure that the\n> two position-tagging rules used in Kannada are described completely.\n\nBelow-base matras should be tagged with `POS_BEFORE_SUBJOINED`.\n\nFor simplicity, shaping engines may choose to tag single-part matras\nin an earlier text-processing step, using the information in the\n_Mark-placement subclass_ column of the character tables. It is\ncritical at this step, however, that all decomposed matras are also\ncorrectly tagged before proceeding to the next step.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`. Excluding initial <samp>\"Ra,Halant\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. 
\n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nKannada does not use any pre-base consonants; this step is listed here\nbecause it is part of the general processing scheme for shaping Indic scripts.\n\n> Note: Because Kannada employs the `BLWF_MODE_POST_ONLY` shaping\n> characteristic, consonants with below-base special forms will occur\n> only after the base consonant or syllable base. \n> \n> During the base-consonant search in stage 2, step 1, therefore, all of these below-base\n> form sequences will be encountered and tagged correctly as\n> <samp>\"Halant,_consonant_\"</samp> patterns. 
The tagging in this step ensures that\n> the <samp>\"_consonant_,Halant\"</samp> pattern preceding the base consonant or syllable base\n> for below-base forms in other Indic scripts will also be tagged correctly.\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: an initial <samp>\"Ra,Halant\"</samp> sequence will always become a <samp>\"Reph\"</samp>\n> unless the <samp>\"Ra\"</samp> is the only consonant in the syllable.\n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<knd2>` text runs. This step is included here to\n> maintain compatibility across Indic scripts.\n\n\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. 
In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!--matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. 
--->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore, it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp> and <samp>\"Ja,Halant,Nya\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant\"</samp> sequences but _not_ match\n            initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `pref` should match <samp>\"_Consonant_,Ra\"</samp> in post-base positions\n  - `blwf` should match <samp>\"Halant,_Consonant_\"</samp> in post-base positions\n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `pstf` should match <samp>\"Halant,_Consonant_\"</samp> in post-base position\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf (not used in Kannada)\n\tpref\n\tblwf \n\tabvf (not used in Kannada)\n\thalf\n\tpstf\n\tvatu (not used in Kannada)\n\tcjct\n\tcfar (not used in Kannada)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. \n\n  - The context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n    \n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n:::\n\n\n:::{figure-md}\n![Nukta composition](/images/kannada/kannada-nukt.svg \"Nukta composition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-nukt}\n\nNukta composition\n:::\n\n```{svg-color-toggle-button} kannada-nukt\n```\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces two specific sequences with required ligatures. \n\n  - <samp>\"Ka,Halant,Ssa\"</samp> is substituted with the <samp>\"KSsa\"</samp> ligature. 
\n  - <samp>\"Ja,Halant,Nya\"</samp> is substituted with the <samp>\"JNya\"</samp> ligature. \n  \nThese sequences can occur anywhere in a syllable. The <samp>\"KSsa\"</samp> and\n<samp>\"JNya\"</samp> characters have orthographic status equivalent to full\nconsonants in some languages, and fonts may have `cjct` substitution\nrules designed to match them in subsequences. Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\n  - The context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n    \n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n\n:::{figure-md}\n![KSsa ligation](/images/kannada/kannada-akhn-kssa.svg \"KSsa ligation\"){.shaping-demo .inline-svg .greyscale-svg #kannada-akhn-kssa}\n\nKSsa ligation\n:::\n\n```{svg-color-toggle-button} kannada-akhn-kssa\n```\n\n\n:::{figure-md}\n![JNya ligation](/images/kannada/kannada-akhn-jnya.svg \"JNya ligation\"){.shaping-demo .inline-svg .greyscale-svg #kannada-akhn-jnya}\n\nJNya ligation\n:::\n\n```{svg-color-toggle-button} kannada-akhn-jnya\n```\n\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence, however, must not be flagged for\n    the `rphf` substitution.\n\t\n\n  - The context defined for a `rphf` feature is:\n\n:::{table} `rphf` feature context\n    \n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`   | _none_        |\n:::\n\n\n:::{figure-md}\n![Reph composition](/images/kannada/kannada-rphf.svg \"Reph composition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-rphf}\n\nReph 
composition\n:::\n\n```{svg-color-toggle-button} kannada-rphf\n```\n\n\n#### Stage 3, step 5: rkrf ####\n\n> This feature is not used in Kannada.\n\n#### Stage 3, step 6: pref ####\n\nThe `pref` feature replaces pre-base-reordering consonant glyphs with\nany special forms.\n\nThe substitution of the nominal glyph for its special form takes place\nat this stage. However, the actual reordering move is performed later,\nin stage 4, step 4.\n\n> Note: Kannada does not usually incorporate pre-base-reordering\n> consonant forms, but it is possible for a font to implement them in\n> order to provide for desired typographic variation.\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. All consonants in Kannada can take on a below-base consonant\nform.\n\n\n:::{figure-md}\n![Below-base form composition](/images/kannada/kannada-blwf.svg \"Below-base form composition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-blwf}\n\nBelow-base form composition\n:::\n\n```{svg-color-toggle-button} kannada-blwf\n```\n\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Kannada.\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n<samp>\"_Consonant_\"</samp>.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after 
the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are two exceptions to the default behavior, for which the\nshaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - A sequence matching <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n> Note: Kannada does not usually incorporate half forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n\n\n#### Stage 3, step 10: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any special forms.\n\n\n#### Stage 3, step 11: vatu ####\n\n> This feature is not used in Kannada.\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. 
These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<knd2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between the two consonants.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. 
By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n> Note: Kannada does not usually incorporate conjuncts, but it is\n> possible for a font to implement the `cjct` feature in order to\n> provide for desired typographic variation.\n\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Kannada.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. 
In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\nKannada does not use pre-base matras, so this step will\ninvolve no work when processing `<knd2>` text. 
It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. 
Because Kannada incorporates the `REPH_POS_AFTER_POST`\nshaping characteristic, this final position is defined to be\nimmediately after any post-base consonant forms.\n\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is\n\n  - Move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. \n\t> Note: Because Kannada incorporates the\n    > `REPH_POS_AFTER_POST` shaping characteristic, this means\n    > any positioning tag of `POS_FINAL_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_FINAL_CONSONANT`.\n  - If no other location has been located in the previous step, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\n\nFinally, if the final position of <samp>\"Reph\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n:::{figure-md}\n![Reph positioning](/images/kannada/kannada-reph-position.svg \"Reph positioning\"){.shaping-demo .inline-svg .greyscale-svg #kannada-reph-position}\n\nReph positioning\n:::\n\n```{svg-color-toggle-button} kannada-reph-position\n```\n\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n  \nKannada does not use pre-base-reordering consonants, so this step will\ninvolve no work when processing `<knd2>` text. 
It is included here in order\nto maintain compatibility with the other Indic scripts.\n\n\n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nKannada does not use the `init` feature, so this step will\ninvolve no work when processing `<knd2>` text. It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. In preparation for this stage, glyph sequences should be\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit (not used in Kannada)\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature is not used in Kannada.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\n:::{figure-md}\n![Pre-base form composition](/images/kannada/kannada-pres.svg \"Pre-base form composition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-pres}\n\nPre-base form composition\n:::\n\n```{svg-color-toggle-button} kannada-pres\n```\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. 
This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Above-base form composition](/images/kannada/kannada-abvs.svg \"Above-base form composition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-abvs}\n\nAbove-base form composition\n:::\n\n```{svg-color-toggle-button} kannada-abvs\n```\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. This usually involves replacing multiple\nbelow-base glyphs (substituted earlier with the `blwf` feature) with\nligatures or conjunct forms.\n\n:::{figure-md}\n![Below-base form composition](/images/kannada/kannada-blws.svg \"Below-base form composition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-blws}\n\nBelow-base form composition\n:::\n\n```{svg-color-toggle-button} kannada-blws\n```\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures. \n\n:::{figure-md}\n![Post-base form composition](/images/kannada/kannada-psts.svg \"Post-base form composition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-psts}\n\nPost-base form composition\n:::\n\n```{svg-color-toggle-button} kannada-psts\n```\n\n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. 
This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic.\n\n:::{figure-md}\n![Halant form composition](/images/kannada/kannada-haln.svg \"Halant form composition\"){.shaping-demo .inline-svg .greyscale-svg #kannada-haln}\n\nHalant form composition\n:::\n\n```{svg-color-toggle-button} kannada-haln\n```\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. However, `calt` is not mandatory for correct Kannada shaping\n> and may be disabled in the application by user preference.\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Kannada text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\n:::{figure-md}\n![dist feature application](/images/kannada/kannada-dist.svg \"dist feature application\"){.shaping-demo .inline-svg .greyscale-svg #kannada-dist}\n\nApplication of the `dist` feature\n:::\n\n```{svg-color-toggle-button} kannada-dist\n```\n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. 
In Kannada, this includes above-base dependent vowels (matras),\ndiacritical marks, and Vedic signs. \n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. In Kannada, this includes below-base dependent vowels\n(matras) as well as below-base diacritical marks.\n\n:::{figure-md}\n![Below-base mark positioning](/images/kannada/kannada-blwm.svg \"Below-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #kannada-blwm}\n\nBelow-base mark positioning\n:::\n\n```{svg-color-toggle-button} kannada-blwm\n```\n\n\n## The `<knda>` shaping model ##\n\nThe older Kannada script tag, `<knda>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<knda>` and some users may still have documents that were written to\ntake advantage of `<knda>` shaping.\n\n### Distinctions from `<knd2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs (used to trigger shaping\nfeatures) was altered when migrating from `<knda>` to\n`<knd2>`. \n\nSpecifically, under `<knda>`, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<knda>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<knd2>` text, as described above in this document, there is no\nsuch reordering. 
The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\n> Note: Uniscribe is known to make an exception to this reordering\n> operation for `<knda>` syllables that end in a <samp>\"Halant\"</samp>\n> codepoint. For example:\n>\n>     BaseC Halant Post-baseC Halant\n>\n> is _not_ reordered to <samp>\"BaseC Post-baseC Halant Halant\"</samp>. Further\n> details are provided in the [Uniscribe compatibility](notes/uniscribe-bug-compatibility.md#kannada-final-double-halants) \n> document. \n\nIn addition, for some scripts, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<knda>` text, the left-side matra was always positioned\nat the beginning of the syllable.\n\nFinally, in `<knda>` text, the sequence <samp>\"Ra,Halant,ZWJ,_consonant_\"</samp>\nwas treated as equivalent to the sequence\n<samp>\"Ra,ZWJ,Halant,_consonant_\"</samp>. The current version of the Unicode\nstandard states that <samp>\"Ra,ZWJ,Halant,_consonant_\"</samp> is the correct\nsequence, which is meant to trigger the full form of <samp>\"Ra\"</samp> followed by\nthe subjoined form of <samp>\"_consonant_\"</samp>. \n\nHowever, Unicode 4.0 specified <samp>\"Ra,Halant,ZWJ,_consonant_\"</samp>\ninstead, which was inconsistent with the needs of other Indic\nscripts. 
Even though this sequence was changed with the release of\nUnicode 5.0, legacy documents and systems might still be encountered\nthat use the Unicode 4.0 sequence.\n\nConsequently, shaping engines that encounter a\n<samp>\"Ra,Halant,ZWJ,_consonant_\"</samp> sequence in `<knda>` text should reorder\nthe sequence to <samp>\"Ra,ZWJ,Halant,_consonant_\"</samp> or otherwise produce the\nsame behavior as <samp>\"Ra,ZWJ,Halant,_consonant_\"</samp>.\n\n\n### Advice for handling fonts with `<knda>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<knda>` shaping model.\n\n### Advice for handling text runs composed in `<knda>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<knda>` script tag and it is known that the font in use supports\nonly the `<knd2>` shaping model.\n\nShaping engines may also choose to position left-side matras according\nto the `<knda>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n"
  },
  {
    "path": "opentype-shaping-khmer.md",
    "content": "```{include} /_global.md\n```\n\n# Khmer shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Khmer script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Khmer character tables](#khmer-character-tables)\n  - [The `<khmr>` shaping model](#the-khmr-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-4-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 5: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-5-applying-remaining-positioning-features-from-gpos)\n\n\n## General information ##\n\nThe Khmer or Cambodian script is a descendant of the Brahmi script,\nand follows many of the same general patterns found in [Indic\nscripts](opentype-shaping-indic-general.md). However, Khmer\nincorporates enough distinctions of its own that it is generally not\nadvisable to attempt supporting it in a general-purpose Indic shaping\nengine. \n\nThe Khmer script is used to write multiple languages, most commonly\nKhmer, Tampuan, Krung, Cham, and Pali. In addition,\nSanskrit may be written in Khmer, but the Khmer script is not used\nfor Vedic texts, therefore Khmer text runs are not expected to\ninclude any glyphs from the Vedic Extensions block of Unicode. 
\n\nThe Khmer script tag defined in OpenType is `<khmr>`.\n\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Brahmi-derived and\nIndic scripts.  The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. Syllables\nin Khmer script can include sequences of multiple vowels and,\ntherefore, multiple matras.\n\n**Halant** and **Virama** are both standard terms for the\n\"vowel-killer\" mark. Unicode documents use the term \"virama\" most \nfrequently, while OpenType documents use the term \"halant\" most\nfrequently.\n\nThe Khmer block does include a version of the \"halant\" mark, \"Viriam\"\n(`U+17D1`). Its usage in Khmer text, however, differs significantly\nfrom the usage of \"halant\" in other Brahmi-derived and Indic\northographies.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the\ndiacritical mark indicating that the preceding vowel should be\nnasalized. In Khmer, the chandrabindu mark is known as the \"nikahit\".\n\nThe term **base consonant** is also critical to Khmer shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra). \n\nEach consonant in Khmer bears one of two inherent vowels. The\ntwo sets of letters that correspond to these inherent vowels are\nreferred to as **registers**. In Khmer text, a **register shifter**\nmark can be used to replace the letter's inherent vowel with the\ninherent vowel of the other register.\n\nSome consonants in one register have a corresponding consonant in the\nother register; for these consonant pairs, a register shifter is not\nemployed. \n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. 
The **below-base**,\nsubscript forms of letters are the most frequently seen secondary \nforms.  However, some secondary forms are **post-base**, and the\nsecondary form of \"Ro\" is a **pre-base**-reordering form.\n\nThese secondary letter forms are known in Khmer as **coeng**\nforms. Most coengs are consonants, although, in certain cases,\nindependent vowels may take on coeng forms. \n\nThe Khmer block of Unicode does not encode the coeng forms of\nletters as separate codepoints. Instead, the \"Sign Coeng\" (`U+17D2`)\ncodepoint is a control character used to indicate that the following\nletter should be rendered in its coeng form. The \"Sign Coeng\" has no\nvisual representation of its own. \n\n> Note: Despite the potentially confusing name in the Unicode\n> standard, \"Sign Coeng\" (`U+17D2`) itself should _not_ be referred to\n> as a **coeng**. The term \"coeng\" refers to the form of the letter\n> that follows the \"Sign Coeng\" control character.\n\n:::{figure-md}\n![Coeng form of Kha](images/khmer/khmer-coeng-kha.svg \"Coeng form of Kha\"){.shaping-demo .inline-svg .greyscale-svg #khmer-coeng-kha}\n\nCoeng form of Kha\n:::\n\n```{svg-color-toggle-button} khmer-coeng-kha\n```\n\n\nAlthough coengs are typically attached to the base consonant of a\nsyllable, in certain circumstances coengs may also be attached to an\nindependent vowel. \n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n\n## Glyph classification ##\n\nShaping Khmer text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark. 
\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nKhmer glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Khmer script. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`NUKTA` and `VISARGA`. Some less-common marks behave according to\nrules that are similar to these common marks, and are therefore\nclassified with the corresponding common mark. \n\n> Note: In Khmer, the `NUKTA` class is used for the chandrabindu mark,\n> <samp>\"Nikahit\"</samp> (`U+17C6`). This more correctly reflects the shaping\n> behavior of the nikahit mark than does the `BINDU` class used in\n> other scripts.\n\n<samp>\"Viriam\"</samp> (`U+17D1`), Khmer's \"halant\"-like codepoint, is classified as\n`PURE_KILLER` rather than the more common `VIRAMA`. This is to\nindicate that the <samp>\"Viriam\"</samp> suppresses the inherent vowel of a\nconsonant but is not used between consonants to trigger the formation\nof a subjoined form.\n\n<samp>\"Sign Coeng\"</samp>, the coeng-form generator, is classified as\n`INVISIBLE_STACKER`. This is to indicate that the <samp>\"Sign Coeng\"</samp>\ncodepoint itself is never rendered as a visible glyph. \n\n<samp>\"Toandakhiat\"</samp> (`U+17CD`) is classified as `CONSONANT_KILLER`. This\nmark indicates that the previous consonant is not pronounced. 
Note\nthat <samp>\"Toandakhiat\"</samp> is a diacritic mark, and that its class,\n`CONSONANT_KILLER`, is not a subclass of consonant.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are also categorized as `Mark [Mn]`).\n\nKhmer uses one subclass of consonant, `CONSONANT_POST_REPHA`. This\nsubclass is used only for <samp>\"Robat\"</samp>, the above-base form of <samp>\"Ro\"</samp>. The\n<samp>\"Robat\"</samp> is similar to the <samp>\"Reph\"</samp> found in many Indic scripts but,\nunlike <samp>\"Reph\"</samp>, <samp>\"Robat\"</samp> is encoded as a separate codepoint; therefore, it is\nnot formed by a special sequence of control characters.\n\n:::{figure-md}\n![Robat](images/khmer/khmer-robat.svg \"Robat\"){.shaping-demo .inline-svg .greyscale-svg #khmer-robat}\n\nRobat\n:::\n\n```{svg-color-toggle-button} khmer-robat\n```\n\n\n<samp>\"Robat\"</samp> is a consonant, but it is classified as a combining mark in\nUnicode. For shaping purposes, <samp>\"Robat\"</samp> behaves like the <samp>\"Nukta\"</samp> mark\nfound in many Indic scripts.\n\nThe Khmer glottal-stop consonant \"Qa\" (`U+17A2`) carries an inherent\nvowel and is also capable of accepting dependent vowels (matras). It\nis sometimes used in place of an independent vowel. 
For shaping\npurposes, however, this usage of \"Qa\" does not demand any special\ntreatment.\n\nOther characters, such as symbols and punctuation, need no special\nattention from the shaping engine, so they are not assigned a shaping\nclass.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). Each corresponds to the visual position of the matra with\nrespect to the base consonant to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the base consonant.\n  - `RIGHT_POSITION` matras are positioned to the right of the base consonant.\n  - `TOP_POSITION` matras are positioned above the base consonant.\n  - `BOTTOM_POSITION` matras are positioned below the base consonant.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. 
The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nMulti-part dependent vowels (matras) may be designated with compound\nmark-placement subclasses (such as `TOP_AND_LEFT_POSITION`) that\ndenote all of the mark-placement positions occupied.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. \n\n\n### Khmer character tables ###\n\nThe Khmer block in Unicode includes all of the codepoints necessary to\nwrite Khmer language text. The Khmer Symbols block contains\nmiscellaneous symbols used for lunar-date calendars. The Khmer Symbols\ncodepoints do not evoke any special behavior from the shaping engine.\n\nSeparate character tables are provided for the Khmer and Khmer Symbols\nblocks as well as for other miscellaneous characters that \nare used in `<khmr>` text runs:\n\n  - [Khmer character table](character-tables/character-tables-khmer.md#khmer-character-table)\n  - [Khmer Symbols character table](character-tables/character-tables-khmer.md#khmer-symbols-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-khmer.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. 
The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1780`   | Letter           | CONSONANT         | _null_                     | &#x1780; Ka                  |\n| | | | | |\n|`U+17C6`   | Mark [Mn]        | NUKTA             | TOP_POSITION               | &#x17C6; Nikahit             |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\nOther important characters that may be encountered when shaping runs\nof Khmer text include the dotted-circle placeholder (`U+25CC`), \nthe no-break space (`U+00A0`), and the zero-width space (`U+200B`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. 
Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n<!--- The zero-width joiner is primarily used to prevent the formation of a\nsubjoining form from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence. The sequence\n<samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the substitution of a\nsubjoined form for the second consonant. --->\n\n<!---\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>. An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\nwhere an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width joiner\notherwise would.\n--->\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display\nthose codepoints that are defined as non-spacing (marks, dependent\nvowels (matras), below-base consonant forms, and post-base consonant\nforms) in an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch <samp>\"NBSP,ZWJ,Sign_Coeng,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or\n<samp>\"NBSP,_matra_\"</samp>.\n\nThe zero-width space may be used between words (even though no visual\nword spacing results) in order to indicate word breaks within a text\nthat can be used by line-breaking algorithms in a higher-level\ntypesetting environment.\n\nSeveral codepoints in the Khmer block are deprecated and their usage\nin new documents is officially discouraged. The deprecated codepoints\nare:\n\n  - \"Qaq\" (`U+17A3`)\n  - \"Qaa\" (`U+17A4`)\n  \nUsage of three other codepoints is also discouraged, although the\ncodepoints have not been deprecated. 
These codepoints are:\n\n  - \"Inherent Aq\" (`U+17B4`)\n  - \"Inherent Aa\" (`U+17B5`)\n  - \"Sign Beyyal\" (`U+17D8`)\n\nAlthough usage of these codepoints in text is discouraged, shaping\nengines encountering them in a text run should handle the situation\ngracefully.\n\n\n## The `<khmr>` shaping model ##\n\nProcessing a run of `<khmr>` text involves five top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Applying all remaining substitution features from <abbr>GSUB</abbr>\n5. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Brahmi-derived and Indic scripts, the initial reordering\nstage involves applying a set of several script-specific rules. The\nbasic substitution features must be applied to the run in a specific\norder. The remaining substitution features in stage four, however, do\nnot have a mandatory order.\n\nKhmer exhibits many of the same shaping patterns found in Indic\nscripts, but it differs in a few critical characteristics. With regard\nto these common variations, Khmer's specific shaping \ncharacteristics include:\n\n\n  - The first consonant of a syllable is always the base consonant.\n\n> Note: For comparison with the General Indic shaping model, this\n> characteristic would correspond to `BASE_POS_FIRST`.\n  \n  - The <samp>\"Robat\"</samp> form, which is analogous to <samp>\"Reph\"</samp> or <samp>\"Repha\"</samp> in Indic\n    scripts, is separately encoded as a non-spacing mark\n    codepoint. The <samp>\"Robat\"</samp> form does not require reordering.\n\n> Note: For comparison with the General Indic shaping model, the\n> Robat-encoding characteristic would correspond to `REPH_MODE_VISUAL_REPHA`,\n> and the reordering characteristic would be _null_ or some other\n> designation indicating that the <samp>\"Robat\"</samp> is not reordered. 
 \n<!---Because \n> <samp>\"Robat\"</samp> is typically a syllable-initial feature, shaping engines may\n> also choose to --->\n  \n  - The below-base forms feature is applied to consonants\n    after the base consonant. \n\n> Note: For comparison with the General Indic shaping model, this\n> characteristic would correspond to `BLWF_MODE_POST_ONLY`.\n\n  - The ordering position for left-side matras, as with Indic scripts,\n    is `POS_PREBASE_MATRA`.\n\n  - The ordering positions for right-side, below-base, and above-base matras are the\n    same. All are reordered to immediately after the last post-base consonant.\n   \t\n> Note: For comparison with the General Indic shaping model, this\n> characteristic would correspond to `MATRA_POS_TOP`,\n> `MATRA_POS_BOTTOM`, and `MATRA_POS_RIGHT` taking the ordering position \n> `POS_AFTER_POST`.\n\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Khmer consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Khmer Unicode block enumerates eight modifier signs,\n> \"Nikahit\" (`U+17C6`), \"Reahmuk\" (`U+17C7`), \"Bantoc\" (`U+17CB`),\n> \"Kakabat\" (`U+17CE`),  \"Ahsda\" (`U+17CF`), \"Samyok Sannya\"\n> (`U+17D0`), \"Bathamasat\" (`U+17D3`), and \"Atthacan\" (`U+17DD`). \n\nHowever, because texts written in Khmer script do not generally employ\ninter-word spaces, shaping engines must rely on\nsyllable-identification algorithms to recognize word-boundary\npatterns, distinguishing numeric sequences, symbols, punctuation, and other\nmiscellaneous script characters from syllables within words.\n\nValid syllables may begin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. 
If\nthe syllable begins with an independent vowel, that vowel is the\nsyllable's only vowel sound and, by definition, there is no \"base\"\nconsonant. \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the consonant's inherent vowel sound. This vowel sound can be changed\n> by a dependent vowel (matra) sign or by a register shifter following the consonant.\n\nFor a syllable beginning with a consonant, the base consonant is the\nfirst consonant of the syllable.\n\nUnlike Indic scripts, where the vowel sound designates the end of the\nsyllable, Khmer syllables can end with final consonants that occur\nafter a dependent vowel (matra).\n\nAll post-base consonants in a valid syllable will be preceded by <samp>\"Sign Coeng\"</samp>\nmarks. This includes final consonants.\n\n\tBaseC Sign-Coeng Post-baseC Matra Sign-Coeng FinalC\n\t\nIn some Khmer words, an independent vowel can occur in a subjoined\nposition like a post-base consonant. In such instances, the\nindependent vowel will be preceded by <samp>\"Sign Coeng\"</samp>.\n\n\tBaseC Sign-Coeng IndependentVowel\n\nThe algorithms for identifying syllables and for correctly identifying\nthe base consonant include tests to recognize these sequences.\n\n\nAs with other Brahmi-derived and Indic scripts, the consonant <samp>\"Ro\"</samp> receives\nspecial treatment. \n\n  - A post-base <samp>\"Ro\"</samp> must be reordered to a visually pre-base\n    position. This move is performed during the initial reordering\n    stage.\n  - <samp>\"Robat\"</samp>, the above-base variant of <samp>\"Ro\"</samp>, is encoded as a combining\n    mark rather than as a full consonant. 
<samp>\"Robat\"</samp> does not, however,\n    require reordering by the shaping engine.\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Khmer script, may\n> not adhere to the syllable-formation rules described above. \n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following regular expressions can be used to match Khmer-script\nsyllables. \n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. \n\n```markdown\n_ra_\t\t= The consonant \"Ro\" \n_consonant_\t= `CONSONANT` - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA` | `CONSONANT_POST_REPHA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER` | `CONSONANT_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_registershifter_ = `REGISTER_SHIFTER`\n_coeng_\t\t= `INVISIBLE_STACKER`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n```\n\n> Note: The `CONSONANT_POST_REPHA` shaping class is merged with the\n> `NUKTA` shaping class to reflect the correct orthographic behavior\n> of <samp>\"Robat\"</samp>.\n\nThese identification classes form the bases of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_ | _vowel_\nN\t= (_zwnj_? _registershifter_)? (_nukta_ _nukta_?)?\nZ\t= _zwj_ | _zwnj_\nCN\t= C N?\nMATRA_GROUP\t= Z? _matra_ N?\nSYLLABLE_TAIL\t= (_syllablemodifier_ _syllablemodifier_?)?\nPARTIAL_CLUSTER\t= N? (_coeng_ CN)* MATRA_GROUP* (_coeng_ CN)? 
SYLLABLE_TAIL\n```\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA valid syllable will match the expression:\n```markdown\n(C | _placeholder_ | _dottedcircle_) PARTIAL_CLUSTER\n```\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\nA sequence that does not match any of these expressions should be\nregarded as broken. The shaping engine may make a best-effort attempt\nto shape the broken sequence, but making guarantees about the\ncorrectness or appearance of the final result is out of scope for this\ndocument.\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs\n> and pre-base-reordering <samp>\"Ro\"</samp>. \n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. 
After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_BASE_CONSONANT\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD \n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered in Khmer script. \n\nThe position names mimic those used in the General Indic shaping\nmodel, for ease of implementation; as a result, the list includes some\ncategories not utilized in Khmer. However, shaping engines are free\nto use any naming scheme they choose.\n\nThe basic positions (left to right) are <samp>\"Reph\"</samp> (`POS_RA_TO_BECOME_REPH`), dependent\nvowels (matras) and consonants positioned before the base\nconsonant (`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base\nconsonant (`POS_BASE_CONSONANT`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base consonant\n(`POS_POSTBASE_CONSONANT`), syllable-final consonants (`POS_FINAL_CONSONANT`),\nand syllable-modifying or Vedic signs (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the base consonant. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. 
Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which follows.\n\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_BASE_CONSONANT`.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\nVowel-based syllables, standalone sequences, and broken text runs will\nnot have base consonants.\n\nThe algorithm for determining the base consonant is\n\n  - Starting from the beginning of the syllable, move forwards until a\n    `CONSONANT` is found. \n  - The consonant stopped at will be the base consonant.\n\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any multi-part dependent vowels (matras) must be decomposed\ninto their left-side and right-side components. Khmer has five\ntwo-part dependent vowels, \"Oe\" (`U+17BE`), \"Ya\" (`U+17BF`), \"Ie\"\n(`U+17C0`), \"Oo\" (`U+17C4`), and \"Au\" (`U+17C5`).\n\nEach of these dependent vowels decomposes into a left-side component\nidentical to the single-part dependent vowel \"E\" (`U+17C1`) plus a\nright-side component.\n\nUnlike many other scripts, the decompositions of multi-part dependent\nvowels in Khmer are not defined as canonical in Unicode. 
Some of the\nright-side components that would result from these decompositions do not\ncorrespond to assigned Unicode codepoints.\n\nInstead, fonts often substitute the default glyph with a\nright-side-component glyph using <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions. The decomposition\nstep performed here allows the left-side component to be correctly\nreordered by the shaping engine.\n\n> \"Oe\" (`U+17BE`) decomposes to \"`U+17C1`,`U+17BE`\"\n>\n> \"Ya\" (`U+17BF`) decomposes to \"`U+17C1`,`U+17BF`\"\n>\n> \"Ie\" (`U+17C0`) decomposes to \"`U+17C1`,`U+17C0`\"\n>\n> \"Oo\" (`U+17C4`) decomposes to \"`U+17C1`,`U+17C4`\"\n>\n> \"Au\" (`U+17C5`) decomposes to \"`U+17C1`,`U+17C5`\"\n\nTwo of the multi-part dependent vowels, \"Oe\" (`U+17BE`) and \"Oo\"\n(`U+17C4`), can be decomposed into existing Unicode codepoints. If\ndesired, the corresponding decompositions are: \n\n> \"Oe\" (`U+17BE`) decomposes to \"`U+17C1`,`U+17B8`\"\n>\n> \"Oo\" (`U+17C4`) decomposes to \"`U+17C1`,`U+17B6`\"\n\nHowever, shaping engines should take note of the fact that these\ndecompositions are non-canonical and therefore, if the active font's\ndesign employs non-standard stylistic choices, the results may not\nappear as expected.\n\n:::{figure-md}\n![Multi-part matra decomposition](images/khmer/khmer-matra-decomposition.svg \"Multi-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #khmer-matra-decomposition}\n\nMulti-part matra decomposition\n:::\n\n```{svg-color-toggle-button} khmer-matra-decomposition\n```\n\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. 
However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAll right-side, above-base, and below-base dependent-vowel (matra)\nsigns are tagged `POS_AFTER_POST`.\n\nFor simplicity, shaping engines may choose to tag matras\nin an earlier text-processing step, using the information in the\n_Mark-placement subclass_ column of the character tables. It is\ncritical at this step, however, that all matras are correctly tagged\nbefore proceeding to the next step. \n\n\n<!--- #### Stage 2, step 4: Adjacent marks #### --->\n<!--- This does not seem to happen in Khmer. --->\n<!--- Commenting out & renumbering. --->\n\n\n#### Stage 2, step 4: Pre-base-reordering consonants ####\n\nFourth, all pre-base-reordering consonants must be tagged with\n`POS_PREBASE_CONSONANT`. \n\nKhmer has one pre-base-reordering consonant: <samp>\"Ro\"</samp>.\n\n:::{figure-md}\n![Pre-base-reordering Ro](images/khmer/khmer-pref.svg \"Pre-base-reordering Ro\"){.shaping-demo .inline-svg .greyscale-svg #khmer-pref}\n\nPre-base-reordering Ro\n:::\n\n```{svg-color-toggle-button} khmer-pref\n```\n\n\n\n#### Stage 2, step 5: Tag remaining consonants ####\n\nFifth, the remaining consonants and independent vowels should be\ntagged with the appropriate positions.\n\n  - All `VOWEL_INDEPENDENT`s and all `CONSONANT`s other than <samp>\"Ro\"</samp> occurring\n    after the base consonant (found in step one) must be tagged as \n    `POS_BELOWBASE_CONSONANT`. 
 \n  - Any `CONSONANT` or `VOWEL_INDEPENDENT` in the syllable occurring\n    after a dependent vowel (matra) must be tagged as `POS_FINAL_CONSONANT`.\n\nIn a valid syllable, such post-base consonants (of class `CONSONANT`)\nand independent vowels (of class `VOWEL_INDEPENDENT`) will be preceded by a\n<samp>\"Sign_Coeng\"</samp> glyph. \n\n> Note: The consonant <samp>\"Robat\"</samp>, of class `CONSONANT_POST_REPHA`, is not\n> included in the classes checked here and must not be tagged in this\n> step. <samp>\"Robat\"</samp> should not appear in a post-base position in a valid\n> syllable.\n\n\n\n#### Stage 2, step 6: Mark tagging ####\n\n<!--- not sure this is done!!! --->\n\nSixth, all marks must be tagged. \n\nSeveral Khmer marks that are categorized in Unicode as syllable\nmodifiers or that modify consonants are allowed to occur mid-syllable\nin Khmer words. Therefore, they are not tagged for the `POS_SMVD`\nposition that is typically reserved for syllable modifiers and Vedic\nsigns.\n\n:::{table} {{khmer_midsyllable_mark_table_workaround}}\n\n| Codepoint | Sorting Position        | Glyph                  |\n|:----------|:------------------------|:-----------------------|\n|`U+17CB`   |`POS_ABOVEBASE_CONSONANT`| &#x17CB; Bantoc        |\n|`U+17CD`   |`POS_ABOVEBASE_CONSONANT`| &#x17CD; Toandakhiat   |\n|`U+17CE`   |`POS_ABOVEBASE_CONSONANT`| &#x17CE; Kakabat       |\n|`U+17CF`   |`POS_ABOVEBASE_CONSONANT`| &#x17CF; Ahsda         |\n|`U+17D0`   |`POS_ABOVEBASE_CONSONANT`| &#x17D0; Samyok Sannya |\n|`U+17D1`   |`POS_ABOVEBASE_CONSONANT`| &#x17D1; Viriam        |\n|`U+17D3`   |`POS_ABOVEBASE_CONSONANT`| &#x17D3; Bathamasat    |\n|`U+17DD`   |`POS_ABOVEBASE_CONSONANT`| &#x17DD; Atthacan      |\n:::\n\n\nAll remaining marks, including <samp>\"Sign Coeng\"</samp>, must be tagged with the\nsame positioning tag as the closest non-mark character the mark has\naffinity with, so that they move together during the sorting step.\n\nThere are two possible cases: those marks 
before the base consonant\nand those marks after the base consonant.\n\n  1. Initially, all remaining marks should be tagged with the same\n  positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the base consonant (such as post-base\n  consonants, below-base consonants, or final consonants), all\n  remaining marks located between that current consonant and any\n  previous consonant should be tagged with the same positioning tag as\n  the current (later) consonant.\n  \nIn other words, all consonants preceding the base consonant \"own\" the\nmarks that follow them, while all consonants after the base consonant\n\"own\" the marks that come before them. When a syllable does not have\nany consonants after the base consonant, the base consonant should\n\"own\" all the marks that follow it.\n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n>\n> Note: No marks will precede the base consonant in a valid syllable.\n\nWith these steps completed, the syllable can be sorted into the final sort order.\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed:\n\n\tlocl\n\tccmp \n\tpref \n\tblwf\n\tabvf\n\tpstf\n\tcfar\n\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n<!--- :::{figure-md}\n![Local forms substitution](images/khmer/khmer-locl.svg \"Local forms substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-locl}\n\nLocal forms substitution\n::: --->\n\n\n\n#### Stage 3, step 2: ccmp ####\n\nThe `ccmp` feature allows a font to substitute mark-and-base sequences\nwith a pre-composed glyph including the mark and the base, or to\nsubstitute a single glyph into an equivalent decomposed sequence of glyphs. \n \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. \n\n> Note: `ccmp` usage is uncommon in Khmer fonts. Nevertheless,\n> shaping engines must apply any `ccmp` substitutions if they are\n> present in the active font.\n\n\n#### Stage 3, step 3: pref ####\n\nThe `pref` feature replaces pre-base-consonant glyphs with\nany special forms. 
In Khmer, this typically includes the\npre-base-reordering form of <samp>\"Ro\"</samp>.\n\n:::{figure-md}\n![Pre-base form substitution](/images/khmer/khmer-pref-1.svg \"Pre-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-pref-1}\n\nPre-base form substitution\n:::\n\n```{svg-color-toggle-button} khmer-pref-1\n```\n\n\n<!--- be sure to show initial form with Ro BEFORE the base consonant, --->\n<!--- since initial reordering has been done already. --->\n\n\n#### Stage 3, step 4: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. In Khmer, this usually means replacing the default\nforms of letters with coeng (or subscript) forms.\n\n\n<!--- Check below!  --->\n\nThe below-base forms feature is applied to glyphs occurring after the\nbase consonant.\n\n:::{figure-md}\n![Below-base form substitution](/images/khmer/khmer-blwf.svg \"Below-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-blwf}\n\nBelow-base form substitution\n:::\n\n```{svg-color-toggle-button} khmer-blwf\n```\n\n\n\n#### Stage 3, step 5: abvf ####\n\nThe `abvf` feature replaces above-base-consonant glyphs with any\nspecial forms. In Khmer, this may include variant forms of above-base\ndependent vowels and marks.\n<!--- single-sub-lookup 25, 28 --->\n\n:::{figure-md}\n![Above-base form substitution](images/khmer/khmer-abvf.svg \"Above-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-abvf}\n\nAbove-base form substitution\n:::\n\n```{svg-color-toggle-button} khmer-abvf\n```\n\n\n\n#### Stage 3, step 6: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any\nspecial forms. 
In Khmer, this can include coeng forms of certain\nconsonants that include an ascending \"arm\" on the right-hand side as\nwell as variant forms for right-side matras and marks.\n\n:::{figure-md}\n![Post-base form substitution](/images/khmer/khmer-pstf.svg \"Post-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-pstf}\n\nPost-base form substitution\n:::\n\n```{svg-color-toggle-button} khmer-pstf\n```\n\n\n\n#### Stage 3, step 7: cfar ####\n\nThe `cfar` feature replaces any below-base-consonant or\npost-base-consonant glyphs that occur immediately after a <samp>\"Sign\nCoeng,Ro\"</samp> sequence with special presentation forms. This can include\ncontextual variants of post-base and below-base glyphs designed to\nbetter interact, visually, with the final position of\npre-base-reordering <samp>\"Ro\"</samp>.\n\n<!--- Try TRYo--->\n\n<!--- ### 4: Final reordering ### --->\n<!--- Is there any? --->\n\n### Stage 4: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. The order in which these features are applied is not\ncanonical; they should be applied in the order in which they appear in\nthe <abbr title=\"Glyph Substitution table\">GSUB</abbr> table in the font. \n\n\tpres\n\tblws\n\tabvs\n\tpsts\n\tcalt\n\tclig\n\tliga\n\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. In Khmer, this can include stylistic variants\nof left-side dependent vowels (matras) or of pre-base-reordering <samp>\"Ro\"</samp>. 
\n\n:::{figure-md}\n![Pre-base presentation form substitution](/images/khmer/khmer-pres.svg \"Pre-base presentation form substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-pres}\n\nPre-base presentation form substitution\n:::\n\n```{svg-color-toggle-button} khmer-pres\n```\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Above-base presentation form substitution](/images/khmer/khmer-abvs.svg \"Above-base presentation form substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-abvs}\n\nAbove-base presentation form substitution\n:::\n\n```{svg-color-toggle-button} khmer-abvs\n```\n\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. In Khmer, this can include contextual ligatures\ninvolving below-base dependent vowel marks (matras) or subjoined letters.\n\n:::{figure-md}\n![Below-base presentation form substitution](/images/khmer/khmer-blws.svg \"Below-base presentation form substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-blws}\n\nBelow-base presentation form substitution\n:::\n\n```{svg-color-toggle-button} khmer-blws\n```\n\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes stylistic variants of\nright-side dependent vowels (matras) or of subjoined letters featuring\nright-side ascenders.\n\n\n:::{figure-md}\n![Post-base presentation form substitution](/images/khmer/khmer-psts.svg \"Post-base presentation form substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-psts}\n\nPost-base presentation form substitution\n:::\n\n```{svg-color-toggle-button} khmer-psts\n```\n\n\nThe `clig` feature substitutes optional ligatures that are on by\ndefault, but which are activated only in certain contexts. 
\n\n> Note: In some other scripts, substitutions made by `clig` may be\n> disabled by application-level user interfaces. For Khmer, however,\n> application of `clig` substitutions is mandatory because these\n> substitutions are important for typographic correctness, not merely\n> for user preference.\n\n:::{figure-md}\n![Contextual ligature substitution](images/khmer/khmer-clig.svg \"Contextual ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-clig}\n\nContextual ligature substitution\n:::\n\n```{svg-color-toggle-button} khmer-clig\n```\n\n\n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Standard ligature substitution](/images/khmer/khmer-liga.svg \"Standard ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #khmer-liga}\n\nStandard ligature substitution\n:::\n\n```{svg-color-toggle-button} khmer-liga\n```\n\n\n### Stage 5: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied. As with the preceding stage, the order in which these\nfeatures are applied is not canonical; they should be applied in the\norder in which they appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n\tdist\n\tkern\n\tblwm\n\tabvm\n\tmkmk\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` is not mandatory for shaping\n> Khmer text and may be disabled by user preference.\n>\n> Notably, the Microsoft Uniscribe shaping engine does not apply\n> `kern` lookups even if they are present in the font. 
For more\n> information on Uniscribe compatibility, see the\n> [Uniscribe-bug-compatibility note](/notes/uniscribe-bug-compatibility.md).\n\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\n:::{figure-md}\n![Application of the dist feature](/images/khmer/khmer-dist.svg \"Application of the dist feature\"){.shaping-demo .inline-svg .greyscale-svg #khmer-dist}\n\nApplication of the dist feature\n:::\n\n```{svg-color-toggle-button} khmer-dist\n```\n\n\nThe `abvm` feature positions above-base glyphs for attachment to base\ncharacters. In Khmer, this includes register shifters and syllable\nmodifiers, in addition to diacritical marks and above-base dependent\nvowels (matras).\n\n:::{figure-md}\n![Above-base mark positioning](/images/khmer/khmer-abvm.svg \"Above-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #khmer-abvm}\n\nAbove-base mark positioning\n:::\n\n```{svg-color-toggle-button} khmer-abvm\n```\n\n\nThe `blwm` feature positions below-base glyphs for attachment to base\ncharacters. In Khmer, this can include coeng forms of letters as well as\nbelow-base dependent vowels (matras).\n\n:::{figure-md}\n![Below-base mark positioning](/images/khmer/khmer-blwm.svg \"Below-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #khmer-blwm}\n\nBelow-base mark positioning\n:::\n\n```{svg-color-toggle-button} khmer-blwm\n```\n\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n"
  },
  {
    "path": "opentype-shaping-malayalam.md",
    "content": "```{include} /_global.md\n```\n\n# Malayalam shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Malayalam script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Malayalam character tables](#malayalam-character-tables)\n  - [The `<mlm2>` shaping model](#the-mlm2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<mlym>` shaping model](#the-mlym-shaping-model)\n      - [Distinctions from `<mlm2>`](#distinctions-from-mlm2)\n      - [Advice for handling fonts with `<mlym>` features only](#advice-for-handling-fonts-with-mlym-features-only)\n      - [Advice for handling text runs composed in `<mlym>` format](#advice-for-handling-text-runs-composed-in-mlym-format)\n\n\n## General information ##\n\nThe Malayalam script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the South Indic subgroup.\n\nThe Malayalam script is used to write multiple languages, most commonly\nMalayalam and Paniya. 
In addition, Sanskrit may be written\nin Malayalam, so Malayalam script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nThere are two extant Malayalam script tags defined in OpenType, `<mlym>`\nand `<mlm2>`. The older script tag, `<mlym>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<mlm2>`\nshaping model. However, if a font is encountered that supports only\n`<mlym>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. \n\n**Halant** and **Virama** are both standard terms for the \"vowel-killer\"\nsign. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. In the Malayalam\nlanguage, this sign is known as the _chandrakkala_.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. In the Malayalam\nlanguage, this mark is known as the _candrabindu_.\n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. The **Reph** form of the consonant \"Ra\" is an\nexample (post-base in traditional orthography and pre-base in\nreformed orthography). 
Some of these substitutions create **above-base**\nor **below-base** forms. For instance \"La\" takes a `below-base` form.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Malayalam text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark. \n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nMalayalam glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. 
The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are also categorized as `Mark [Mn]`).\n\nMalayalam uses two subclasses of consonant, `CONSONANT_DEAD` and\n`CONSONANT_PRE_REPHA`. \n\nThe `CONSONANT_DEAD` subclass is used for the Malayalam _chillu_\nvariants of certain consonants. It indicates that the characters\nshould match tests for consonants, such as when [identifying \nsyllables](#stage-1-identifying-syllables-and-other-sequences), but that, unlike\nstandard consonants, they carry no inherent vowel. The lack of an\ninherent vowel is important during the [initial\nreordering](#stage-2-initial-reordering) stage.\n\nThe `CONSONANT_PRE_REPHA` subclass is used only for the \"Dot Reph\"\n(`U+0D4E`), a dead-consonant version of \"Reph\" (or \"Repha\"). In modern\nMalayalam orthography, \"Dot Reph\" is uncommon. As with\n`CONSONANT_DEAD`, this subclass should match tests for\nconsonants. 
Because the \"Dot Reph\" character is a \"Reph\", however, it\nmust be treated as a \"Reph\" during the initial and final reordering stages.\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). 
Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. \n\nMalayalam includes two special marks that are classified as\n`PURE_KILLER`, \"Vertical Bar Virama\" (`U+0D3B`) and \"Circular Virama\"\n(`U+0D3C`). These marks, like the Virama or \"Halant\", suppress the\ninherent vowel of a consonant. 
However, unlike \"Halant\", the use of a\n`PURE_KILLER` prevents the formation of ligatures and conjuncts, and\nthe mark itself is always rendered explicitly. \n\nConsequently, these marks behave like dependent-vowel marks\n(matras). Shaping engines may choose to treat them as matras for simplicity.\n\n### Malayalam character tables ###\n\nSeparate character tables are provided for the Malayalam and Vedic\nExtensions blocks as well as for other miscellaneous characters that\nare used in `<mlm2>` text runs:\n\n  - [Malayalam character table](character-tables/character-tables-malayalam.md#malayalam-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-malayalam.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-malayalam.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0D01`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0D01; Candrabindu         |\n| | | | |\n|`U+0D15`   | Letter           | CONSONANT         | _null_                     | &#x0D15; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. 
Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Malayalam text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>. \n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n    \n> Note: Malayalam differs from many Indic scripts in that <samp>\"Reph\"</samp>\n> usage is rare in the modern orthography. 
In word-initial positions, a\n> <samp>\"Ra,Halant\"</samp> sequence is typically replaced by a dead-consonant form,\n> <samp>\"Chillu R\"</samp>.\n\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. 
For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>. \n\nIn addition to general punctuation, runs of Malayalam text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block.\n\n\n\n## The `<mlm2>` shaping model ##\n\nProcessing a run of `<mlm2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. 
Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. \n\nWith regard to these common variations, Malayalam's specific shaping\ncharacteristics include:\n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any special final-consonant forms.\n\n  - `REPH_POS_AFTER_MAIN` = <samp>\"Reph\"</samp> is ordered after the syllable base.\n\n  - `REPH_MODE_LOGICAL_REPHA` = <samp>\"Reph\"</samp> is encoded as its own Unicode\n     codepoint (<samp>\"Repha\"</samp>), but it must still be reordered. \n\n  - `BLWF_MODE_PRE_AND_POST` = The below-base forms feature is applied both to\n     pre-base consonants and to post-base consonants.\n\n  - `MATRA_POS_TOP` = _null_  = Unlike most other Indic scripts, Malayalam\n     does not use any above-base matras. 
Therefore, this shaping\n     characteristic does not apply.\n\n  - `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are\n     ordered after all post-base consonant forms.\n\n  - `MATRA_POS_BOTTOM` = `POS_AFTER_POST` = Below-base matras are\n     ordered after all post-base consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n> Note: Unlike most other Indic scripts, Malayalam does not use\n> above-base matras. Therefore `MATRA_POS_TOP` can be set to _null_.\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Malayalam consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Malayalam Unicode block enumerates five modifier signs,\n> \"Combining Anusvara Above\" (`U+0D00`), \"Candrabindu\" (`U+0D01`),\n> \"Anusvara\" (`U+0D02`), \"Visarga\" (`U+0D03`), and \"Avagraha\"\n> (`U+0D3D`). In addition, Sanskrit text written in Malayalam may \n> include additional signs from the Vedic Extensions block.\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that independent vowel\nis the syllable's only vowel sound and serves as the \"base\". \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. 
This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nGenerally speaking, the base consonant is the final consonant of the\nsyllable and its vowel sound designates the end of the syllable. This\nrule is synonymous with the `BASE_POS_LAST` characteristic mentioned\nearlier. \n\nNon-base consonants in a valid syllable will be separated by <samp>\"Halant\"</samp>\nmarks. 
Pre-base consonants will be followed by <samp>\"Halant\"</samp>, while\npost-base consonants will be preceded by <samp>\"Halant\"</samp>.\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\t\nThe algorithm for correctly identifying the base consonant includes a\ntest to recognize these sequences and not mis-identify the base\nconsonant.\n\nAll consonants in Malayalam can potentially occur in pre-base\nposition. The <samp>\"Halant\"</samp> marks on pre-base consonants indicate that they\ncarry no vowel. Instead, they affect syllable pronunciation by\ncombining with the base consonant (e.g., \"_thr_\" or \"_spl_\").\n\nThree consonants in Malayalam are allowed to occur in post-base\nposition: <samp>\"Ya\"</samp>, <samp>\"Va\"</samp>, and <samp>\"Ra\"</samp>. The post-base <samp>\"Ra\"</samp> is reordered to\nbefore the base consonant or syllable base during the final-reordering stage of the\nshaping process. The post-base forms of <samp>\"Ya\"</samp> and <samp>\"Va\"</samp>\nremain in post-base position.\n\nMalayalam also includes one consonant that can take on a below-base\nform, <samp>\"La\"</samp>.\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment. Malayalam differs from many Indic scripts in that <samp>\"Reph\"</samp>\nusage is rare in the modern orthography.\n\nIn word-initial positions, a <samp>\"Ra,Halant\"</samp> sequence is typically\nreplaced by a dead-consonant form, <samp>\"Chillu R\"</samp>. \n\nMalayalam text runs may also include the explicit variant of <samp>\"Reph\"</samp>,\nthe <samp>\"Dot Reph\"</samp> (`U+0D4E`), also known as <samp>\"Repha\"</samp>. \n\n> Note: Modern Malayalam orthography prefers using the <samp>\"Chillu R\"</samp>\n> instead of <samp>\"Reph\"</samp>. 
Therefore, Malayalam fonts may omit\n> implementation of the <samp>\"Reph\"</samp> substitution entirely.\n\nAs is the case with <samp>\"Reph\"</samp>, <samp>\"Repha\"</samp> characters must be reordered after the\nsyllable-identification stage is complete. This is the\n`REPH_MODE_LOGICAL_REPHA` shaping characteristic.\n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Malayalam script, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. 
\n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Malayalam syllables.\n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. \n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. 
Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the bases of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? 
(CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? 
(HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-word syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Repha\"</samp> glyphs, and other consonants that take special\n> treatment in some circumstances. 
<samp>\"Ra\"</samp>, <samp>\"Va\"</samp>, <samp>\"La\"</samp>, and <samp>\"Ya\"</samp> occasionally\n> take on special forms, depending on their position in the syllable.\n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nMalayalam. 
\n\nThe basic positions (left to right) are <samp>\"Reph\"</samp> (`POS_RA_TO_BECOME_REPH`), dependent\nvowels (matras) and consonants positioned before the base\nconsonant or syllable base (`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base\nconsonant or syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base consonant or syllable base\n(`POS_POSTBASE_CONSONANT`), syllable-final consonants (`POS_FINAL_CONSONANT`),\nand syllable-modifying or Vedic signs (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. 
The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\n\n<!--- > Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). 
\n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is\n\n  - If the syllable starts with <samp>\"Ra,Halant\"</samp> and the syllable contains\n    more than one consonant, exclude the starting <samp>\"Ra\"</samp> from the list of\n    consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. 
\n      * If none of the above conditions is true, stop.\n  - The consonant stopped at will be the base consonant.\n\nMalayalam includes a pre-base-reordering <samp>\"Ra\"</samp>.  A <samp>\"Halant,Ra\"</samp> sequence\nafter the base consonant or syllable base will be reordered to a pre-base position\nduring the final-reordering stage.\n\nMalayalam includes two consonants that can take on\npost-base form: <samp>\"Ya\"</samp> and <samp>\"Va\"</samp>.\n\n:::{figure-md}\n![Post-base Ya formation](/images/malayalam/malayalam-pstf-ya.svg \"Post-base Ya formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-pstf-ya}\n\nPost-base Ya formation\n:::\n\n```{svg-color-toggle-button} malayalam-pstf-ya\n```\n\n\n\n:::{figure-md}\n![Post-base Va formation](/images/malayalam/malayalam-pstf-va.svg \"Post-base Va formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-pstf-va}\n\nPost-base Va formation\n:::\n\n```{svg-color-toggle-button} malayalam-pstf-va\n```\n\n\nMalayalam includes one consonant that can take on a below-base form:\n\n  - <samp>\"Halant,La\"</samp> (after the base consonant or syllable base) takes on\n    a below-base form.\n\n:::{figure-md}\n![Below-base La formation](/images/malayalam/malayalam-blwf.svg \"Below-base La formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-blwf}\n\nBelow-base La formation\n:::\n\n```{svg-color-toggle-button} malayalam-blwf\n```\n\n\n> Note: Because Malayalam employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search, only the <samp>\"Halant,_consonant_\"</samp>\n> pattern following the syllable base for these below-base forms will\n> be encountered. 
Stage 2, step 5 below ensures that the <samp>\"_consonant_,Halant\"</samp>\n> pattern preceding the syllable base for these below-base forms will\n> also be tagged correctly.\n\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any two-part dependent vowels (matras) must be decomposed\ninto their left-side and right-side components. Malayalam has three\ntwo-part dependent vowels, \"O\" (`U+0D4A`), \"Oo\" (`U+0D4B`), and \"Au\"\n(`U+0D4C`). Each has a canonical decomposition, so this step is\nunambiguous. \n\n> \"O\" (`U+0D4A`) decomposes to \"`U+0D46`,`U+0D3E`\"\n>\n> \"Oo\" (`U+0D4B`) decomposes to \"`U+0D47`,`U+0D3E`\"\n>\n> \"Au\" (`U+0D4C`) decomposes to \"`U+0D46`,`U+0D57`\"\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n:::{figure-md}\n![Two-part matra decomposition](/images/malayalam/malayalam-matra-decompose.svg \"Two-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-matra-decompose}\n\nTwo-part matra decomposition\n:::\n\n```{svg-color-toggle-button} malayalam-matra-decompose\n```\n\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs, including those that\nresulted from the preceding decomposition step, must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAll right-side dependent-vowel (matra) signs are tagged\n`POS_AFTER_POST`.\n\nAll below-base dependent-vowel (matra) signs are tagged\n`POS_AFTER_POST`.\n\nFor simplicity, shaping engines may choose to tag single-part matras\nin an earlier text-processing step, using the information in the\n_Mark-placement subclass_ column of the character tables. 
It is\ncritical at this step, however, that all decomposed matras are also\ncorrectly tagged before proceeding to the next step.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`. Excluding initial <samp>\"Ra,Halant\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. \n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. 
Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nMalayalam includes one consonant that can take on a below-base form:\n\n  - <samp>\"Halant,La\"</samp> (after the base consonant or syllable base) takes on\n    a below-base form.\n\n> Note: Because Malayalam employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search in stage 2, step 1, any instances of the\n> <samp>\"Halant,_consonant_\"</samp>  pattern following the syllable base for these\n> below-base forms will be encountered. The tagging in this step\n> ensures that the <samp>\"_consonant_,Halant\"</samp> pattern preceding the syllable\n> base for these below-base forms will also be tagged correctly.\n\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: Malayalam differs from many Indic scripts in that <samp>\"Reph\"</samp>\n> usage is rare in the modern orthography. In word-initial positions, a\n> <samp>\"Ra,Halant\"</samp> sequence is typically replaced by a dead-consonant form,\n> <samp>\"Chillu R\"</samp>. \n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with  `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<mlm2>` text runs. 
This step is included here to\n> maintain compatibility across Indic scripts.\n\n\n\t\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. 
Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!--matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. --->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp> and <samp>\"Ja,Halant,Nya\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant\"</samp> sequences but _not_ match\n            initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `pref` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base positions\n  - `blwf` should match <samp>\"Halant,La\"</samp> in post-base positions and \n            <samp>\"La,Halant\"</samp> in non-initial pre-base positions\n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `pstf` should match <samp>\"Halant,Ya\"</samp>, <samp>\"Halant,Va\"</samp>, and <samp>\"Halant,Ra\"</samp> in\n            post-base position\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf (not used in Malayalam)\n\tpref \n\tblwf \n\tabvf (not used in Malayalam)\n\thalf\n\tpstf\n\tvatu (not used in Malayalam)\n\tcjct\n\tcfar (not used in Malayalam)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. 
\n\n> Note: The Malayalam Unicode block does not include a Nukta\n> codepoint, but Malayalam fonts may implement the `nukt` lookup using\n> similar characters from other blocks.\n\n  - The context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n    \n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n:::\n\n\n:::{figure-md}\n![Nukta composition](/images/malayalam/malayalam-nukt.svg \"Nukta composition\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-nukt}\n\nNukta composition\n:::\n\n```{svg-color-toggle-button} malayalam-nukt\n```\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces specific sequences with required\nligatures. Malayalam differs from many other Indic scripts in that\nthere are typically many ligatures in a font that are implemented as\n`akhn` substitutions.\n\nThese sequences can occur anywhere in a syllable. 
Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\n  - The context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n    \n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n\n:::{figure-md}\n![Akhand KSsa ligation](/images/malayalam/malayalam-akhn-kssa.svg \"Akhand KSsa ligation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-akhn-kssa}\n\nAkhand KSsa ligation\n:::\n\n```{svg-color-toggle-button} malayalam-akhn-kssa\n```\n\n\n:::{figure-md}\n![Akhand NnTta ligation](/images/malayalam/malayalam-akhn-nntta.svg \"Akhand NnTta ligation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-akhn-nntta}\n\nAkhand NnTta ligation\n:::\n\n```{svg-color-toggle-button} malayalam-akhn-nntta\n```\n\n\n> Note: Modern Malayalam orthography prefers using the <samp>\"Chillu R\"</samp>\n> instead of <samp>\"Reph\"</samp>. Therefore, Malayalam fonts may implement <samp>\"Chillu\n> R\"</samp> as a substitution for <samp>\"Ra,Halant\"</samp> in the `akhn` feature. 
This\n> ensures that the substitution takes place before the `rphf` feature\n> is applied, so the font may omit the `rphf` feature entirely.\n\n:::{figure-md}\n![Akhand Chillu R ligation](/images/malayalam/malayalam-akhn-chillu-r.svg \"Akhand Chillu R ligation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-akhn-chillu-r}\n\nAkhand Chillu R ligation\n:::\n\n```{svg-color-toggle-button} malayalam-akhn-chillu-r\n```\n\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence, however, must not be flagged for\n    the `rphf` substitution.\n\t\n> Note: The <samp>\"Dot Reph\"</samp> substitution shown here is typically found only\n> in old-orthography Malayalam writing.\n\n:::{figure-md}\n![Dot Reph composition](/images/malayalam/malayalam-dot-reph.svg \"Dot Reph composition\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-dot-reph}\n\nDot Reph composition\n:::\n\n```{svg-color-toggle-button} malayalam-dot-reph\n```\n\n\n  - The context defined for a `rphf` feature is:\n\n:::{table} `rphf` feature context\n    \n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`   | _none_        |\n:::\n\n\n> Note: Modern Malayalam orthography prefers using the <samp>\"Chillu R\"</samp>\n> instead of <samp>\"Reph\"</samp>. Therefore, Malayalam fonts may implement <samp>\"Chillu\n> R\"</samp> as a substitution for <samp>\"Ra,Halant\"</samp> in the `akhn` feature. 
This\n> ensures that the substitution takes place before the `rphf` feature\n> is applied, so the font may omit the `rphf` feature entirely.\n\n:::{figure-md}\n![Chillu R ligation](/images/malayalam/malayalam-akhn-chillu-r-1.svg \"Chillu R ligation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-akhn-chillu-r-1}\n\nChillu R ligation\n:::\n\n```{svg-color-toggle-button} malayalam-akhn-chillu-r-1\n```\n\n\n#### Stage 3, step 5: rkrf ####\n\n> This feature is not used in Malayalam.\n\n#### Stage 3, step 6: pref ####\n\nThe `pref` feature replaces pre-base-reordering consonant glyphs with\nany special forms. Malayalam includes one such reordering consonant,\n<samp>\"Ra\"</samp>, when it occurs in post-base position.\n\nThe replacement of the nominal glyph with its special form takes place\nat this stage. However, the actual reordering move is performed later,\nin stage 4, step 4.\n\n:::{figure-md}\n![Pre-base Ra formation](/images/malayalam/malayalam-pstf-ra.svg \"Pre-base Ra formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-pstf-ra}\n\nPre-base Ra formation\n:::\n\n```{svg-color-toggle-button} malayalam-pstf-ra\n```\n\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. Malayalam includes one consonant that can take on a\nbelow-base form: <samp>\"Halant,La\"</samp>.\n\nBecause Malayalam incorporates the `BLWF_MODE_PRE_AND_POST` shaping\ncharacteristic, any pre-base consonants and any post-base consonants\nmay potentially match a `blwf` substitution; therefore, both cases must\nbe flagged for comparison. Note that this is not necessarily the case in other\nIndic scripts that use a different `BLWF_MODE_` shaping\ncharacteristic. 
\n\n  - The context defined for a `blwf` feature is:\n\n:::{table} `blwf` feature context\n    \n| Backtrack     | Matching sequence        | Lookahead     |\n|:--------------|:-------------------------|:--------------|\n| `_consonant_` | `_halant_`,\"La\"          | _none_        |\n:::\n\n\n:::{figure-md}\n![Below-base La formation](/images/malayalam/malayalam-blwf-1.svg \"Below-base La formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-blwf-1}\n\nBelow-base La formation\n:::\n\n```{svg-color-toggle-button} malayalam-blwf-1\n```\n\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Malayalam.\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n_Consonant_.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are two exceptions to the default behavior, for which the\nshaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - A sequence 
matching <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n> Note: Malayalam does not usually incorporate half forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n>\n> Note: Some `<mlm2>` fonts may use the `half` feature to implement\n> Chillu substitutions, as in the example below.\n\n\n:::{figure-md}\n![Half-form formation](/images/malayalam/malayalam-half.svg \"Half-form formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-half}\n\nHalf-form formation\n:::\n\n```{svg-color-toggle-button} malayalam-half\n```\n\n\n#### Stage 3, step 10: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any\nspecial forms. Malayalam includes two consonants that can take on a\npost-base form: <samp>\"Ya\"</samp> and <samp>\"Va\"</samp>.\n\n  - The context defined for a `pstf` feature is:\n\n:::{table} `pstf` feature context\n    \n| Backtrack       | Matching sequence        | Lookahead     |\n|:----------------|:-------------------------|:--------------|\n| `SYLLABLE_BASE` | `_halant_`,`_consonant_` | _none_        |\n:::\n\n\n:::{figure-md}\n![Post-base Ya formation](/images/malayalam/malayalam-pstf-ya-1.svg \"Post-base Ya formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-pstf-ya-1}\n\nPost-base Ya formation\n:::\n\n```{svg-color-toggle-button} malayalam-pstf-ya-1\n```\n\n:::{figure-md}\n![Post-base Va formation](/images/malayalam/malayalam-pstf-va-1.svg \"Post-base Va formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-pstf-va-1}\n\nPost-base Va formation\n:::\n\n```{svg-color-toggle-button} malayalam-pstf-va-1\n```\n\n\n#### Stage 3, step 11: vatu ####\n\n> This feature is not used in Malayalam.\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. 
These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<mlm2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between the two consonants.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. 
By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n> Note: Malayalam does not usually incorporate conjunct forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n\n:::{figure-md}\n![Conjunct ligation](/images/malayalam/malayalam-cjct.svg \"Conjunct ligation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-cjct}\n\nConjunct ligation\n:::\n\n```{svg-color-toggle-button} malayalam-cjct\n```\n\n\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Malayalam.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. 
If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. 
This\nposition is defined as:\n\n   - after all <samp>\"Chillu\"</samp> glyphs\n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"_Consonant_,Halant\"</samp> subsequences and all glyphs that resulted from a\nsubstitution on a <samp>\"_Consonant_,Halant,ZWJ\"</samp> subsequence, but will stop\nto the left of the base consonant or syllable base, and all conjuncts\nor ligatures that contain the base consonant or syllable base.\n\n:::{figure-md}\n![Matra positioning](/images/malayalam/malayalam-matra-position.svg \"Matra positioning\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-matra-position}\n\nMatra positioning\n:::\n\n```{svg-color-toggle-button} malayalam-matra-position\n```\n\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. 
If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> or <samp>\"Repha\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Malayalam incorporates the `REPH_POS_AFTER_MAIN`\nshaping characteristic, this final position is defined as immediately\nafter the syllable base.\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is\n\n  - Move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. 
\n\t> Note: Because Malayalam incorporates the\n    > `REPH_POS_AFTER_MAIN` shaping characteristic, this means\n    > any positioning tag of `POS_ABOVEBASE_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_ABOVEBASE_CONSONANT`.\n  - If no other location has been located in the previous step, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\nFinally, if the final position of <samp>\"Reph\"</samp> or <samp>\"Repha\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp>/<samp>\"Repha\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n:::{figure-md}\n![Repha positioning](/images/malayalam/malayalam-repha-position.svg \"Repha positioning\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-repha-position}\n\nRepha positioning\n:::\n\n```{svg-color-toggle-button} malayalam-repha-position\n```\n\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to before\nthe base consonant or syllable base.\n\nMalayalam includes one such reordering consonant. 
<samp>\"Ra\"</samp> occurring in the\npost-base position is reordered to a pre-base position at this step.\n\nThe algorithm for reordering <samp>\"Ra\"</samp> in this circumstance is:\n\n  - Only reorder the <samp>\"Ra\"</samp> if the current glyph was substituted using\n    the `pref` feature in stage 3, step 6.\n  - Select the final position using [the same method](#stage-4-step-2-pre-base-matras) as used for\n    reordering a pre-base matra.\n  - If the pre-base matra positioning algorithm cannot determine the final\n    position, place the <samp>\"Ra\"</samp> immediately before the base consonant or syllable base.\n\n:::{figure-md}\n![Pre-base-reordering consonant positioning](/images/malayalam/malayalam-pref-position.svg \"Pre-base-reordering consonant positioning\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-pref-position}\n\nPre-base-reordering consonant positioning\n:::\n\n```{svg-color-toggle-button} malayalam-pref-position\n```\n\n\n\n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nMalayalam does not use the `init` feature, so this step will\ninvolve no work when processing `<mlm2>` text. It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. 
In preparation for this stage, glyph sequences should have been\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit (not used in Malayalam)\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature is not used in Malayalam.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. This usually includes replacing base consonants or\nsyllable bases that\nare adjacent to the below-base-consonant form of <samp>\"La\"</samp> with contextual ligatures.\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures. \n\n:::{figure-md}\n![Post-base form substitution](/images/malayalam/malayalam-psts.svg \"Post-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-psts}\n\nPost-base form substitution\n:::\n\n```{svg-color-toggle-button} malayalam-psts\n```\n\n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic. 
\n\n> Note: Some `<mlm2>` fonts may use the `haln` feature to implement\n> Chillu substitutions, as in the example below.\n\n:::{figure-md}\n![Halant-form formation](/images/malayalam/malayalam-haln.svg \"Halant-form formation\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-haln}\n\nHalant-form formation\n:::\n\n```{svg-color-toggle-button} malayalam-haln\n```\n\n\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. However, `calt` is not mandatory for correct Malayalam shaping\n> and may be disabled in the application by user preference.\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Malayalam text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Malayalam, this includes <samp>\"Dot Reph\"</samp> in addition to the\ndiacritical marks and Vedic signs. 
\n\n:::{figure-md}\n![Above-base mark positioning](/images/malayalam/malayalam-abvm.svg \"Above-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-abvm}\n\nAbove-base mark positioning\n:::\n\n```{svg-color-toggle-button} malayalam-abvm\n```\n\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. In Malayalam, this includes below-base marks as well as\nthe below-base consonant form of <samp>\"La\"</samp>.\n\n:::{figure-md}\n![Below-base mark positioning](/images/malayalam/malayalam-blwm.svg \"Below-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #malayalam-blwm}\n\nBelow-base mark positioning\n:::\n\n```{svg-color-toggle-button} malayalam-blwm\n```\n\n\n\n## The `<mlym>` shaping model ##\n\nThe older Malayalam script tag, `<mlym>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<mlym>` and some users may still have documents that were written to\ntake advantage of `<mlym>` shaping.\n\n### Distinctions from `<mlm2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs used to trigger shaping\nfeatures was altered when migrating from `<mlym>` to\n`<mlm2>`. \n\nSpecifically, under `<mlym>`, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<mlym>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<mlm2>` text, as described above in this document, there is no\nsuch reordering. 
The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nThe old Indic shaping model also did not recognize the\n`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, `<mlym>`\nwas treated as if it followed the `BLWF_MODE_POST_ONLY`\ncharacteristic. In other words, below-base form substitutions were\nonly applied to consonants after the base consonant or syllable base.\n\nIn addition, for some scripts, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<mlym>` text, the left-side matra was always positioned\nimmediately before the base consonant or syllable base.\n\n\n### Advice for handling fonts with `<mlym>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<mlym>` shaping model.\n\n\n### Advice for handling text runs composed in `<mlym>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<mlym>` script tag and it is known that the font in use supports\nonly the `<mlm2>` shaping model.\n\nShaping engines may also choose to apply `blwf` substitutions to\nbelow-base consonants occurring before the base consonant or syllable base when it is\nknown that the font in use supports an applicable substitution lookup.\n\nShaping engines may also choose to position left-side matras according\nto the `<mlym>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph 
Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n"
  },
  {
    "path": "opentype-shaping-mongolian.md",
    "content": "```{include} /_global.md\n```\n\n# Mongolian script shaping in OpenType #\n\nThis document details the general shaping procedure shared by all\nMongolian script styles, and defines the common pieces that style-specific\nimplementations share. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Joining properties](#joining-properties)\n\t  - [Mark classification](#mark-classification)\n\t  - [Character tables](#character-tables)\n  - [The `<mong>` shaping model](#the-mong-shaping-model)\n      - [Stage 1: Transient reordering of modifier combining marks](#stage-1-transient-reordering-of-modifier-combining-marks)\n      - [Stage 2: Compound character composition and decomposition](#stage-2-compound-character-composition-and-decomposition)\n      - [Stage 3: Computing letter joining states](#stage-3-computing-letter-joining-states)\n      - [Stage 4: Applying the `stch` feature](#stage-4-applying-the-stch-feature)\n      - [Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr>](#stage-5-applying-the-language-form-substitution-features-from-gsub)\n      - [Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr>](#stage-6-applying-the-typographic-form-substitution-features-from-gsub)\n      - [Stage 7: Applying the positioning features from <abbr>GPOS</abbr>](#stage-7-applying-the-positioning-features-from-gpos)\n  \n\n\n## General information ##\n\nThe Mongolian script is used to write multiple languages, most commonly\nMongolian, Sibe (or Xibe), and Manchu.  In addition, extensions to the\ncharacter set may be used to write Tibetan and Sanskrit. \n\nThe classical Mongolian alphabet includes several letters that differ\nphonetically but are identical in their visual appearance, such as \"O\"\n(`U+1823`, &#x1823;) and \"U\" (`U+1824`, &#x1824;). 
A variant of the\nclassical alphabet, called Todo (or \"clear\") Mongolian, was developed\nin the 17th century to remove such ambiguous forms. The Todo\ncharacters are also included in the Mongolian Unicode block.\n\nDue to the common shaping features that the Mongolian script shares\nwith Arabic, a shaping engine can support Mongolian with the same\nshaping model [used for Arabic and related writing systems](opentype-shaping-arabic-general.md).\n\nHowever, several other, unrelated scripts are also used to write\nMongolian, including 'Phags-Pa, Soyombo, Zanabazar Square, Cyrillic, and\nLatin. Each of these scripts has its own OpenType shaping rules and its own\nUnicode block, and does not use the general Arabic shaping model.\n\nMongolian is a joining script that uses inter-word spaces, so each\ncodepoint in a text run may be substituted with one of several\ncontextual forms corresponding to what, if any, characters appear\nbefore and after the codepoint. Most, but not all, letter sequences\njoin; shaping engines must track which positions trigger joining\nbehavior for each letter. \n\nMongolian is normally written (and, therefore, rendered) vertically,\nfrom top to bottom. Isolated words or short phrases in Mongolian that\nare included in text blocks of horizontal scripts are generally\nrotated 90 degrees counterclockwise, so that the letters run\nleft-to-right. On systems that do not support vertical text setting,\nthis left-to-right rendering is a common fallback strategy for full\nruns of Mongolian text.\n\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for elements of the\nMongolian script. The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Base** glyph or character is the standard term for a Mongolian\ncharacter that is capable of taking a diacritical mark. \n\nAll consonants and vowels are base characters in\nMongolian. 
Diacritical marks are not used in the Mongolian, Sibe, or\nManchu languages, but may be encountered in Tibetan or Sanskrit.\n\nA number of consonants in Mongolian take on different forms depending\non the vowels used elsewhere in the word. In addition, some letters\ntake on different forms depending on whether they occur in the\nfirst syllable of a word or whether they are used in a native\nMongolian word versus a foreign word. Mongolian fonts implement\nsubstitutions capturing most of these form rules using <abbr title=\"Glyph Substitution table\">GSUB</abbr>. However,\nthere are occasions where the correct form cannot be determined from\ncontext alone.\n\nTo indicate the correct form, the text run can include a **free\nvariation selector** immediately after the letter in\nquestion. There are four free variation selectors in the Mongolian\nblock (\"FVS1\", \"FVS2\", \"FVS3\", and \"FVS4\"), although some letters have\nalternate forms defined only for a subset of the free variation\nselectors.\n\nIn addition, letters vary as to whether alternate forms exist for the\nisolated, initial, medial, or final position, or for several\npositions. The forms that each selector triggers for each letter are\ndefined in the Unicode Mongolian block. 
\n\nFor example, the letter \"Manchu I\" (`U+1873`) has three alternate\nforms defined for the medial position:\n\n:::{figure-md}\n![Non FVS form substitution](/images/mongolian/mongolian-fvs-none.svg \"Non FVS form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-fvs-none}\n\nNon-FVS substitution\n:::\n\n```{svg-color-toggle-button} mongolian-fvs-none\n```\n\n:::{figure-md}\n![FVS1 form substitution](/images/mongolian/mongolian-fvs-fvs1.svg \"FVS1 form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-fvs-fvs1}\n\nFVS1 form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-fvs-fvs1\n```\n\n:::{figure-md}\n![FVS2 form substitution](/images/mongolian/mongolian-fvs-fvs2.svg \"FVS2 form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-fvs-fvs2}\n\nFVS2 form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-fvs-fvs2\n```\n\n:::{figure-md}\n![FVS3 form substitution](/images/mongolian/mongolian-fvs-fvs3.svg \"FVS3 form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-fvs-fvs3}\n\nFVS3 form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-fvs-fvs3\n```\n\n\n\nFree variation selectors have no visual appearance and no advance\nwidth; they are used only to trigger the proper substitution in the\nactive font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> tables. \n\n\n\n\nIn some Mongolian words, a word-final \"A\" or \"E\" is written\ndisconnected from the preceding letter. In such situations, the\n**Mongolian vowel separator** formatting character can be included\nbetween the two letters to trigger such a space.\n\nSibe text may use the **Sibe syllable boundary marker** (`U+1807`) to\ndenote syllable boundaries in foreign loanwords.\n\n**Kashida** (or **tatweel**) is the Arabic term for a glyph inserted\ninto a sequence for the purpose of elongating the baseline stroke of\nan Arabic letter. 
Mongolian features a similar character, called\n**nirugu**, in the Mongolian Unicode block.\n\n\n\n## Glyph classification ##\n\nBecause Mongolian is a joining (or cursive) script, proper shaping of\ntext runs involves identifying the joining behavior of each character,\nthen combining that information with any preceding or subsequent\ncharacters to determine the contextually correct form for display.\n\n### Joining properties ###\n\nMongolian characters are assigned a `JOINING_TYPE` property in the\nUnicode standard that indicates how they join to adjacent\ncharacters. There are six possible values: \n\n  - `JOINING_TYPE_LEFT` indicates that a character joins with\n    the subsequent character, but does not join with the preceding\n    character. \n\t\n  - `JOINING_TYPE_RIGHT` indicates that a character joins with the\n    preceding character, but does not join with the subsequent character.\t\n\n  - `JOINING_TYPE_DUAL` indicates that a character joins with the\n    preceding character and joins with the subsequent character.\n\t\n  - `JOINING_TYPE_NON_JOINING` indicates that a character does not\n    join with the preceding or with the subsequent character.\n\t\n  - `JOINING_TYPE_TRANSPARENT` indicates that the character does not\n    join with adjacent characters _and_ that the character must be\n    skipped over when the shaping engine is evaluating the joining\n    positions in a sequence of characters. When a\n    `JOINING_TYPE_TRANSPARENT` character is encountered in a sequence,\n    the `JOINING_TYPE` of the preceding character passes\n    through. Diacritical marks are frequently assigned this value. \n\t\n  - `JOINING_TYPE_JOIN_CAUSING` indicates that the character forces\n    the use of joining forms with the preceding and subsequent\n    characters. Kashidas and the Zero Width Joiner (`U+200D`) are both\n    `JOIN_CAUSING` characters.\n  \n  \n> Note: Almost all characters in Mongolian are of joining type\n> `DUAL`. 
The exceptions are `TRANSPARENT`, `NON_JOINING`, and\n> `JOIN_CAUSING`. Thus, the ambiguity that might be encountered due to\n> the usage of `LEFT` and `RIGHT` in the names of the other joining\n> types (which are so named in reference to the relative positions as\n> used in Arabic and related scripts) is avoided.\n\nIn other scripts using the Arabic shaping model, letters are also\nassigned to a `JOINING_GROUP` that indicates which fundamental\ncharacter they behave like with regard to joining behavior. Mongolian,\nhowever, does not use joining groups; all characters are assigned to\nthe _null_ joining group.\n\n\n### Mark classification ###\n\nThe Unicode standard defines a _canonical combining class_ for each\ncodepoint that is used whenever a sequence needs to be sorted into\ncanonical order. \n\nOnly one Mongolian mark belongs to a standard combining\nclass:\n\n:::{table} Mark-classification table\n\n| Codepoint | Combining class | Glyph                              |\n|:----------|:----------------|:-----------------------------------|\n|`U+18A9`   | 228             | &#x18A9; Ali Gali Dagalga          |\n:::\n\n\nAll other codepoints in the Mongolian block belong to class _0_.\n\nThe numeric values of these combining classes are used during Unicode\nnormalization.\n\t\n\t\t\t\n### Character tables ###\n\nSeparate character tables are provided for the Mongolian and Mongolian\nSupplement blocks, as well as for other miscellaneous\ncharacters that are used in `<mong>` text runs:\n\n  - [Mongolian character table](character-tables/character-tables-mongolian.md#mongolian-character-table)\n  - [Mongolian Supplement character table](character-tables/character-tables-mongolian.md#mongolian-supplement-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-mongolian.md#miscellaneous-character-table)\n\n\nThe tables list each codepoint along with its Unicode general\ncategory and its joining type. 
For letters, the table lists the\ncodepoint's joining group. For diacritical marks, the table lists the\ncodepoint's mark combining class. The codepoint's Unicode name and an example\nglyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph                        |\n|:----------|:-----------------|:-------------|:--------------|:-----------|:-----------------------------|\n|`U+1828`   | Letter           | DUAL         | _null_        | _0_        | &#x1828; Na                  |\n| | | | | |\n|`U+1885`   | Mark [Mn]        | TRANSPARENT  | _null_        | _0_        | &#x1885; Ali Gali Baluda     |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Mongolian text include the dotted-circle placeholder (`U+25CC`),\nthe zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`),\nthe no-break space (`U+00A0`), and the narrow no-break space (`U+202F`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ncombining mark in isolation. Real-world text documents may\nalso use other characters, such as hyphens or dashes, in a similar\nplaceholder fashion; shaping engines should cope with this situation\ngracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. 
<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (which is not common in Mongolian but can occur in\n    other scripts)\n\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to force the usage of the\ncursive connecting form of a letter even when the context of the\nadjoining letters would not trigger the connecting form. \n\nFor example, to show the initial form of a letter in isolation (such\nas for displaying it in a table of forms), the sequence <samp>\"_Letter_,ZWJ\"</samp>\nwould be used. To show the medial form of a letter in isolation, the\nsequence <samp>\"ZWJ,_Letter_,ZWJ\"</samp> would be used.\n\nThe zero-width non-joiner (<abbr>ZWNJ</abbr>) is primarily used to prevent a\ncursive connection between two adjacent characters that would, under\nnormal circumstances, form a join. \n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nThe no-break space is primarily used to display those codepoints that\nare defined as non-spacing (such as diacritical marks) in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder.\n\nThe narrow no-break space serves a different function in Mongolian. It\nis used to visually separate the main body of a word from the word's\nsuffix. Not all Mongolian words incorporate a narrow no-break space.\n\n\n## The `<mong>` shaping model ##\n\nProcessing a run of `<mong>` text involves seven top-level stages:\n\n1. Transient reordering of modifier combining marks\n2. Compound character composition and decomposition\n3. 
Computing letter joining states\n4. Applying the `stch` feature\n5. Applying the language-form substitution features from <abbr>GSUB</abbr>\n6. Applying the typographic-form substitution features from <abbr>GSUB</abbr>\n7. Applying the positioning features from <abbr>GPOS</abbr>\n\n\n### Stage 1: Transient reordering of modifier combining marks ###\n\n<!--- http://www.unicode.org/reports/tr53/tr53-1.pdf --->\n> Note: because Mongolian does not feature the \"Shadda\" mark or any\n> marks that belong to _Modifier Combining Marks_ (<abbr>MCM</abbr>) classes, this\n> stage should not involve any additional work when processing\n> `<mong>` text runs. It is included here to maintain consistency with\n> other scripts that utilize the general Arabic-based shaping model.\n\nSequences of adjacent marks must be reordered so that they appear in\nthe appropriate visual order before the mark-to-base and mark-to-mark\npositioning features from <abbr title=\"Glyph Positioning table\">GPOS</abbr> can be correctly applied.\n\nIn particular, those marks that have strong affinity to the base\ncharacter must be placed closest to the base.\n\nThis mark-reordering operation is distinct from the standard,\ncross-script mark-reordering performed during Unicode\nnormalization. The standard Unicode mark-reordering algorithm is based\non comparing the _Canonical_Combining_Class_ (<abbr>Ccc</abbr>) properties of mark\ncodepoints, whereas this script-specific reordering utilizes the\n_Modifier_Combining_Mark_ (<abbr>MCM</abbr>) subclasses specified in the\ncharacter tables.\n\nThe algorithm for reordering a sequence of marks is:\n\n  - First, move any <samp>\"Shadda\"</samp> (combining class `33`) characters to the\n    beginning of the mark sequence.\n\t\n  -\tSecond, move any subsequence of combining-class-`230` characters that begins\n       with a `230_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters. 
The subsequence must be moved\n       as a group.\n\n  - Finally, move any subsequence of combining-class-`220` characters that begins\n       with a `220_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters and before all class-`230`\n       characters. The subsequence must be moved as a group.\n\n> Note: Unicode specifies this mark-reordering operation, the Arabic\n> Mark Transient Reordering Algorithm (<abbr>AMTRA</abbr>), in Technical Report 53,\n> in terms that are distinct from standard,\n> <abbr>Ccc</abbr>-based mark reordering.\n>\n> Specifically, <abbr title=\"Arabic Mark Transient Reordering Algorithm\">AMTRA</abbr> is designated as an operation performed during\n> text rendering only, which therefore does not impact other\n> Unicode-compliance issues such as allowable input sequences or text\n> encoding.\n>\n> However, shaping engines may choose to perform the reordering of\n> modifier combining marks in conjunction with their Unicode\n> normalization functionality for increased efficiency.\n\n### Stage 2: Compound character composition and decomposition ###\n\nThe `ccmp` feature allows a font to substitute\n\n  - mark-and-base sequences with a pre-composed glyph including both\n    the mark and the base (as is done with a ligature substitution)\n\t\n  - individual compound glyphs with the equivalent sequence of\n    decomposed glyphs (such as decomposing a letter with inherent\n    marks into a separate fundamental-letter glyph followed by a\n    marks-only glyph, to permit more precise positioning)\n \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. 
\n\n\n### Stage 3: Computing letter joining states ###\n\nIn order to correctly apply the initial, medial, and final form\nsubstitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr> during stage 5, the shaping engine must\ntag every letter for possible application of the appropriate feature.\n\n> Note: The following algorithm includes rules for processing `<syrc>`\n> text in addition to `<mong>` text. Implementers concerned only with\n> shaping `<mong>` text can omit the portions for `<syrc>`-specific\n> rules. \n\nTo determine which feature is appropriate, the shaping engine must\nexamine each word in turn and compute each letter's joining state from\nthe letter's `JOINING_TYPE` and the `JOINING_TYPE` of the\npreceding character (if any).\n\n> Note: Although Mongolian uses inter-word spaces, the `init` feature\n> does _not_ refer to word-initial letters only and the `fina` feature\n> does _not_ refer to word-final letters only.\n>\n> Rather, both of these terms are defined with respect to whether or\n> not the preceding and subsequent letters form joins with the current\n> letter. The letters at word boundaries will, naturally, take on\n> initial and final forms, but initial and final forms of letters also\n> occur regularly within words, when the letter in question is\n> adjacent to a letter that does not form joins.\n\nThis computation starts from the first letter of the word, temporarily\ntagging the letter for `isol` substitution. If the first\nletter is the only letter in the word, the `isol` tag will remain unchanged.\n\nFrom here, the algorithm consumes each character in the string, one at\na time, keeping track of the JOINING_TYPE of the previous character. 
\n\nIf the current character is JOINING_TYPE_TRANSPARENT, move on to the next\ncharacter but preserve the currently-tracked JOINING_TYPE at its previous state.\n\nIf the preceding character's JOINING_TYPE is LEFT, DUAL, or\nJOIN_CAUSING:\n  - In `<syrc>` text, if the current character is <samp>\"Alaph\"</samp>, tag the\n    current character for `med2`, then update the tag for the\n    preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n  - If the current character's JOINING_TYPE is RIGHT, DUAL, or\n    JOIN_CAUSING, tag the current character for `fina`, then update\n    the tag for the preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n\nOtherwise, tag the current character for `isol`.\n\nAfter testing the final character of the word, if the text is in `<syrc>` and\nif the last character that is not JOINING_TYPE_TRANSPARENT or\nJOINING_TYPE_NON_JOINING is <samp>\"Alaph\"</samp>, perform an additional test:\n  - If the preceding character is JOINING_TYPE_LEFT, tag the current character\n    for `fina`\n  - If the preceding character's JOINING_GROUP is DALATH_RISH, tag the current\n    character for `fin3`\n  - Otherwise, tag the current character for `fin2`\n\n\nOnce the last character of the word has been processed, proceed to the\nnext word and repeat the algorithm, starting at the beginning of the\nnext word.\n\n> Note: Because the processing of the characters in the algorithm\n> described above is deterministic, shaping engines may choose to\n> implement the joining-state computation as a state machine, in a lookup\n> table, or by any other means desirable.\n\nAt the end of this process, all letters should be tagged for possible\nsubstitution by one of the `isol`, `init`, `medi`, `med2`, `fina`, `fin2`, or\n`fin3` features.\n\n### Stage 4: Applying the `stch` feature ###\n\nThe `stch` 
feature decomposes and stretches special marks that are\nmeant to extend to the full width of words to which they are\nattached. It was defined for use in `<syrc>` text runs for the <samp>\"Syriac\nAbbreviation Mark\"</samp> (`U+070F`) but it can be used with similar marks in\nother scripts.\n\nTo apply the `stch` feature, the shaping engine should first decompose the\n`U+070F` glyph into components, which results in beginning-point,\nmidpoint, and endpoint glyphs plus one (or more) extension glyphs: at\nleast one extension between the beginning and midpoint glyphs and at\nleast one extension between the midpoint and endpoint glyphs. \n\nThe shaping engine must then calculate the total length of the word to\nwhich the mark applies. That length, minus the advance widths of the\nbeginning-point, midpoint, and endpoint glyphs of the mark, must be divided by\ntwo. \n\nThe result, divided by the advance width of the extension glyph\nand rounded up to the next integer, tells the shaping engine how many\ncopies of the extension glyph must be placed between the midpoint and\neach end of the mark.\n\nFollowing this procedure ensures that the same number of extensions is\nused on each side of the mark so that it remains symmetrical.\n\nFinally, the decomposed mark must be reordered as follows: \n\n  - All of the glyphs in the sequence for the mark, _except_ for\n    the final glyph, are repositioned as a group so that they precede\n    the word to which the mark is attached.\n  - The final glyph in the mark sequence is repositioned to the end of\n    the word.\n\t\n\n### Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr> ###\n\nThe language-substitution phase applies mandatory substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for\nthis stage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Arabic shaping model:\n\n\tlocl\n\tisol\n\tfina\n\tfin2 (not used in <mong>)\n\tfin3 (not used in <mong>)\n\tmedi\n\tmed2 (not used in <mong>)\n\tinit\n\trlig\n\trclt\n\tcalt\n\t\n> Note: `rlig` and `calt` need to be applied to the word as a whole before\n> continuing to the next feature.\n\n#### Stage 5, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n\n#### Stage 5, step 2: isol ####\n\nThe `isol` feature substitutes the default glyph for a codepoint with\nthe isolated form of the letter.\n\n> Note: It is common for a font to use the isolated form of a letter\n> as the default, in which case the `isol` feature would apply no\n> substitutions. 
However, this is only a convention, and the active\n> font may use other forms as the default glyphs for any or all\n> codepoints.\n\n:::{figure-md}\n![Isolated form substitution](/images/mongolian/mongolian-isol.svg \"Isolated form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-isol}\n\nIsolated form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-isol\n```\n\n\n\nThe Mongolian free-variation selectors can also be used in conjunction\nwith `isol` to trigger alternate forms of certain letters as required\nby the orthography.\n\n:::{figure-md}\n![Isolated FVS1 form substitution](/images/mongolian/mongolian-isol-fvs1.svg \"Isolated FVS1 form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-isol-fvs1}\n\nIsolated FVS1 form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-isol-fvs1\n```\n\n\n\n#### Stage 5, step 3: fina ####\n\nThe `fina` feature substitutes the default glyph for a codepoint with\nthe terminal (or final) form of the letter.\n\n:::{figure-md}\n![Final form substitution](/images/mongolian/mongolian-fina.svg \"Final form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-fina}\n\nFinal form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-fina\n```\n\n\n\nThe Mongolian free-variation selectors can also be used in conjunction\nwith `fina` to trigger alternate forms of certain letters as required\nby the orthography.\n\n:::{figure-md}\n![Final FVS2 form substitution](/images/mongolian/mongolian-fina-fvs2.svg \"Final FVS2 form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-fina-fvs2}\n\nFinal FVS2 form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-fina-fvs2\n```\n\n\n\n#### Stage 5, step 4: fin2 ####\n\nThis feature is not used in `<mong>` text.\n\n#### Stage 5, step 5: fin3 ####\n\nThis feature is not used in `<mong>` text.\n\n#### Stage 5, step 6: medi ####\n\nThe `medi` feature substitutes the default glyph for a codepoint 
with\nthe medial form of the letter.\n\n:::{figure-md}\n![Medial form substitution](/images/mongolian/mongolian-medi.svg \"Medial form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-medi}\n\nMedial form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-medi\n```\n\n\n\nThe Mongolian free-variation selectors can also be used in conjunction\nwith `medi` to trigger alternate forms of certain letters as required\nby the orthography.\n\n:::{figure-md}\n![Medial FVS1 form substitution](/images/mongolian/mongolian-medi-fvs1.svg \"Medial FVS1 form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-medi-fvs1}\n\nMedial FVS1 form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-medi-fvs1\n```\n\n\n\n#### Stage 5, step 7: med2 ####\n\nThis feature is not used in `<mong>` text.\n\n#### Stage 5, step 8: init ####\n\nThe `init` feature substitutes the default glyph for a codepoint with\nthe initial form of the letter.\n\n:::{figure-md}\n![Initial form substitution](/images/mongolian/mongolian-init.svg \"Initial form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-init}\n\nInitial form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-init\n```\n\n\n\nThe Mongolian free-variation selectors can also be used in conjunction\nwith `init` to trigger alternate forms of certain letters as required\nby the orthography.\n\n:::{figure-md}\n![Initial FVS1 form substitution](/images/mongolian/mongolian-init-fvs1.svg \"Initial FVS1 form substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-init-fvs1}\n\nInitial FVS1 form substitution\n:::\n\n```{svg-color-toggle-button} mongolian-init-fvs1\n```\n\n\n\n#### Stage 5, step 9: rlig ####\n\nThe `rlig` feature substitutes glyph sequences with mandatory\nligatures. 
Substitutions made by `rlig` cannot be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Required ligature substitution](/images/mongolian/mongolian-rlig.svg \"Required ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #mongolian-rlig}\n\nRequired ligature substitution\n:::\n\n```{svg-color-toggle-button} mongolian-rlig\n```\n\n\n\n#### Stage 5, step 10: rclt ####\n\nThe `rclt` feature substitutes glyphs with contextual alternate\nforms. In general, this involves replacing the default form of a\nconnecting glyph with an alternate that provides a preferable\nconnection to an adjacent glyph.\n\nThe `rclt` feature should be used to perform such substitutions that\nare required by the orthography of the active script and\nlanguage. Substitutions made by `rclt` cannot be disabled by \napplication-level user interfaces.\n\n#### Stage 5, step 11: calt ####\n\nThe `calt` feature substitutes glyphs with contextual alternate\nforms. In general, this involves replacing the default form of a\nconnecting glyph with an alternate that provides a preferable\nconnection to an adjacent glyph.\n\nThe `calt` feature, in contrast to `rclt` above, performs\nsubstitutions that are not mandatory for orthographic\ncorrectness. 
Consequently, unlike `rclt`, the substitutions made by `calt`\ncan be disabled by application-level user interfaces.\n\n<!--- ![Contextual alternate substitution](/images/mongolian/mongolian-calt.svg) --->\n\n\n\n### Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr> ###\n\nThe typographic-substitution phase applies optional substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Arabic shaping model:\n\n\tliga\n\tdlig\n\tcswh\n\tmset\n\t\n\n#### Stage 6, step 1: liga ####\n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n<!--- ![Standard ligature substitution](/images/mongolian/mongolian-liga.svg) --->\n\n\n\n#### Stage 6, step 2: dlig ####\n\nThe `dlig` feature substitutes additional optional ligatures that are\noff by default. Substitutions made by `dlig` may be enabled or disabled by\napplication-level user interfaces.\n\n\n#### Stage 6, step 3: cswh ####\n\nThe `cswh` feature substitutes contextual swash variants of\nglyphs. \n\n<!--- For example, the active font might substitute a longer variant\nof <samp>\"Noon\"</samp> when a certain number of subsequent glyphs do not descend\nbelow the baseline. --->\n\n\n#### Stage 6, step 4: mset ####\n\nThe `mset` feature performs mark positioning by substituting sequences\nof bases and marks with precomposed base-and-mark glyphs.\n\n> Note: Positioning marks with the `mark` and `mkmk` features of <abbr title=\"Glyph Positioning table\">GPOS</abbr> is\n> preferred, because `mset` can interfere with the OpenType shaping\n> process. 
For example, substitution rules contained in `mset` may not be able to\n> account for necessary mark-reordering adjustments conducted in the\n> next stage.\n> \n> Nevertheless, when the active font uses `mset` substitutions, the\n> shaping engine must deal with the situation gracefully.\n\n### Stage 7: Applying the positioning features from <abbr>GPOS</abbr> ###\n\nThe positioning stage adjusts the positions of mark and base\nglyphs.\n\nThe order in which these features are applied is fixed for\nall scripts implemented in the Arabic shaping model:\n\n\tcurs\n\tkern\n\tmark\n\tmkmk\n\n#### Stage 7, step 1: curs ####\n\nThe `curs` feature performs cursive positioning. Each glyph has an\nentry point and exit point; the `curs` feature positions glyphs so\nthat the entry point of the current glyph meets the exit point of the\npreceding glyph.\n\n<!--- ![Cursive positioning](/images/mongolian/mongolian-curs.svg) --->\n\n\n#### Stage 7, step 2: kern ####\n\nThe `kern` feature adjusts glyph spacing between pairs of adjacent glyphs.\n\n\n#### Stage 7, step 3: mark ####\n\nThe `mark` feature positions marks with respect to base glyphs.\n\n<!--- ![Mark positioning](/images/mongolian/mongolian-mark.svg) --->\n\n\n#### Stage 7, step 4: mkmk ####\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n\n\n"
  },
  {
    "path": "opentype-shaping-myanmar.md",
    "content": "```{include} /_global.md\n```\n\n# Myanmar shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Myanmar script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Myanmar character tables](#myanmar-character-tables)\n  - [The `<mym2>` shaping model](#the-mym2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-4-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 5: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-5-applying-remaining-positioning-features-from-gpos)\n  - [The `<mymr>` shaping model](#the-mymr-shaping-model)\n\n\n## General information ##\n\nThe Myanmar or Burmese script is a descendant of the Brahmi script, and follows\nmany of the same general patterns found in [Indic\nscripts](opentype-shaping-indic-general.md). However, Myanmar\nincorporates enough distinctions of its own that it is generally not\nadvisable to attempt supporting it in a general-purpose Indic shaping\nengine. \n\nFor example, Myanmar script includes a \"Reph\"-like feature known as\n\"Kinzi\" although, unlike \"Reph\", a \"Kinzi\" may be formed by any of\nseveral initial consonants. 
Also, notably, real-world texts written in\nMyanmar script often do not use inter-word spaces, which may make the\nprocess of syllable identification substantially different from\nprocessing Indic scripts.\n\nThe Myanmar script is used to write multiple languages, most commonly\nBurmese, Mon, Karen, Kayah, Shan, Palaung, and Pali. In addition,\nSanskrit may be written in Myanmar, so Myanmar script runs may include \nglyphs from the Vedic Extensions block of Unicode. \n\nThere are two extant Myanmar script tags defined in OpenType, `<mymr>`\nand `<mym2>`. The older script tag, `<mymr>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<mym2>`\nshaping model. However, if a font is encountered that supports only\n`<mymr>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Brahmi-derived and\nIndic scripts.  The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. Syllables\nin Myanmar script can include sequences of multiple vowels and,\ntherefore, multiple matras.\n\n**Halant** and **Virama** are both standard terms for the below-base\n\"vowel-killer\" mark. Unicode documents use the term \"virama\" most\nfrequently, while OpenType documents use the term \"halant\" most\nfrequently.\n\n**Asat** is the term for the \"pure killer\" character in Myanmar. An\nasat after a consonant serves a similar function as a halant by\nsuppressing the inherent vowel of the consonant, but the asat is\nrendered visually, either as an above-base mark or in a substitution\nform triggered by an adjacent codepoint.\n\nAn asat may be placed following a consonant to denote that the\nconsonant is doubled. 
An asat may also be followed by a halant, a\nsequence that is used to trigger the \"Kinzi\" special form.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the\ndiacritical mark indicating that the preceding vowel should be\nnasalized. Myanmar script does not use a chandrabindu; however, the\n_BINDU_ category is used for other marks during the\nsyllable-identification stage in order to maintain compatibility with\nother scripts. \n\n**Tone markers** are an important part of languages written in Myanmar\nscript. These markers may be either spacing-combining (`[Mc]`) or\nnon-spacing (`[Mn]`). Several tone markers may be used within a single\nsyllable.\n\nThe term **base consonant** is also critical to Myanmar shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra). \n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. The **Kinzi** form of certain consonants is an\nexample, akin to the \"Reph\" form of \"Ra\" in many Indic scripts.\n\nMany Myanmar letters may be followed by a **Variation Selector**\ncodepoint in order to request the **dotted form** of the corresponding\nglyph, which is preferred for some languages written with Myanmar\nscript. 
Fonts are not required to include the dotted-form variants;\nwhen they are absent from the active font, the default form of the\ncorresponding letter will be used instead.\n\n:::{figure-md}\n![Dotted form substitution with variation selector](images/myanmar/myanmar-dotted.svg \"Dotted form substitution with variation selector\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-dotted}\n\nDotted form substitution with variation selector\n:::\n\n```{svg-color-toggle-button} myanmar-dotted\n```\n\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Myanmar text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark. \n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nMyanmar glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Myanmar script. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU` and `VISARGA`. 
Some less-common marks behave according to\nrules that are similar to these common marks, and are therefore\nclassified with the corresponding common mark. The Vedic Extensions\nalso include a `CANTILLATION` class for tone marks.\n\nMyanmar's \"halant\" codepoint is classified as `INVISIBLE_STACKER`,\nrather than the more common `VIRAMA`. This is to indicate that, unlike\nthe \"halant\"/\"virama\" characters in several other scripts, the Myanmar\n\"halant\" is never rendered visually as a glyph.\n\nMyanmar's \"Asat\" codepoint, however, is rendered visually when it\nappears in a syllable. The \"Asat\" behaves differently than Indic\n\"halant\", however. It can be used to kill a consonant's inherent vowel\nsound, but it is not used between consonants to indicate the formation\nof a conjunct or a subjoined form. The \"Asat\" is classified as\n`PURE_KILLER`.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are categorized as `Mark [Mn]`).\n\nMyanmar uses one subclass of consonant, `CONSONANT_MEDIAL`. This\nsubclass is used for special non-base variants of several consonants that\nserve to modify the syllable's vowel sound. These medial consonants\nare rendered as non-spacing marks attached to the base consonant.\n\n> Note: The medial \"Ra\" is reordered to pre-base-consonant\n> position. The other medial consonants do not require reordering.\n\n> Note: The medial consonants are encoded in separate codepoints,\n> distinguishing them from the standard (non-medial) variant of the\n> corresponding consonant. \n\nIn addition, the Myanmar and Myanmar Extended Unicode blocks include\nseveral codepoints classified as `CONSONANT_PLACEHOLDER`. 
These\ncodepoints are used in verbal transcriptions to take tone\nmarks. However, these glyphs are not consonants in the true sense and\nare unlikely to occur within normal words.\n\nOther characters, such as symbols, need no special\nattention from the shaping engine, so they are not assigned a shaping\nclass.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). 
Each corresponds to the visual position of the matra with\nrespect to the base consonant to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the base consonant.\n  - `RIGHT_POSITION` matras are positioned to the right of the base consonant.\n  - `TOP_POSITION` matras are positioned above the base consonant.\n  - `BOTTOM_POSITION` matras are positioned below the base consonant.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. \n\nMany Myanmar letters may be followed by a `Variation Selector`\ncodepoint in order to request the \"dotted form\" of the\ncorresponding glyph. These variations are defined in Unicode's\n`Standardized Variants` document; only the codepoints listed in that\ndocument support substitution via variation selectors. At present,\nonly \"Variation Selector 1\" (`U+FE00`) is used with Myanmar.\n\nIf the active font does not include glyphs representing the requested\nvariant of the letter preceding the variation selector, then the\nshaping engine must treat the variation selector codepoint as\ninvisible and ignorable and use the default version of the preceding\nletter. 
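
The variation-selector fallback described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `resolve_variation_selectors` and the `supported_variants` set (pairs of base codepoint and selector for which the active font provides a variant glyph) are hypothetical stand-ins for whatever lookup a real shaping engine performs against the font.

```python
VS1 = 0xFE00  # Variation Selector 1, the only selector currently used with Myanmar


def resolve_variation_selectors(codepoints, supported_variants):
    '''Drop variation selectors that the active font cannot honor.

    supported_variants is a hypothetical set of (base, selector) pairs
    for which the font is assumed to provide a variant glyph.
    '''
    resolved = []
    for cp in codepoints:
        if cp == VS1:
            base = resolved[-1] if resolved else None
            if (base, cp) not in supported_variants:
                # Unsupported pair: treat the selector as invisible and
                # ignorable, so the default form of the base is used.
                continue
        resolved.append(cp)
    return resolved


# Assumed example: Nga (U+1004) followed by VS1, then Ka (U+1000).
run = [0x1004, VS1, 0x1000]
with_variant = resolve_variation_selectors(run, {(0x1004, VS1)})
without_variant = resolve_variation_selectors(run, set())
```

When the pair is supported, the selector stays in the run so that the font can map the sequence to its dotted-form glyph; otherwise only the base letter survives.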
\n\n\n### Myanmar character tables ###\n\nSeparate character tables are provided for the Myanmar and Vedic\nExtensions blocks as well as for other miscellaneous characters that\nare used in `<mym2>` text runs:\n\n  - [Myanmar character table](character-tables/character-tables-myanmar.md#myanmar-character-table)\n  - [Myanmar Extended-A character table](character-tables/character-tables-myanmar.md#myanmar-extended-a-character-table)\n  - [Myanmar Extended-B character table](character-tables/character-tables-myanmar.md#myanmar-extended-b-character-table)\n  - [Myanmar Extended-C character table](character-tables/character-tables-myanmar.md#myanmar-extended-c-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-myanmar.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-myanmar.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1000`   | Letter           | CONSONANT         | _null_                     | &#x1000; Ka                  |\n| | | | |\n|`U+1036`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x1036; Anusvara            |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. 
Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\nOther important characters that may be encountered when shaping runs\nof Myanmar text include the dotted-circle placeholder (`U+25CC`), \nthe no-break space (`U+00A0`), and the zero-width space (`U+200B`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n<!--- The zero-width joiner is primarily used to prevent the formation of a\nsubjoining form from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence. The sequence\n<samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the substitution of a\nsubjoined form for the second consonant. --->\n\n<!---\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>. An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\nwhere an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width joiner\notherwise would.\n--->\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display\nthose codepoints that are defined as non-spacing (marks, dependent\nvowels (matras), below-base consonant forms, and post-base consonant\nforms) in an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or\n<samp>\"NBSP,_matra_\"</samp>.\n\nThe zero-width space may be used between words — even though no visual\nword spacing results — in order to indicate word breaks within a text\nthat can be used by line-breaking algorithms in a higher-level\ntypesetting environment.\n\n\n\n## The `<mym2>` shaping model ##\n\nProcessing a run of `<mym2>` text involves five top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Applying all remaining substitution features from <abbr>GSUB</abbr>\n5. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Brahmi-derived and Indic scripts, the initial reordering\nstage and the final reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage four, however, do not have a mandatory order.\n\n\nMyanmar exhibits many of the same shaping patterns found in Indic\nscripts, but it differs in a few critical characteristics. With regard\nto these common variations, Myanmar's specific shaping \ncharacteristics include:\n\n\n  - The first consonant of a syllable is always the base consonant,\n    excluding a consonant that is part of an initial <samp>\"Kinzi\"</samp>-forming\n    sequence (if it is present).\n\n> Note: For comparison with the General Indic shaping model, this\n> characteristic would correspond to `BASE_POS_FIRST`.\n  \n  - <samp>\"Kinzi\"</samp> is always encoded as a syllable-initial sequence, but it\n    is reordered. The final position of <samp>\"Kinzi\"</samp> is immediately after\n    the base consonant. 
\n\n> Note: For comparison with the General Indic shaping model, the\n> Kinzi-encoding characteristic would correspond to `REPH_MODE_EXPLICIT`,\n> and the reordering characteristic would correspond to `POS_AFTER_MAIN`.\n  \n  - The below-base forms feature is applied only to consonants\n    after the base consonant. \n\n> Note: For comparison with the General Indic shaping model, this\n> characteristic would correspond to `BLWF_MODE_POST_ONLY`.\n\n  - Medial Ra is reordered to pre-base position.\n\n  - Pre-base matras are reordered to the beginning of the\n    syllable. Multiple pre-base matras can occur; any such sequences\n    must be moved together, as a block, at the reordering stage.\n\n> Note: For comparison with the General Indic shaping model, this\n> characteristic is distinct to Myanmar script. Indic scripts apply \n> different reordering rules to pre-base matras that depend on the\n> contents of the syllable.\n\t\n  - The ordering position for right-side and above-base matras is the\n    same. All are reordered to immediately after all subjoined consonants.\n\t\n  - Below-base matras are reordered to immediately before any\n    right-side and above-base matras.\n    \t\n> Note: For comparison with the General Indic shaping model, this\n> characteristic would correspond to `MATRA_POS_TOP` and\n> `MATRA_POS_RIGHT` taking the ordering position \n> `POS_AFTER_SUBJOINED`, and `MATRA_POS_BOTTOM` taking the ordering\n> position `POS_BELOWBASE_CONSONANT`. \n\n\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Myanmar consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Myanmar Unicode block enumerates two modifier signs,\n> \"Anusvara\" (`U+1036`) and \"Visarga\" (`U+1038`). There are also\n> twenty-one tone markers in the Myanmar and Myanmar Extended-A\n> blocks. 
In addition, Sanskrit text written in Myanmar may include\n> additional signs from the Vedic Extensions block.\n\nBecause texts written in Myanmar script do not generally employ\ninter-word spaces, however, shaping engines must rely on\nsyllable-identification algorithms to recognize word-boundary\npatterns, distinguishing numeric sequences, symbols, punctuation, and other\nmiscellaneous script characters from syllables within words.\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that vowel is the\nsyllable's only vowel sound and, by definition, there is no \"base\"\nconsonant. \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nGenerally speaking, the base consonant is the first consonant of the\nsyllable and its vowel sound designates the end of the syllable. The\nexception to this rule is consonants that are part of a\n<samp>\"Kinzi\"</samp>-triggering sequence.\n\nPost-base consonants in a valid syllable will be preceded by <samp>\"Halant\"</samp>\nmarks. \n\n\tBaseC Halant Post-baseC\n\t\nThe algorithm for correctly identifying the base consonant includes a\ntest to recognize these sequences and not mis-identify the base\nconsonant.\n\nMedial consonants, if they occur, will not be preceded by a\n<samp>\"Halant\"</samp>. This is because medial consonants in Myanmar are used to\nmodify the vowel sound of the syllable.\n\n> Note: in the Myanmar script, all medial consonants have their own\n> distinct codepoints. 
Therefore, they can be identified by codepoint\n> alone, and there is no need for a text run to identify them using\n> any special sequences.\n\n\nAs with other Brahmi-derived and Indic scripts, the consonant <samp>\"Ra\"</samp> receives\nspecial treatment. \n\n  - A <samp>\"Medial Ra\"</samp> (`U+103C`) must be reordered to a position immediately\n    before the syllable's base consonant. \n\t\n\tNote, however, that <samp>\"Medial Ra\"</samp> is a separate codepoint from the\n    standard <samp>\"Ra\"</samp> (`U+101B`). \n\t\n  - A syllable-initial <samp>\"Ra\"</samp> may also be part of a <samp>\"Kinzi\"</samp>-triggering\n    sequence. \n\t\n\tNotably, however, although <samp>\"Ra\"</samp> alone will take on the <samp>\"Reph\"</samp> form\n    in Indic script sequences, the Myanmar script's <samp>\"Kinzi\"</samp> feature\n    can be triggered for three consonants, depending on the language\n    in use: <samp>\"Ra\"</samp> (`U+101B`), <samp>\"Nga\"</samp> (`U+1004`), and <samp>\"Mon Nga\"</samp>\n    (`U+105A`). 
In each case, the <samp>\"Kinzi\"</samp> form is triggered by an\n    explicit sequence: <samp>\"_consonant_,Asat,Halant\"</samp>.\n\t\n\tThere are, therefore, exactly three <samp>\"Kinzi\"</samp>-forming sequences to\n    test for:\n\t  - <samp>\"Ra,Asat,Halant\"</samp>\n\t  - <samp>\"Nga,Asat,Halant\"</samp>\n\t  - <samp>\"Mon Nga,Asat,Halant\"</samp>\n\n:::{figure-md}\n![Ra Kinzi](images/myanmar/myanmar-kinzi-ra.svg \"Ra Kinzi\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-kinzi-ra}\n\nRa Kinzi\n:::\n\n```{svg-color-toggle-button} myanmar-kinzi-ra\n```\n\n\n:::{figure-md}\n![Nga Kinzi](images/myanmar/myanmar-kinzi-nga.svg \"Nga Kinzi\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-kinzi-nga}\n\nNga Kinzi\n:::\n\n```{svg-color-toggle-button} myanmar-kinzi-nga\n```\n\n\n:::{figure-md}\n![Mon Nga Kinzi](images/myanmar/myanmar-kinzi-monnga.svg \"Mon Nga Kinzi\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-kinzi-monnga}\n\nMon Nga Kinzi\n:::\n\n```{svg-color-toggle-button} myanmar-kinzi-monnga\n```\n\n\nIn the Myanmar (or Burmese) language, <samp>\"Nga\"</samp> is the only <samp>\"Kinzi\"</samp>-forming\nconsonant. <samp>\"Mon Nga\"</samp> can form a <samp>\"Kinzi\"</samp> in the Mon language, and <samp>\"Ra\"</samp>\ncan form a <samp>\"Kinzi\"</samp> in Sanskrit written with the Myanmar script.\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Myanmar script, may\n> not adhere to the syllable-formation rules described above. \n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following regular expressions can be used to match Myanmar-script\nsyllables. \n\nThe regular expressions utilize the shaping classes from the tables\nabove. 
For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. \n\n```markdown\n_ra_\t\t= \"Ra\" | \"Nga\" | \"Mon Nga\"\n_consonant_ \t= `CONSONANT` | `CONSONANT_PLACEHOLDER` - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_halant_\t= `INVISIBLE_STACKER`\n_asat_\t\t= \"Asat\"\n_a_\t\t= \"Anusvara\" | \"Sign Ai\"\n_db_\t\t= \"Dot Below\"\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_mh_\t\t= \"Medial Ha\" | \"Mon Medial La\"\n_mr_\t\t= \"Medial Ra\"\n_mw_\t\t= \"Medial Wa\" | \"Shan Medial Wa\"\n_my_\t\t= \"Medial Ya\" | \"Mon Medial Na\" | \"Mon Medial Ma\"\n_d_\t\t= `NUMBER`\n_pt_\t\t= \"Tone Sgaw Karen Hathi\" | \"Tone Sgaw Karen Ke Pho\" |\n\t          \"Western Pwo Karen Tone 1\" | \"Western Pwo Karen Tone\n\t          2\" | \"Western Pwo Karen Tone 3\" | \"Western Pwo Karen\n\t          Tone 4\" | \"Western Pwo Karen Tone 5\" | \"Pao Karen\n\t          Tone\" \n_sm_\t\t= \"Visarga\" | \"Shan Tone 2\" | \"Shan Tone 3\" | \"Shan\n\t          Tone 5\" | \"Shan Tone 6\" | \"Shan Council Tone 2\" |\n\t          \"Shan Council Tone 3\" | \"Shan Council Emphatic Tone\"\n\t          | \"Rumai Palaung Tone 5\" | \"Khamti Tone 1\" | \"Khamti\n\t          Tone 3\" | \"Aiton A\" \n_punc_\t\t= \"Little Section\" | \"Section\"\n_matrapre_\t= `MATRA` & `LEFT_POSITION`\n_matrapost_\t= `MATRA` &`RIGHT_POSITION`\n_matraabove_\t= `MATRA` & `TOP_POSITION` - _a_\n_matrabelow_\t= `MATRA` & `BOTTOM_POSITION`\n_gb_\t\t= U+002D | U+00A0 | U+00D7 | U+2012 | U+2013 | U+2014 |\n              U+2015 | U+2022 | U+25CC | U+25FB | U+25FC | U+25FD |\n\t\t\t  U+25FE \n_cs_\t\t= `CONSONANT_WITH_STACKER`\n_v_\t\t= `VISARGA`\n_vs_\t\t= \"Variation Selector\"\n```\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. 
The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp>, <samp>\"Nga\"</samp>, and <samp>\"Mon Nga\"</samp> characters that do not\n> trigger <samp>\"Kinzi\"</samp> forms. \n>\n> Note, also, that the `CONSONANT_PLACEHOLDER` class is unioned with\n> the `CONSONANT` class for the purpose of syllable identification,\n> even though these two classes are treated separately in general.\n>\n> Note: The _mh_, _mw_, and _my_ identification classes include\n> several medial letters from the non-Burmese languages; they are\n> grouped according to the medial consonants in Burmese that are the\n> closest match in terms of shaping behavior.\n>\n> Note: <samp>\"Sign Ai\"</samp> is classified as _a_, not as _matraabove_, in order\n> to implement orthographically correct behavior.\n>\n> Note: the _gb_ identification class includes several \"generic base\"\n> codepoints that are often used in real-world text runs to act as\n> placeholders for missing letters.\n\n> Note: the tone marker codepoints are divided up between two\n> identification classes, reflecting the differing orthographic rules\n> they follow. The _pt_ identification class constitutes the \"Pwo\n> tone\" markers, while the _sm_ identification class includes the\n> remaining tone markers and other syllable modifiers.\n\n\nThese identification classes form the basis of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nK\t= _ra_ _asat_ _halant_\nMed\t= _my_? _mr_? _mw_? _mh_? _asat_?\nVmain\t= _matrapre_* _matraabove_* _matrabelow_* _a_* (_db_ _asat_?)?\nVpost\t= _matrapost_ _mh_? _asat_* _matraabove_* _a_* (_db_ _asat_?)?\nPwo\t= _pt_ _a_* _db_? 
_asat_?\nTcomplex= _asat_* Med Vmain Vpost* Pwo* _v_* Z?\nTail\t= _halant_ | Tcomplex\n```\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(K | _cs_)? (C | _vowel_ | _d_ | _gb_) _vs_? (_halant_ (C | _vowel_) _vs_?)* Tail\n```\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\nA sequence that does not match any of these expressions should be\nregarded as broken. The shaping engine may make a best-effort attempt\nto shape the broken sequence, but making guarantees about the\ncorrectness or appearance of the final result is out of scope for this\ndocument.\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Kinzi\"</samp>-forming sequences, and pre-base-reordering medial consonants.\n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. 
After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n<!---\tPOS_RA_TO_BECOME_REPH --->\n\n\n\tPOS_PREBASE_MATRA\n\t\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_BASE_CONSONANT\n\tPOS_AFTER_MAIN\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n<!---\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST --->\n\n<!---\tPOS_FINAL_CONSONANT --->\n<!---\tPOS_SMVD --->\n\n<!--- question: does Myanmar shape handle Vedic signs differently? --->\n<!--- or am I looking at an incomplete version of the reordering --->\n<!--- logic? --->\n<!--- Perhaps SMVD is all just tagged as _POS_AFTER_SUBJOINED --->\n<!--- and captures all tone marks, too? --->\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered in Myanmar script. \n\nThe position names mimic those used in the General Indic shaping\nmodel, for ease of implementation. However, shaping engines are free\nto use any naming scheme they choose.\n\nThe basic positions (left to right) are dependent\nvowels (matras) and consonants positioned before the base\nconsonant (`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base\nconsonant (`POS_BASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`),\nand syllable-modifying or Vedic signs (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the base consonant. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. 
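
The tag-then-sort procedure described in this stage can be sketched in Python. This is a minimal illustration, assuming each glyph has already been tagged; the `Pos` enumeration and the `reorder_syllable` function are hypothetical names that mirror the sort order listed above, and the glyph labels are plain strings used only for readability.

```python
from enum import IntEnum


class Pos(IntEnum):
    # Ordering categories, numbered in their final sort order.
    PREBASE_MATRA = 0
    PREBASE_CONSONANT = 1
    BASE_CONSONANT = 2
    AFTER_MAIN = 3
    BEFORE_SUBJOINED = 4
    BELOWBASE_CONSONANT = 5
    AFTER_SUBJOINED = 6


def reorder_syllable(tagged):
    '''Sort (glyph, Pos) pairs by ordering category.

    sorted() is a stable sort, so glyphs that share a category keep
    their original relative order.
    '''
    return sorted(tagged, key=lambda item: item[1])


# Assumed example: Ka, Medial Ra, and the left-side vowel sign E,
# given in their encoded (phonetic) order.
syllable = [
    ('Ka', Pos.BASE_CONSONANT),
    ('Medial Ra', Pos.PREBASE_CONSONANT),
    ('Sign E', Pos.PREBASE_MATRA),
]
reordered = [glyph for glyph, _ in reorder_syllable(syllable)]
```

Relying on a stable sort keeps the implementation short: the relative order of, for example, several matras tagged with the same category never needs separate bookkeeping.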
\n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which follows.\n\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_BASE_CONSONANT`.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\nVowel-based syllables, standalone sequences, and broken text runs will\nnot have base consonants.\n\nThe algorithm for determining the base consonant is:\n\n  - Starting from the beginning of the syllable, move forwards until a\n    `CONSONANT` is found. \n      * If the consonant is part of a <samp>\"Kinzi\"</samp> sequence, move to the\n        next consonant. \n  - The consonant stopped at will be the base consonant.\n\n> Note: The algorithm considers only `CONSONANT` class consonants.\n\n\n#### Stage 2, step 2: Tag matras ####\n\nSecond, all left-side dependent-vowel (matra) signs must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAll right-side and above-base dependent-vowel (matra)\nsigns are tagged `POS_AFTER_SUBJOINED`.\n\nAll below-base dependent-vowel (matra) signs are tagged\n`POS_BELOWBASE_CONSONANT`. \n\nFor simplicity, shaping engines may choose to tag matras\nin an earlier text-processing step, using the information in the\n_Mark-placement subclass_ column of the character tables. It is\ncritical at this step, however, that all matras are correctly tagged\nbefore proceeding to the next step. 
\n\n#### Stage 2, step 3: Anusvara ####\n\nThird, any `ANUSVARA` marks appearing immediately after a below-base\nvowel sign must be tagged with `POS_BEFORE_SUBJOINED`, so that the\nmarks are reordered to a position immediately before the below-base\nvowel signs.\n\n\n#### Stage 2, step 4: Pre-base-reordering consonants ####\n\nFourth, all pre-base-reordering consonants must be tagged with\n`POS_PREBASE_CONSONANT`. \n\nMyanmar has one pre-base-reordering consonant: <samp>\"Medial Ra\"</samp>.\n\n:::{figure-md}\n![Pre-base-reordering Medial Ra](images/myanmar/myanmar-medial-ra.svg \"Pre-base-reordering Medial Ra\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-medial-ra}\n\nPre-base-reordering Medial Ra\n:::\n\n```{svg-color-toggle-button} myanmar-medial-ra\n```\n\n\n#### Stage 2, step 5: Kinzi ####\n\nFifth, initial <samp>\"Kinzi\"</samp>-triggering sequences that will become <samp>\"Kinzi\"</samp>s\nmust be tagged with `POS_AFTER_MAIN`.\n\nThe sequences are:\n\n  - <samp>\"Ra,Asat,Halant\"</samp>\n  - <samp>\"Nga,Asat,Halant\"</samp>\n  - <samp>\"Mon Nga,Asat,Halant\"</samp>\n\nIn the Myanmar (or Burmese) language, <samp>\"Nga\"</samp> is the only <samp>\"Kinzi\"</samp>-forming\nconsonant. <samp>\"Mon Nga\"</samp> can form a <samp>\"Kinzi\"</samp> in the Mon language, and <samp>\"Ra\"</samp>\ncan form a <samp>\"Kinzi\"</samp> in Sanskrit written with the Myanmar script.\n\n\n#### Stage 2, step 6: Post-base consonants ####\n\nSixth, any remaining non-base consonants that occur after the base\nconsonant must be tagged with `POS_AFTER_MAIN`. Full consonants (of\nclass `CONSONANT`) will be preceded by a <samp>\"Halant\"</samp> glyph. Medial\nconsonants (of class `CONSONANT_MEDIAL`) will not be preceded by a\n<samp>\"Halant\"</samp> glyph. 
\n\n> Note: <samp>\"Medial Ra\"</samp> should have been tagged with\n> `POS_PREBASE_CONSONANT` in stage 2, step 4, and must not be\n> re-tagged in this step.\n\n\n\n#### Stage 2, step 7: Mark tagging ####\n\n<!--- not sure this is done!!! --->\n\nSeventh, all marks must be tagged with the same positioning tag as the\nclosest non-mark character the mark has affinity with, so that they move together\nduring the sorting step.\n\nFor all marks preceding the base consonant, the mark must be tagged\nwith the same positioning tag as the closest preceding non-mark\nconsonant.\n\nFor all marks occurring after the base consonant, the mark must be\ntagged with the same positioning tag as the closest subsequent consonant.\n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\n\nWith these steps completed, the syllable can be sorted into the final sort order.\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for this\nstage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed:\n\n\tlocl\n\tccmp\n\trphf \n\tpref \n\tblwf \n\tpstf\n\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. 
However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n:::{figure-md}\n![Local-forms substitution](images/myanmar/myanmar-locl.svg \"Local-forms substitution\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-locl}\n\nLocal-forms substitution\n:::\n\n```{svg-color-toggle-button} myanmar-locl\n```\n\n\n#### Stage 3, step 2: ccmp ####\n\nThe `ccmp` feature allows a font to substitute mark-and-base sequences\nwith a pre-composed glyph including the mark and the base, or to\nsubstitute a single glyph into an equivalent decomposed sequence of glyphs. \n \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. \n\n> Note: `ccmp` usage is uncommon in Myanmar fonts. Nevertheless,\n> shaping engines must apply any `ccmp` substitutions if they are\n> present in the active font.\n\n\n#### Stage 3, step 3: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Kinzi\"</samp>-triggering sequences with\nthe <samp>\"Kinzi\"</samp> glyph. The sequences are:\n\n  - <samp>\"Ra,Asat,Halant\"</samp>\n  - <samp>\"Nga,Asat,Halant\"</samp>\n  - <samp>\"Mon Nga,Asat,Halant\"</samp>\n\nIn the Myanmar (or Burmese) language, <samp>\"Nga\"</samp> is the only <samp>\"Kinzi\"</samp>-forming\nconsonant. 
<samp>\"Mon Nga\"</samp> can form a <samp>\"Kinzi\"</samp> in the Mon language, and <samp>\"Ra\"</samp>\ncan form a <samp>\"Kinzi\"</samp> in Sanskrit written with the Myanmar script.\n\n:::{figure-md}\n![Kinzi composition](/images/myanmar/myanmar-kinzi-nga-1.svg \"Kinzi composition\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-kinzi-nga-1}\n\nKinzi composition\n:::\n\n```{svg-color-toggle-button} myanmar-kinzi-nga-1\n```\n\n\n#### Stage 3, step 4: pref ####\n\nThe `pref` feature replaces pre-base-consonant glyphs with\nany special forms. In Myanmar, this can include variant forms for\n<samp>\"Medial Ra\"</samp> or for the left-side matras <samp>\"Sign E\"</samp> (`U+1031`) or <samp>\"Shan\nSign E\"</samp> (`U+1084`).\n\n:::{figure-md}\n![pref feature application](/images/myanmar/myanmar-pref.svg \"pref feature application\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-pref}\n\npref feature application\n:::\n\n```{svg-color-toggle-button} myanmar-pref\n```\n\n\n#### Stage 3, step 5: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. In Myanmar, this usually means replacing\npost-base-consonant <samp>\"Halant,_Consonant_\"</samp> sequences with subjoined\nforms of the consonant. \n\nHowever, Myanmar includes several other below-base-consonant\nforms, including medial consonants and below-base dependent vowel\n(matra) signs.\n\nThe below-base forms feature is applied only to glyphs occurring after\nthe base consonant. \n\n:::{figure-md}\n![blwf feature application](/images/myanmar/myanmar-blwf.svg \"blwf feature application\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-blwf}\n\nblwf feature application\n:::\n\n```{svg-color-toggle-button} myanmar-blwf\n```\n\n\n#### Stage 3, step 6: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any\nspecial forms. \n\n> Note: `pstf` usage is uncommon in Myanmar fonts, because the script\n> does not employ special post-base forms of consonants. 
Nevertheless,\n> shaping engines should apply any `pstf` substitutions if they are\n> present in the active font.\n\n\n### Stage 4: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. The order in which these features are applied is not\ncanonical; they should be applied in the order in which they appear in\nthe <abbr title=\"Glyph Substitution table\">GSUB</abbr> table in the font. \n\n\tpres\n\tabvs\n\tblws\n\tpsts\n\tliga\n\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. In Myanmar, this can include stylistic variants\nof left-side dependent vowels (matras) or of <samp>\"Medial Ra\"</samp>. \n\n:::{figure-md}\n![Application of the pres feature](/images/myanmar/myanmar-pres.svg \"Application of the pres feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-pres}\n\nApplication of the pres feature\n:::\n\n```{svg-color-toggle-button} myanmar-pres\n```\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Application of the abvs feature](/images/myanmar/myanmar-abvs.svg \"Application of the abvs feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-abvs}\n\nApplication of the abvs feature\n:::\n\n```{svg-color-toggle-button} myanmar-abvs\n```\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. 
In Myanmar, this can include contextual ligatures\ninvolving below-base dependent vowel marks (matras), medial\nconsonants, or subjoined consonants.\n\n:::{figure-md}\n![Application of the blws feature](/images/myanmar/myanmar-blws.svg \"Application of the blws feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-blws}\n\nApplication of the blws feature\n:::\n\n```{svg-color-toggle-button} myanmar-blws\n```\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants.\n\n\n:::{figure-md}\n![Application of the psts feature](/images/myanmar/myanmar-psts.svg \"Application of the psts feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-psts}\n\nApplication of the psts feature\n:::\n\n```{svg-color-toggle-button} myanmar-psts\n```\n\n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Application of the liga feature](/images/myanmar/myanmar-liga.svg \"Application of the liga feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-liga}\n\nApplication of the liga feature\n:::\n\n```{svg-color-toggle-button} myanmar-liga\n```\n\n\n### Stage 5: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied. As with the preceding stage, the order in which these\nfeatures are applied is not canonical; they should be applied in the\norder in which they appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n\tdist\n\tabvm\n\tblwm\n\tmark\n\tmkmk\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. 
However, `kern` is not mandatory for shaping\n> Myanmar text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\nIn Myanmar text, `dist` is typically used to adjust the space around a\npre-base-reordering <samp>\"Medial Ra\"</samp>, because the <samp>\"Medial Ra\"</samp> codepoint is\nclassified as being of zero width, but is orthographically a glyph\nthat encloses the adjacent letter.\n\n:::{figure-md}\n![Application of the dist feature](/images/myanmar/myanmar-dist.svg \"Application of the dist feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-dist}\n\nApplication of the dist feature\n:::\n\n```{svg-color-toggle-button} myanmar-dist\n```\n\n\nThe `abvm` feature positions above-base glyphs for attachment to base\ncharacters. In Myanmar, this includes <samp>\"Kinzi\"</samp> and <samp>\"Asat\"</samp> in addition\nto tone markers, diacritical marks, above-base dependent vowels\n(matras), and Vedic signs.\n\n:::{figure-md}\n![Application of the abvm feature](/images/myanmar/myanmar-abvm.svg \"Application of the abvm feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-abvm}\n\nApplication of the abvm feature\n:::\n\n```{svg-color-toggle-button} myanmar-abvm\n```\n\n\nThe `blwm` feature positions below-base glyphs for attachment to base\ncharacters. 
In Myanmar, this includes subjoined consonants as well as\nbelow-base dependent vowels (matras), medial consonants, tone markers,\ndiacritical marks, and Vedic signs.\n\n:::{figure-md}\n![Application of the blwm feature](/images/myanmar/myanmar-blwm.svg \"Application of the blwm feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-blwm}\n\nApplication of the blwm feature\n:::\n\n```{svg-color-toggle-button} myanmar-blwm\n```\n\n\nThe `mark` feature positions marks with respect to base glyphs.\n\n:::{figure-md}\n![Application of the mark feature](/images/myanmar/myanmar-mark.svg \"Application of the mark feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-mark}\n\nApplication of the mark feature\n:::\n\n```{svg-color-toggle-button} myanmar-mark\n```\n \n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n\n:::{figure-md}\n![Application of the mkmk feature](/images/myanmar/myanmar-mkmk.svg \"Application of the mkmk feature\"){.shaping-demo .inline-svg .greyscale-svg #myanmar-mkmk}\n\nApplication of the mkmk feature\n:::\n\n```{svg-color-toggle-button} myanmar-mkmk\n```\n\n\n## The `<mymr>` shaping model ##\n\nThe older Myanmar script tag, `<mymr>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<mymr>` and some users may still have documents that were written to\ntake advantage of `<mymr>` shaping.\n\nSparse information is available about how the Microsoft Uniscribe\nshaping engine treated `<mymr>` text runs. Documentation from the\nHarfBuzz shaping engine suggests that the Uniscribe `<mymr>` shaper\ndid not perform a significant amount of reordering or application of\nIndic-like <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n"
  },
  {
    "path": "opentype-shaping-nko.md",
    "content": "```{include} /_global.md\n```\n\n# N'Ko script shaping in OpenType #\n\nThis document details the general shaping procedure shared by all\nN'Ko script styles, and defines the common pieces that style-specific\nimplementations share. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Joining properties](#joining-properties)\n\t  - [Mark classification](#mark-classification)\n\t  - [Character tables](#character-tables)\n  - [The `<nko >` shaping model](#the-nko-shaping-model)\n      - [Stage 1: Transient reordering of modifier combining marks](#stage-1-transient-reordering-of-modifier-combining-marks)\n      - [Stage 2: Compound character composition and decomposition](#stage-2-compound-character-composition-and-decomposition)\n      - [Stage 3: Computing letter joining states](#stage-3-computing-letter-joining-states)\n      - [Stage 4: Applying the `stch` feature](#stage-4-applying-the-stch-feature)\n      - [Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr>](#stage-5-applying-the-language-form-substitution-features-from-gsub)\n      - [Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr>](#stage-6-applying-the-typographic-form-substitution-features-from-gsub)\n      - [Stage 7: Applying the positioning features from <abbr>GPOS</abbr>](#stage-7-applying-the-positioning-features-from-gpos)\n  \n\n\n## General information ##\n\nThe N'Ko script is used to write multiple languages in the Manding\nlanguage family, most commonly Maninka, Dyula, and Bambara. 
\n\nThe N'Ko script uses features and rules derived from those of the\nArabic script, and OpenType defines N'Ko shaping features with a\nsubset of the features used in [Arabic](opentype-shaping-arabic.md) shaping.\nConsequently, a shaping engine can support N'Ko and Arabic with a\n[single shaping model](opentype-shaping-arabic-general.md).\n\nN'Ko is a joining script that uses inter-word spaces, so each\ncodepoint in a text run may be substituted with one of several\ncontextual forms corresponding to what, if any, characters appear\nbefore and after the codepoint. Most, but not all, letter sequences\njoin; shaping engines must track which positions trigger joining\nbehavior for each letter. \n\nN'Ko is written (and, therefore, rendered) from right to\nleft. Shaping engines must track the directionality of the text run\nwhen scripts of different direction are mixed.\n\nThe N'Ko script tag defined in OpenType is `<nko >`. Because OpenType\nscript tags must be exactly four letters long, the `<nko >` tag\nincludes a trailing space. \n\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for elements of the\nN'Ko script. The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Base** glyph or character is the standard term for a N'Ko\ncharacter that is capable of taking a diacritical mark. \n\nThe base characters in N'Ko include both consonants and vowels.\n\n**Kashida** (or **tatweel**) is the term for a glyph inserted into a\nsequence for the purpose of elongating the baseline stroke of a\nletter. Unicode documents use the term \"tatweel\" most frequently,\nwhile OpenType documents use the term \"kashida\" most\nfrequently. Kashidas are typically inserted in order to justify lines\nof text. 
\n\nIn N'Ko, the kashida character is known as _lajanyalan_.\n\n\n## Glyph classification ##\n\nBecause N'Ko is a joining (or cursive) script, proper shaping of\ntext runs involves identifying the joining behavior of each character,\nthen combining that information with any preceding or subsequent\ncharacters to determine the contextually correct form for display.\n\n### Joining properties ###\n\nN'Ko characters are assigned a `JOINING_TYPE` property in the\nUnicode standard that indicates how they join to adjacent\ncharacters. There are six possible values: \n\n  - `JOINING_TYPE_LEFT` indicates that a character joins with\n    the subsequent character, but does not join with the preceding\n    character. \n\t\n  - `JOINING_TYPE_RIGHT` indicates that a character joins with the\n    preceding character, but does not join with the subsequent character.\t\n\n  - `JOINING_TYPE_DUAL` indicates that a character joins with the\n    preceding character and joins with the subsequent character.\n\t\n  - `JOINING_TYPE_NON_JOINING` indicates that a character does not\n    join with the preceding or with the subsequent character.\n\t\n  - `JOINING_TYPE_TRANSPARENT` indicates that the character does not\n    join with adjacent characters _and_ that the character must be\n    skipped over when the shaping engine is evaluating the joining\n    positions in a sequence of characters. When a\n    `JOINING_TYPE_TRANSPARENT` character is encountered in a sequence,\n    the `JOINING_TYPE` of the preceding character passes\n    through. Diacritical marks are frequently assigned this value. \n\t\n  - `JOINING_TYPE_JOIN_CAUSING` indicates that the character forces\n    the use of joining forms with the preceding and subsequent\n    characters. 
Kashidas and the Zero Width Joiner (`U+200D`) are both\n    `JOIN_CAUSING` characters.\n  \n\nIn other scripts that use the general Arabic shaping model, letters\nare also assigned to a `JOINING_GROUP` that indicates which\nfundamental character they behave like with regard to joining\nbehavior.\n\nJoining groups are not necessary in `<nko >` text shaping, so every\ncodepoint is assigned to the _null_ `JOINING_GROUP`.\n\n### Mark classification ###\n\nThe Unicode standard defines a _canonical combining class_ for each\ncodepoint that is used whenever a sequence needs to be sorted into\ncanonical order. \n\nN'Ko marks all belong to standard combining classes:\n\n:::{table} Mark-classification table\n\n| Codepoint | Combining class | Glyph                              |\n|:----------|:----------------|:-----------------------------------|\n|           | 220             | Other below-base combining marks   |\n|           | 230             | Other above-base combining marks   |\n:::\n\n\nThe numeric values of these combining classes are used during Unicode\nnormalization.\n\n\nThese classifications are used in the [mark-transient-reordering\nstage](#stage-1-transient-reordering-of-modifier-combining-marks).\n\n\t\t\t\n\t\t\t\n### Character tables ###\n\nSeparate character tables are provided for the NKo block and for other miscellaneous\ncharacters that are used in `<nko >` text runs:\n\n  - [NKo character table](character-tables/character-tables-nko.md#nko-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-nko.md#miscellaneous-character-table)\n\n\nThe tables list each codepoint along with its Unicode general\ncategory and its joining type. For letters, the table lists the\ncodepoint's joining group. For diacritical marks, the table lists the\ncodepoint's mark combining class. 
The codepoint's Unicode name and an example\nglyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph                        |\n|:----------|:-----------------|:-------------|:--------------|:-----------|:-----------------------------|\n|`U+07D3`   | Letter           | DUAL         | _null_        | _0_        | &#x07D3; Ba                  |\n| | | | | |\n|`U+07EB`   | Mark [Mn]        | TRANSPARENT  | _null_        | 230        | &#x07EB; Combining Short High Tone|\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof N'Ko text include the dotted-circle placeholder (`U+25CC`), the\ncombining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)\nand zero-width non-joiner (`U+200C`), the left-to-right text marker\n(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break\nspace (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ncombining mark in isolation. Real-world text documents may also use\nother characters, such as hyphens or dashes, in a similar placeholder\nfashion; shaping engines should cope with this situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. 
As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (which is not common in N'Ko but can occur in\n    other scripts)\n\n\n\nThe combining grapheme joiner (<abbr>CGJ</abbr>) is primarily used to alter the\norder in which adjacent marks are positioned during the\nmark-reordering stage, in order to adhere to the needs of a\nnon-default language orthography.\n\nBy default, OpenType shaping reorders sequences of adjacent marks by\nsorting the sequence on the marks' Canonical_Combining_Class (<abbr>Ccc</abbr>)\nvalues. The presence of a <abbr title=\"Combining Grapheme Joiner\">CGJ</abbr> character within a sequence of marks has\nthe effect of splitting the sequence into two sequences of marks and,\ntherefore, halting any mark-reordering that would have occurred\nbetween the marks on either side of the <abbr title=\"Combining Grapheme Joiner\">CGJ</abbr>.\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to force the usage of the\ncursive connecting form of a letter even when the context of the\nadjoining letters would not trigger the connecting form. \n\nFor example, to show the initial form of a letter in isolation (such\nas for displaying it in a table of forms), the sequence <samp>\"_Letter_,ZWJ\"</samp>\nwould be used. 
To show the medial form of a letter in isolation, the\nsequence <samp>\"ZWJ,_Letter_,ZWJ\"</samp> would be used.\n\nThe zero-width non-joiner (<abbr>ZWNJ</abbr>) is primarily used to prevent a\ncursive connection between two adjacent characters that would, under\nnormal circumstances, form a join. \n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. 
For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\n\nThe right-to-left mark (<abbr>RLM</abbr>) and left-to-right mark (<abbr>LRM</abbr>) are used by\nthe Unicode bidirectionality algorithm (BiDi) to indicate the points\nin a text run at which the writing direction changes. Generally\nspeaking, <abbr title=\"Right-to-Left Mark\">RLM</abbr> and <abbr title=\"Left-to-Right Mark\">LRM</abbr> codepoints do not interact with shaping.\n\nThe no-break space is primarily used to display those codepoints that\nare defined as non-spacing (such as vowel or diacritical marks and \"Hamza\") in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder.\n\n\n## The `<nko >` shaping model ##\n\nProcessing a run of `<nko >` text involves seven top-level stages:\n\n1. Transient reordering of modifier combining marks\n2. Compound character composition and decomposition\n3. Computing letter joining states\n4. Applying the `stch` feature\n5. Applying the language-form substitution features from <abbr>GSUB</abbr>\n6. Applying the typographic-form substitution features from <abbr>GSUB</abbr>\n7. 
Applying the positioning features from <abbr>GPOS</abbr>\n\n\n### Stage 1: Transient reordering of modifier combining marks ###\n\n<!--- http://www.unicode.org/reports/tr53/tr53-1.pdf --->\n> Note: because N'Ko does not feature the \"Shadda\" mark or any\n> marks that belong to _Modifier Combining Marks_ (<abbr>MCM</abbr>) classes, this\n> stage should not involve any additional work when processing\n> `<nko >` text runs. It is included here to maintain consistency with\n> other scripts that utilize the general Arabic-based shaping model.\n\nSequences of adjacent marks must be reordered so that they appear in\nthe appropriate visual order before the mark-to-base and mark-to-mark\npositioning features from <abbr title=\"Glyph Positioning table\">GPOS</abbr> can be correctly applied.\n\nIn particular, those marks that have strong affinity to the base\ncharacter must be placed closest to the base.\n\nThis mark-reordering operation is distinct from the standard,\ncross-script mark-reordering performed during Unicode\nnormalization. The standard Unicode mark-reordering algorithm is based\non comparing the _Canonical_Combining_Class_ (<abbr>Ccc</abbr>) properties of mark\ncodepoints, whereas this script-specific reordering utilizes the\n_Modifier_Combining_Mark_ (<abbr>MCM</abbr>) subclasses specified in the\ncharacter tables.\n\nThe algorithm for reordering a sequence of marks is:\n\n  - First, move any <samp>\"Shadda\"</samp> (combining class `33`) characters to the\n    beginning of the mark sequence.\n\t\n  -\tSecond, move any subsequence of combining-class-`230` characters that begins\n       with a `230_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters. 
The subsequence must be moved\n       as a group.\n\n  - Finally, move any subsequence of combining-class-`220` characters that begins\n       with a `220_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters and before all class-`230`\n       characters. The subsequence must be moved as a group.\n\n> Note: Unicode describes this mark-reordering operation, the Arabic\n> Mark Transient Reordering Algorithm (<abbr>AMTRA</abbr>), in Technical Report 53,\n> which describes it in terms that are distinct from standard,\n> <abbr>Ccc</abbr>-based mark reordering.\n>\n> Specifically, <abbr title=\"Arabic Mark Transient Reordering Algorithm\">AMTRA</abbr> is designated as an operation performed during\n> text rendering only, which therefore does not impact other\n> Unicode-compliance issues such as allowable input sequences or text\n> encoding.\n>\n> However, shaping engines may choose to perform the reordering of\n> modifier combining marks in conjunction with their Unicode\n> normalization functionality for increased efficiency.\n\n\n### Stage 2: Compound character composition and decomposition ###\n\nThe `ccmp` feature allows a font to substitute\n\n  - mark-and-base sequences with a pre-composed glyph including both\n    the mark and the base (as is done with a ligature substitution)\n\t\n  - individual compound glyphs with the equivalent sequence of\n    decomposed glyphs (such as decomposing a letter with inherent\n    marks into a separate fundamental-letter glyph followed by a\n    marks-only glyph, to permit more precise positioning) \n \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. 
\n\n\n### Stage 3: Computing letter joining states ###\n\nIn order to correctly apply the initial, medial, and final form\nsubstitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr> during stage 5, the shaping engine must\ntag every letter for possible application of the appropriate feature.\n\n> Note: The following algorithm includes rules for processing `<syrc>`\n> text in addition to `<nko >` text. Implementers concerned only with\n> shaping `<nko >` text can omit the portions for `<syrc>`-specific\n> rules. \n\nTo determine which feature is appropriate, the shaping engine must\nexamine each word in turn and compute each letter's joining state from\nthe letter's `JOINING_TYPE` and the `JOINING_TYPE` of the\npreceding character (if any).\n\n> Note: Although N'Ko uses inter-word spaces, the `init` feature\n> does _not_ refer to word-initial letters only and the `fina` feature\n> does _not_ refer to word-final letters only.\n>\n> Rather, both of these terms are defined with respect to whether or\n> not the preceding and subsequent letters form joins with the current\n> letter. The letters at word boundaries will, naturally, take on\n> initial and final forms, but initial and final forms of letters also\n> occur regularly within words, when the letter in question is\n> adjacent to a letter that does not form joins.\n\nThis computation starts from the first letter of the word, temporarily\ntagging the letter for `isol` substitution. If the first\nletter is the only letter in the word, the `isol` tag will remain unchanged.\n\nFrom here, the algorithm consumes each character in the string, one at\na time, keeping track of the JOINING_TYPE of the previous character. 
\n\nIf the current character is JOINING_TYPE_TRANSPARENT, move on to the next\ncharacter but preserve the currently-tracked JOINING_TYPE at its previous state.\n\nIf the preceding character's JOINING_TYPE is LEFT, DUAL, or\nJOIN_CAUSING:\n  - In `<syrc>` text, if the current character is <samp>\"Alaph\"</samp>, tag the\n    current character for `med2`, then update the tag for the\n    preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n  - If the current character's JOINING_TYPE is RIGHT, DUAL, or\n    JOIN_CAUSING, tag the current character for `fina`, then update\n    the tag for the preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n\nOtherwise, tag the current character for `isol`.\n\nAfter testing the final character of the word, if the text is in `<syrc>` and\nif the last character that is not JOINING_TYPE_TRANSPARENT or\nJOINING_TYPE_NON_JOINING is <samp>\"Alaph\"</samp>, perform an additional test:\n  - If the preceding character is JOINING_TYPE_LEFT, tag the current character\n    for `fina`\n  - If the preceding character's JOINING_GROUP is DALATH_RISH, tag the current\n    character for `fin3`\n  - Otherwise, tag the current character for `fin2`\n\n\nOnce the last character of the word has been processed, proceed to the\nnext word and repeat the algorithm, starting at the beginning of the\nnext word.\n\n> Note: Because the processing of the characters in the algorithm\n> described above is deterministic, shaping engines may choose to\n> implement the joining-state computation as a state machine, in a lookup\n> table, or by any other means desirable.\n\n\nAt the end of this process, all letters should be tagged for possible\nsubstitution by one of the `isol`, `init`, `medi`, `med2`, `fina`, `fin2`, or\n`fin3` features.\n\n### Stage 4: Applying the `stch` feature ###\n\nThe `stch` 
feature decomposes and stretches special marks that are\nmeant to extend to the full width of words to which they are\nattached. It was defined for use in `<syrc>` text runs for the \"Syriac\nAbbreviation Mark\" (`U+070F`) but it can be used with similar marks in\nother scripts.\n\n> Note: N'Ko does not feature marks that require the `stch` feature;\n> it is described here to maintain compatibility with other scripts\n> that use the general Arabic shaping model.\n\nTo apply the `stch` feature, the shaping engine should first decompose the\n`U+070F` glyph into components, which results in a beginning point,\nmidpoint, and endpoint glyphs plus one (or more) extension glyphs: at\nleast one extension between the beginning and midpoint glyphs and at\nleast one extension between the midpoint and endpoint glyphs. \n\nThe shaping engine must then calculate the total length of the word to\nwhich the mark applies. That length, minus the advance widths of the\nbeginning, middle, and endpoint glyphs of the mark, must be divided by\ntwo. 
\n\nThe result, divided by the advance width of the extension glyph\nand rounded up to the next integer, tells the shaping engine how many\ncopies of the extension glyph must be placed between the midpoint and\neach end of the mark.\n\nFollowing this procedure ensures that the same number of extensions is\nused on each side of the mark so that it remains symmetrical.\n\nFinally, the decomposed mark must be reordered as follows: \n\n  - All of the glyphs in the sequence for the mark, _except_ for\n    the final glyph, are repositioned as a group so that they precede\n    the word to which the mark is attached.\n  - The final glyph in the mark sequence is repositioned to the end of\n    the word.\n\t\n\n### Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr> ###\n\nThe language-substitution phase applies mandatory substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for\nthis stage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the N'Ko shaping model:\n\n\tlocl\n\tisol\n\tfina\n\tfin2 (not used in N'Ko)\n\tfin3 (not used in N'Ko)\n\tmedi\n\tmed2 (not used in N'Ko)\n\tinit\n\trlig (not used in N'Ko)\n\trclt (not used in N'Ko)\n\tcalt\n\t\n> Note: `rlig` and `calt` need to be applied to the word as a whole before\n> continuing to the next feature.\n\n#### Stage 5, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. 
However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n<!--- ![Localized form substitution](/images/nko/nko-locl.svg) --->\n\n\n#### Stage 5, step 2: isol ####\n\nThe `isol` feature substitutes the default glyph for a codepoint with\nthe isolated form of the letter.\n\n> Note: It is common for a font to use the isolated form of a letter\n> as the default, in which case the `isol` feature would apply no\n> substitutions. However, this is only a convention, and the active\n> font may use other forms as the default glyphs for any or all\n> codepoints.\n\n<!--- ![Isolated form substitution](/images/nko/nko-isol.svg) --->\n\n\n#### Stage 5, step 3: fina ####\n\nThe `fina` feature substitutes the default glyph for a codepoint with\nthe terminal (or final) form of the letter.\n\n:::{figure-md}\n![Final form substitution](/images/nko/nko-fina.svg \"Final form substitution\"){.shaping-demo .inline-svg .greyscale-svg #nko-fina}\n\nFinal form substitution\n:::\n\n```{svg-color-toggle-button} nko-fina\n```\n\n\n#### Stage 5, step 4: fin2 ####\n\nThis feature is not used in `<nko >` text.\n\n#### Stage 5, step 5: fin3 ####\n\nThis feature is not used in `<nko >` text.\n\n#### Stage 5, step 6: medi ####\n\nThe `medi` feature substitutes the default glyph for a codepoint with\nthe medial form of the letter.\n\n:::{figure-md}\n![Medial form substitution](/images/nko/nko-medi.svg \"Medial form substitution\"){.shaping-demo .inline-svg .greyscale-svg #nko-medi}\n\nMedial form substitution\n:::\n\n```{svg-color-toggle-button} nko-medi\n```\n\n\n#### Stage 5, step 7: med2 ####\n\nThis feature is not used in `<nko >` text.\n\n#### Stage 5, step 8: init ####\n\nThe `init` feature substitutes the default glyph for a codepoint with\nthe initial form of the letter.\n\n:::{figure-md}\n![Initial form 
substitution](/images/nko/nko-init.svg \"Initial form substitution\"){.shaping-demo .inline-svg .greyscale-svg #nko-init}\n\nInitial form substitution\n:::\n\n```{svg-color-toggle-button} nko-init\n```\n\n\n#### Stage 5, step 9: rlig ####\n\nThis feature is not used in `<nko >` text.\n\n\n\n#### Stage 5, step 10: rclt ####\n\nThis feature is not used in `<nko >` text.\n\n\n\n#### Stage 5, step 11: calt ####\n\nThe `calt` feature substitutes glyphs with contextual alternate\nforms. In general, this involves replacing the default form of a\nconnecting glyph with an alternate that provides a preferable\nconnection to an adjacent glyph.\n\nThe `calt` feature, in contrast to `rclt` above, performs\nsubstitutions that are not mandatory for orthographic\ncorrectness. However, unlike `rclt`, the substitutions made by `calt`\ncan be disabled by application-level user interfaces.\n\n<!--- ![Contextual alternate substitution](/images/nko/nko-calt.svg) --->\n\n\n\n### Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr> ###\n\nThe typographic-substitution phase applies optional substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the N'Ko shaping model:\n\n    liga\n\tdlig\n\tcswh (not used in N'Ko)\n\tmset (not used in N'Ko)\n\t\n\n#### Stage 6, step 1: liga ####\n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n<!--- ![Standard ligature substitution](/images/nko/nko-liga.svg) --->\n\n\n\n#### Stage 6, step 2: dlig ####\n\nThe `dlig` feature substitutes additional optional ligatures that are\noff by default. 
Substitutions made by `dlig` may be disabled by\napplication-level user interfaces.\n\n\n#### Stage 6, step 3: cswh ####\n\nThis feature is not used in `<nko >` text.\n\n\n\n#### Stage 6, step 4: mset ####\n\nThis feature is not used in `<nko >` text.\n\n\n### Stage 7. Applying the positioning features from <abbr>GPOS</abbr> ###\n\nThe positioning stage adjusts the positions of mark and base\nglyphs.\n\nThe order in which these features are applied is fixed for\nall scripts implemented in the Arabic shaping model:\n\n    curs (not used in N'Ko)\n\tkern\n\tmark\n\tmkmk\n\n#### 7.1 `curs` ####\n\n\nThis feature is not used in `<nko >` text.\n\n\n#### 7.2 `kern` ####\n\nThe `kern` feature adjusts glyph spacing between pairs of adjacent glyphs.\n\n\n#### 7.3 `mark` ####\n\nThe `mark` feature positions marks with respect to base glyphs.\n\n:::{figure-md}\n![Mark positioning](/images/nko/nko-mark.svg \"Mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #nko-mark}\n\nMark positioning\n:::\n\n```{svg-color-toggle-button} nko-mark\n```\n\n\n#### 7.4 `mkmk` ####\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n\n\n"
  },
  {
    "path": "opentype-shaping-normalization.md",
    "content": "# Normalization in OpenType shaping #\n\n## Unicode normalization ##\n\nUnicode defines algorithms for normalizing a sequence of input\ncodepoints into either a canonical composed form or a canonical\ndecomposed form. The purpose of these algorithms and of the defined\nnormalization forms is to generate equivalent representations of input\nsequences regardless of variations in the order of the input sequences.\n\nFor example, a base letter with an attached mark might exist in\nUnicode as a single codepoint, but an input sequence might consist of\nthe base letter codepoint followed by the combining mark\ncodepoint. Unicode normalization can be used to determine that the\n<samp>\"Letter, Mark\"</samp> sequence is equivalent to the single codepoint. This\nsimplifies sorting, searching, string comparison, and many other common\ntasks.\n\nOpenType shaping utilizes Unicode normalization, but OpenType\nshaping has a distinctly different goal: to select the best or most\nappropriate representation of the input codepoint sequence that is\navailable in the active font.\n\n\n### Unicode equivalence and decomposition\n\nUnicode defines two levels of _equivalence_: \"canonical equivalence\"\nand \"compatibility equivalence.\"\n\nBoth of these equivalence relationships are stored as\n`Decomposition_Mapping` properties for codepoints in the Unicode\nCharacter Database. In a canonical equivalence relationship, a\ncodepoint will have a `Decomposition_Mapping` that lists either one or\ntwo other codepoints. In a compatibility equivalence relationship, a\ncodepoint will instead have a `Decomposition_Mapping` that starts with\na formatting tag which is followed by either one or two other\ncodepoints.\n\n> Note: Decomposition mappings typically map one input codepoint to\n> two output codepoints.\n> \n> Decomposition mappings that produce one output codepoint are rare\n> and are defined in order to handle particular, uncommon encoding\n> circumstances. 
However, because such mappings exist, shaping engines\n> should not assume that all decomposition mappings produce exactly\n> two output codepoints.\n\nFor shaping purposes, canonical equivalence is generally of greatest\nconcern. Canonical equivalence defines that sequences such as\n<samp>\"Letter,Mark\"</samp> (a standalone base character followed by a\ncombining-mark character) are to be treated the same as <samp>\"Letter-with-mark\"</samp> (a\ncodepoint that includes both the base and the mark).\n\nThe canonical `Decomposition_Mapping`s are required for Unicode\nnormalization and, even outside of the Unicode normalization\nalgorithm, help shaping engines make the correct matches between\ncodepoint sequences and glyphs.\n\nCompatibility equivalence is more akin to defining fallback\nrelationships, such as defining that a superscript numeral has the\nsame underlying meaning as the full-size numeral. If the active font\nhas no glyph for the superscript numeral codepoint, any decision as to\nwhether substituting the full-size numeral glyph, artificially scaling\nthe full-size numeral glyph, or displaying a `.notdef` glyph is the \ndesirable output is more likely to be a question left up to the\napplication layer or to the end user, rather than to be handled by the\nshaping engine.\n\nHowever, there may be compatibility equivalence relationships of\nsignificant interest to shaping engines or to other components of a\ntext-rendering stack. For example, the Arabic Presentation Form\ncodepoints have defined compatibility equivalences that map each one\nto a codepoint in the Arabic block. 
Therefore, this information can be\nused to enable fallback support for shaping older documents that\ninclude Arabic Presentation Form text runs.\n\n\n### Unicode normalization forms\n\nUnicode defines four \"normalization forms,\" two of which are focused\non canonical equivalence and two of which are focused on compatibility\nequivalence.\n\nThe canonical equivalence forms are:\n\n  - Normalization Form D = `NFD`\n    - All codepoints have gone through full, recursive canonical\n      decomposition\n  - Normalization Form C = `NFC`\n    - All codepoints have gone through full, recursive canonical\n      decomposition, followed by full canonical composition\n\nThe compatibility equivalence forms are:\n\n  - Normalization Form KD = `NFKD`\n    - All codepoints have gone through full, recursive canonical\n      decomposition and full, recursive compatibility decomposition\n  - Normalization Form KC = `NFKC`\n    - All codepoints have gone through full, recursive canonical\n      decomposition and full, recursive compatibility decomposition,\n      followed by full canonical composition\n\n\n### Unicode canonical combining classes\n\nThe Unicode `Canonical_Combining_Class` (`Ccc`) property holds a\nnumerical value for every codepoint. It can be used to sort sequences\ninto canonical order.\n\nBase letters, other non-mark codepoints, and spacing mark codepoints\nwill have `Ccc` of `0`, meaning that the codepoint is unaffected by\nthe reordering algorithm.\n\nCombining marks can have `Ccc` values from `1` to `254`. The\nreordering algorithm sorts subsequences of adjacent marks into order\nof increasing `Ccc` values.\n\n\n### Unicode normalization algorithm\n\nThe general Unicode normalization algorithm is structured to produce\noutput in the user's preference between the four normalization\nforms. 
So the steps performed vary based on whether the desired output\nis to be in form `NFD`, `NFC`, `NFKD`, or `NFKC`.\n\n> Note: The end goal of OpenType shaping normalization is not to\n> produce these Unicode-specified normalization forms, but to produce\n> the optimal rendered output. That is why a modified normalization\n> algorithm, as described in the next section, is used for shaping\n> text.\n\nThe general Unicode normalization algorithm applies to all text except\nHangul syllables. It involves three stages:\n\n1. Full decomposition:\n  - If `NFD` or `NFC` is the desired output, recursively apply\n    canonical decomposition mappings\n  - If `NFKD` or `NFKC` is the desired output, recursively apply\n    canonical decomposition mappings followed by compatibility\n    decomposition mappings\n\n2. Canonical reordering:\n  - Sort all subsequences that consist of `Ccc` &gt; `0` codepoints\n    into order of increasing `Ccc` value\n\n3. Recomposition, if desired:\n  - If either `NFD` or `NFKD` is the desired output, stop.\n  - If either `NFC` or `NFKC` is the desired output, apply canonical\n    recomposition\n   \nCanonical recomposition segments the text run into chunks that begin\nwith <samp>\"Starter\"</samp> codepoints (which have `Ccc` = `0`) and progressively\ntests the subsequent codepoints in the chunk, recombining them, in\norder, with the starter whenever all of the following is true:\n  - there is a canonical `Decomposition_Mapping` for the\n    <samp>\"Starter,Subsequent_codepoint\"</samp> pair\n  - the codepoint of the canonical `Decomposition_Mapping` does not\n    have the `Composition_Exclusion` or `Full_Composition_Exclusion`\n    properties\n  - there are no characters of `Ccc` = `0` or of a higher `Ccc` value\n    than the starter between the starter and the subsequent codepoint\n\t\nIn conceptual terms, the recomposition algorithm applies the reverse\nof the decomposition mappings, except that the now-reordered sequence\nmay enable different 
pairings to match first.\n\nThe additional test conditions enable pairs to potentially match on\nseveral decomposition mappings in a sequence where one base is\nfollowed by several combining marks that attach at different\npositions.\n\nFor example, in the fully decomposed and reordered sequence\n<samp>\"Letter,Mark_1,Mark_2\"</samp>, if <samp>\"Letter,Mark_1\"</samp>\nis not part of a canonical \n`Decomposition_Mapping` but <samp>\"Letter,Mark_2\"</samp> is part of a canonical\n`Decomposition_Mapping`, then <samp>\"Letter,Mark_2\"</samp> will recombine into\n<samp>\"Letter-and-Mark_2\"</samp>, followed by <samp>\"Mark_1\"</samp>.\n\n\n### Unicode normalization for Hangul syllables\n\nHangul syllables can be algorithmically composed and decomposed\nbecause of the strict jamo-ordering of the codepoints that make up the\nHangul Syllables block.\n\nShaping engines can use these algorithms to compose sequences of\nindividual jamo codepoints into precomposed-syllable codepoints, or to\ncompose individual jamo glyphs into a composite syllable when the\nactive font does not include a precomposed glyph for the required\nsyllable.\n\nThe algorithm used to normalize Hangul syllables is not related to the\nUnicode normalization algorithm used for other scripts. The Hangul\nalgorithm is described in stage 2 of the [Hangul\nshaping](opentype-shaping-hangul.md#stage-2-determining-if-the-syllable-can-be-composed-into-a-hangul-syllables-codepoint) document.\n\n\n\n## OpenType shaping normalization ##\n\nNormalization for OpenType shaping closely follows the Unicode\nnormalization model, but it takes place in the context of a known text\nrun and a specific active font.\n\nAs a result, OpenType shaping takes the text context and available\nfont contents into account, making decisions intended to result in the\nbest possible output to the shaping process.\n\n\n### Goals ###\n\nThe OpenType shaping normalization algorithm also decomposes and\nreorders the codepoints in a text run. 
But it differs from Unicode\nnormalization, particularly at the recomposition stage, in order to\noffer the following features useful for shaping engines:\n\n1. Different shaping models can request different preferred formats\n   (composed or decomposed) as output\n2. Individual decomposition and recomposition mappings will not be\n   applied if doing so would result in a codepoint for which the\n   active font does not provide a glyph\n3. Additional decompositions and recompositions not included in\n   Unicode are supported, including the decomposition of multi-part\n   dependent vowels (matras) in several Indic and Brahmic-derived\n   scripts as well as arbitrary decompositions and compositions\n   implemented in `ccmp` and `locl` <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups\n\n\n### Shaping model preferences ###\n\nEach shaping model supported by an OpenType shaping engine should\nrequest its preferred normalization form: either fully composed or\nfully decomposed.\n\n> Note: in both cases, the preferred normalization form should be\n> understood as considering only canonical decomposition mappings, not\n> compatibility decomposition mappings.\n\nWhich form is preferred for the model primarily depends on the details\nof the model, such as whether or not generic Unicode recomposition is\nknown to interfere with mark positioning, reordering, or other shaping\noperations.\n\nComplex shaping models, particularly those which may involve\nreordering or the positioning of multi-part marks, tend to prefer\ndecomposed forms. Nevertheless, deciding which form is preferred for\nwhich model is an implementation decision ultimately left up to the\nshaping-engine implementor, who can take speed, complexity, and other\ntrade-offs into account.\n\nThe preferred form may also be specific to a language, such as when a\nminority language employs different diacritic ordering than the\nordering encoded in Unicode's <abbr>Ccc</abbr> data. 
In this case, a font\ntargeting the minority language may be expected to handle\nlanguage-specific mark-to-mark positioning in <abbr title=\"Glyph Positioning table\">GPOS</abbr>; as a result, the\nshaping engine should allow for the positioning lookups by designating\na preference for decomposed forms.\n\nAlthough a generic Unicode normalization implementation would target\nthe forms defined in Unicode (`NFD`, `NFC`, `NFKD`, or `NFKC`),\nOpenType shaping preferred forms are not identical to these Unicode\nforms and should not be advertised as being functionally equivalent.\n\nScripts and languages may also benefit from defining other preferred\nforms beyond \"fully decomposed\" and \"fully recomposed.\" For example,\nit might be useful to define a preferred form in which all sequences\nof marks are recomposed, but base-and-mark sequences are not\nrecomposed.\n\n\n### OpenType shaping normalization algorithm ###\n\nOpenType shaping normalization consists of four main stages.\n\n1. Full decomposition\n2. Canonical reordering\n3. Selective recomposition\n4. 
Applying font-specific normalization features\n\nDistinctions from Unicode normalization at each stage are described\nbelow.\n\n\n#### Stage 1: Full decomposition ####\n\nIn the first stage, full `NFD` decomposition is performed, as in\nUnicode normalization, with a small set of exceptions required\nby specific shapers:\n\n  - recursively apply canonical decomposition mappings, except for:\n      - Devanagari <samp>\"Rra\"</samp>\n\t  - Bengali <samp>\"Rra\"</samp> and <samp>\"Rha\"</samp>\n\t  - Tamil <samp>\"Au\"</samp>\n\nAfter this decomposition, a second set of non-canonical and non-Unicode\nmappings is applied:\n\n  - Several scripts (including many covered in the Indic2 shaping\n    model, as well as several other Brahmic-derived scripts) include\n    multi-part dependent vowel (matra) characters that should be\n    decomposed into multiple glyphs, so that those glyphs can be\n    independently positioned around base letters.\n\t\n\tThese additional decompositions are listed in the individual\n\tscript-shaping documents.\n\t\n  - Shaping engines implementing fallback support for older encodings\n    should remap those older codepoints to their updated values.\n    For example, a shaper that supports text using the Arabic\n    Presentation Forms block should remap the Arabic Presentation\n    Forms codepoints to the corresponding Arabic-block default\n    codepoints and <abbr title=\"Glyph Substitution table\">GSUB</abbr> positional features.\n\t\n\tThese substitutions are defined in a set of Unicode compatibility\n    decomposition mappings.\n\n  - Certain punctuation and symbol codepoints should be remapped, such as\n    remapping \"non-breaking hyphen\" codepoints to \"hyphen\".\n  \nSome of these additional decompositions and mappings may also be\nimplemented in an active font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, but that is not\nguaranteed. 
Consequently, a normalization function must implement them\nin order to fulfill the goal of providing stable output.\n\n\n#### Stage 2: Canonical reordering ####\n\nIn the second stage, mark sequences are reordered into canonical\norder:\n\n  - Sort all subsequences that consist of `Ccc` &gt; `0` codepoints\n    into order of increasing `Ccc` value\n\nSeveral script-specific shapers require additional reordering to\ncompensate for limitations in the Unicode <abbr>Ccc</abbr> mark-reordering\nmodel. For example, several Arabic mark sequences are reordered in\n[stage 1](opentype-shaping-arabic.md#stage-1-transient-reordering-of-modifier-combining-marks) of the Arabic\nshaping model and [stage 1](opentype-shaping-syriac.md#stage-1-transient-reordering-of-modifier-combining-marks)\nof the Syriac shaping model.\n\nThese are listed briefly in stage 4, step 4, below, but full\ndiscussion of each case can be found in each script's shaping\ndocument.\n\n\n\n#### Stage 3: Selective recomposition ####\n\nThe recomposition stage is selective and depends on the form requested\nby the shaping model in use:\n\n  - If the shaping model prefers composed forms, then proceed with\n    recomposition as described in stage 3, step 1\n\n  - If the shaping model prefers decomposed forms, then proceed with\n    the recomposition as described in stage 3, step 2\n\t\n\n##### Stage 3, step 1: Recomposition for composed-form preference #####\n\nIf composed forms have been requested, then proceed as in the Unicode\ncanonical recomposition algorithm: segment the text run into chunks\nthat begin with <samp>\"Starter\"</samp> codepoints (which have `Ccc` = `0`) and\nprogressively test the subsequent codepoints in the chunk,\nrecombining them, in order, with the starter whenever all of the\ntest conditions are met.\n\nThe following test conditions must be true:\n  - there is a canonical `Decomposition_Mapping` for the\n    <samp>\"Starter,Subsequent_codepoint\"</samp> pair \n  - the codepoint of 
the canonical `Decomposition_Mapping` does not\n    have the `Composition_Exclusion` or `Full_Composition_Exclusion`\n    properties\n  - there are no characters of `Ccc` = `0` or of a higher `Ccc` value\n    than the starter between the starter and the subsequent codepoint\n  - the starter and the subsequent codepoint are not both of `Ccc` = `0`\n  - the glyph that results from applying the recomposition exists in\n    the active font\n\n\n##### Stage 3, step 2: Recomposition for decomposed-form preference #####\n\nIf decomposed forms have been requested, then a simple check is\nperformed to cope with any decomposed forms that are absent in the\nactive font.\n\nSegment the text run into chunks that begin with <samp>\"Starter\"</samp> codepoints\n(which have `Ccc` = `0`) and progressively test the subsequent\ncodepoints in the chunk. \n\n- If there is no standalone glyph for the subsequent codepoint, but\n  there is a `Decomposition_Mapping` for the\n  <samp>\"Starter,subsequent_codepoint\"</samp> pair and a glyph exists for the\n  recomposed codepoint, \n  then recombine the starter and the subsequent codepoint\n\n<!---\nHARFBUZZ logic here: https://github.com/harfbuzz/harfbuzz/src/hb-ot-shape-normalize.cc#L425\n--->\n\n\n\n#### Stage 4: Normalization-related <abbr title=\"Glyph Substitution table\">GSUB</abbr> features and other font-specific considerations ####\n\nAfter the decomposition, mark-reordering, and selective\nrecomposition stages, OpenType shaping normalization also takes\ncertain <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups and complex-script shaping operations into\nconsideration.\n\nThese additional operations may produce final output that differs\nfrom Unicode `NFD` and `NFC` forms. 
However, the output from stage\nfour should be identical for any two canonically-equivalent input\nsequences in the same active font and script/language context.\n\n> Note: the features discussed below are applied after the completion\n> of the decomposition, mark-reordering, and recomposition\n> stages. Furthermore, they are applied before any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> features.\n> \n> As a result, shaping engine implementors may choose to\n> defer application of these features to the start of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> processing for the sake of convenience.\n\nThe `ccmp` and `locl` features can involve normalization, as described\nbelow. If they are present in the active font and match the text run,\nall `ccmp` and `locl` features should be applied, and should be\napplied in the order in which they are listed in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table.\n\n\n##### Stage 4, step 1: ccmp features #####\n\nThe `ccmp` feature is applied to all text runs. `ccmp` lookups are not\nmeant to be disabled by end users in application code.\n\n`ccmp` lookups can specify arbitrary decomposition mappings and\ncomposition mappings, via one-to-many or many-to-one <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions.\n\nThese lookups should be applied regardless of whether\nthey correspond to the expected decomposition and recomposition\nmappings in Unicode, because `ccmp` is font-specific.\n\nA common usage of `ccmp` is to decompose a single codepoint into two\nor more glyphs representing discrete components, so that those\ncomponents can be more precisely positioned.\n\nFor example, many Arabic letters include ijam: dots that, while they\nmay visually resemble marks, are instead intrinsic components of the\nletter and not diacritics. 
Because the ijam are not marks, a letter\nwith ijam does not decompose to separate Unicode codepoints. By\ndecomposing the letter into discrete base and ijam glyphs in `ccmp`, a\nfont can implement better contextual positioning of the ijam, and can\ndo so with considerably less work than including numerous alternate\nglyphs.\n\n<!--- comment from the HarfBuzz source code that I am not\n      certain of the meaning of:\n\"When a font has a precomposed character for a sequence but the 'ccmp'\nfeature in the font is not adequate, use the precomposed character\nwhich typically has better mark positioning.\"\n--->\n\n\n##### Stage 4, step 2: locl features #####\n\nThe `locl` feature is applied to text runs based on matching script\nand language tags.\n\nWhen the tags match, any lookups in `locl` are applied by default\nduring shaping, and these lookups are not meant to be disabled by\nend users in application code.\n\n`locl` lookups often implement simple one-to-one substitutions to\nreplace default glyph forms with alternate shapes preferred in the\nlanguage/script combination.\n\nHowever, `locl` lookups may also interact with normalization by\nperforming decompositions or compositions. These substitutions are\noften used to preserve orthographic or linguistic features that are\nnot fully captured by Unicode normalization forms or <abbr>Ccc</abbr> ordering.\n\nFor example, in the Turkish alphabet, \"dotted i\" and \"dotless i\" are\ntwo distinct letters. For runs of text in Turkish, a font may\ndeliberately substitute a generic \"i\" glyph with \"dotted i\" or the \"i,\ndot diacritic\" sequence with `locl` lookups in order to ensure that\nthe dot diacritic is not lost as text is processed.\n\nOr, for example, in a particular script and language pairing, readers\nmight expect or prefer certain sequences of diacritics to stack in a\ndifferent order than the order their Unicode <abbr>Ccc</abbr> values dictate. 
A\n`locl` lookup could be used to implement the preferred reordering in a\nmany-to-one <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution.\n\n\n\n##### Stage 4, step 3: Variation Selectors #####\n\nUnicode defines _standardized_variation_sequences_ as sequences of two\ncodepoints where the first codepoint is any base character or mark,\nand the second character is a Variation Selector. Mapping a\nstandardized variation sequence to a glyph is not done via <abbr title=\"Glyph Substitution table\">GSUB</abbr>,\nhowever, but in the `cmap` table of a font.\n\nUnicode normalization does not consider Variation Selector\ncodepoints.\n\nWhen performing OpenType shaping normalization, however, if the\n<samp>\"_letter_,Variation Selector\"</samp> is not mapped to a glyph in the active\nfont, a shaping engine may prefer to drop the Variation Selector\ncodepoint and render the default form of the character or to replace\nthe sequence with a `.notdef` glyph. Which option is preferred may be\nlanguage- or script-specific.\n\n\n#### Stage 4, step 4: Interaction with script-specific shaping models ####\n\nReordering and composition are defined as shaping operations in\nseveral script-specific shaping models. In some cases, a reordering\noperation or composition may be designated by a particular <abbr title=\"Glyph Substitution table\">GSUB</abbr> or\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> feature tag.\n\nShaping-engine implementors should take care to note where completing\nnormalization early in the shaping process may reduce the need for\napplying such operations later.\n\nFor example, in the Indic2 shaping model, sequences of marks are\nreordered in stage 2, step 4. 
But this reordering is identical to the\nUnicode canonical reordering, so a shaping-engine implementation that\nnormalizes all text runs before starting the Indic2 shaping process\nwill not need to perform any reordering at that step — assuming that\nthe Indic2 shaping model is configured to prefer decomposed forms.\n\nSimilarly, in stage 3, step 2 of the Indic2 shaping model, the `nukt`\nfeature composes <samp>\"Base,Nukta\"</samp> sequences into <samp>\"Base-and-Nukta\"</samp>\nglyphs. A shaping engine that designates the Indic2 shaping model as\npreferring composed forms could, therefore, have such <samp>\"Base,Nukta\"</samp>\nsequences recomposed during Unicode normalization. However, such a\nrecomposition preference would likely cause other problems, such as\nthe unwanted recomposition of multi-part dependent vowels (matras).\n\nScript-specific shaping models can also involve special exceptions to\nthe generic composition and reordering process of normalization. For\nexample:\n\n  - In the Hebrew shaper, stage 2, Hebrew Alphabetic Presentation\n    Forms, if available in the active font, are composed.\n\n  - In the Arabic shaping model, stage 1, and in the Syriac shaping\n    model, stage 1, certain marks are reordered after normalization\n    and after <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature application.\n\n  - In Bengali, <samp>\"Ya,Nukta\"</samp> is composed into <samp>\"Yya\"</samp> before <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature\n    application, to avoid potential ambiguities during the application\n    of later features.\n\n\n#### Compatibility decompositions ####\n\nAs was mentioned in stage 1 of the OpenType shaping normalization\nalgorithm, the codepoints in the Arabic Presentation Forms blocks\nhave Unicode compatibility `Decomposition_Mapping`s that a shaping\nengine can use to map codepoints from Arabic Presentation Forms to\ncodepoints in the Arabic block. 
Each Arabic Presentation Form\n`Decomposition_Mapping` is tagged with a positional tag corresponding\nto a positional <abbr title=\"Glyph Substitution table\">GSUB</abbr> feature: `<final>`, `<initial>`, `<isolated>`, or\n`<medial>`.\n\nThis tag information can be used to construct a set of synthetic <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nlookups corresponding to `fina`, `init`, `isol`, and `medi`. However,\nshaping engines should take care not to offer guarantees about the\nexpected output, unless explicit support for older files known to be\nencoded with Arabic Presentation Forms codepoints is desired.\n\nSimilarly, several other compatibility `Decomposition_Mapping` tags\ncould theoretically be exploited to enable some level of fallback\nsupport for shaping codepoints when the necessary glyphs are missing\nin the active font, such as mapping `<fraction>` decompositions to\n`frac`, `<super>` decompositions to `sups`, `<sub>` to `subs` or\n`sinf`, or `<compat>` to various generic list-item delimiter\nsequences.\n\nAll such decompositions, however, should be implemented as\nfallbacks, and the decision to employ them is best left up to the\napplication layer or end user's preferences.\n"
  },
  {
    "path": "opentype-shaping-oriya.md",
    "content": "```{include} /_global.md\n```\n\n# Oriya shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Oriya script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Oriya character tables](#oriya-character-tables)\n  - [The `<ory2>` shaping model](#the-ory2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<orya>` shaping model](#the-orya-shaping-model)\n      - [Distinctions from `<ory2>`](#distinctions-from-ory2)\n      - [Advice for handling fonts with `<orya>` features only](#advice-for-handling-fonts-with-orya-features-only)\n      - [Advice for handling text runs composed in `<orya>` format](#advice-for-handling-text-runs-composed-in-orya-format)\n\n\n## General information ##\n\nThe Oriya or Odia script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. Oriya is\ndistinctive in some respects because it includes both some features\ncommon to the North Indic subgroup and some features common to the\nSouth Indic subgroup.\n\nThe Oriya script is used to Oriya (or Odia) language. 
In addition,\nSanskrit may be written in Oriya, so Oriya script runs may include\nglyphs from the Vedic Extensions block of Unicode. \n\nThere are two extant Oriya script tags defined in OpenType, `<orya>`\nand `<ory2>`. The older script tag, `<orya>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<ory2>`\nshaping model. However, if a font is encountered that supports only\n`<orya>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. \n\n**Halant** and **Virama** are both standard terms for the below-base \"vowel-killer\"\nmark. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. In the Oriya\nlanguage, this sign is known as the _halanta_.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. In the Oriya\nlanguage, this mark is known as the _candrabindu_.\n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nDifferent <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. 
In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Oriya text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark.\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nOriya glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. 
These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are also categorized as `Mark [Mn]`).\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). 
Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Oriya character tables ###\n\nSeparate character tables are provided for the Oriya and Vedic\nExtensions blocks as well as for other miscellaneous characters that\nare used in `<ory2>` text runs:\n\n  - [Oriya character table](character-tables/character-tables-oriya.md#oriya-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-oriya.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-oriya.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0B01`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0B01; Candrabindu         |\n| | | | |\n|`U+0B15`   | Letter           | CONSONANT         | _null_                     | &#x0B15; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Oriya text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>. \n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\nIn addition to general punctuation, runs of Oriya text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block.\n\n\n\n## The `<ory2>` shaping model ##\n\nProcessing a run of `<ory2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. 
\n\nWith regard to these common variations, Oriya's specific shaping\ncharacteristics include:\n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any special final-consonant forms.\n\n  - `REPH_POS_AFTER_MAIN` = <samp>\"Reph\"</samp> is ordered immediately after the\n     base consonant or syllable base.\n\n  - `REPH_MODE_IMPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant\"</samp> sequence.\n\n  - `BLWF_MODE_PRE_AND_POST` = The below-forms feature is applied both to\n     pre-base consonants and to post-base consonants.\n\n  - `MATRA_POS_TOP` = `POS_AFTER_MAIN`  = Above-base matras are\n    ordered immediately after the base consonant or syllable base.\n\n  - `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are\n     ordered after all post-base consonant forms.\n\n  - `MATRA_POS_BOTTOM` = `POS_AFTER_SUBJOINED` = Below-base matras are\n     ordered after all subjoined (i.e., below-base) consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Oriya consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Oriya Unicode block enumerates four modifier signs,\n> \"Candrabindu\" (`U+0B01`), \"Anusvara\" (`U+0B02`), \"Visarga\" \n> (`U+0B03`), and \"Avagraha\" (`U+0B3D`). In addition, Sanskrit text\n> written in Oriya may include additional signs from the Vedic\n> Extensions block.\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. 
If\nthe syllable begins with an independent vowel, that vowel is the\nsyllable's only vowel sound and, by definition, there is no \"base\"\nconsonant. \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nGenerally speaking, the base consonant is the final consonant of the\nsyllable and its vowel sound designates the end of the syllable. This\nrule is synonymous with the `BASE_POS_LAST` characteristic mentioned\nearlier. \n\nNon-base consonants in a valid syllable will be separated by <samp>\"Halant\"</samp>\nmarks. 
Pre-base consonants will be followed by <samp>\"Halant\"</samp>, while\npost-base consonants will be preceded by <samp>\"Halant\"</samp>.\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\t\nThe algorithm for correctly identifying the base consonant includes a\ntest to recognize these sequences and not mis-identify the base\nconsonant.\n\nAll consonants in Oriya can potentially occur in pre-base\nposition. The <samp>\"Halant\"</samp> marks on pre-base consonants indicate that they\ncarry no vowel. Instead, they affect syllable pronunciation by\ncombining with the base consonant (e.g., \"_thr_\" or \"_spl_\").\n\nTwo consonants in Oriya are allowed to occur in post-base\nposition: <samp>\"Ya\"</samp> and <samp>\"Yya\"</samp>.\n\nOriya consonants take on a variety of different forms in\nconsonant conjuncts. In some cases, the base consonant takes a\nbelow-base or mark-like form.\n\n<!--- \nKA (0B15), JA (0B1C), NA (0B28), BA (0B2C), WA (0B35), LA (0B32), and\nLLA (0B33) are presented in their half-forms.\n\nTA (0B24), DDHA (0B22), THA (0B25),\nCHA (0B1B), BHA (0B2D), MA (0B2E), and NNA (0B23) are rendered as\nconsonant signs placed below consonant letters. These signs retain the\ninherent vowel A. \n\nOnly the sign representing YYA (0B5F) is positioned\nto the right of a consonant.\n\nSome consonant in Oriya are rendered as consonant signs when they\nfunction as part of a consonant cluster. These signs do not have\nvisual similarity with the consonants they represent.\n\nKA      +     TA\t↝ K.TA\nLa + Halant\t +\tTa\t↝ ¦\nDA\t +      MA\t↝ D.MA\n]ç\t +\tc\t↝ ]ê\n\nSuch consonant clusters may function as consonant and can further take\nother consonant as /ma:tra:/ matras. For example,\n\nTA\t+\tSA\t↝ T.SA\t+\tNA\t↝ T.S.NA\n[ç\t+\tj\t↝ júç\t\t+\t_\t↝ júð\n\nDiminutive form of consonants: A diminutive form of consonant is used\nas the final component of a consonant cluster. 
Such diminutive forms\nretain the inherent vowel A and are positioned below the relevant\nconsonant.\n\nGA\t+\tDHA\t↝ G.DHA\nNç\t+\t^\t↝ ‘\nSHA\t+\tCA\t↝ SH.CA\nhç\t+\tQ\t↝ ¾\n\n\nFrom info at http://www.ciil-lisindia.net/oriya/oriya.html\nhttps://web.archive.org/web/20150304085123/http://www.ciil-lisindia.net:80/Oriya/Oriya.html\n--->\n\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment; in many circumstances it is replaced by a combining\nmark-like form. \n\n  - A <samp>\"Ra,Halant\"</samp> sequence at the beginning of a syllable is replaced\n    with an above-base mark called <samp>\"Reph\"</samp> (unless the <samp>\"Ra\"</samp> is the only\n    consonant in the syllable). This rule is synonymous with the\n    `REPH_MODE_IMPLICIT` characteristic mentioned earlier.\n  - A non-initial <samp>\"Halant,Ra\"</samp> sequence is replaced with a\n    below-base mark called <samp>\"Raphala\"</samp>.\n  \n<samp>\"Reph\"</samp> and <samp>\"Raphala\"</samp> characters must be reordered after the\nsyllable-identification stage is complete. \n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. 
Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Oriya script, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Oriya syllables.\n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. 
\n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. 
Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the basis of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as <samp>\"standalone syllables\"</samp> here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Ra,Halant\"</samp> glyph sequences, and other consonants that take special\n> treatment in some circumstances. \n>\n> These reordering moves are mandatory. 
The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nOriya. \n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. 
`POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. 
That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\n\n<!--- > Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). \n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. 
Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is\n\n  - If the syllable starts with <samp>\"Ra,Halant\"</samp> and the syllable contains\n    more than one consonant, exclude the starting <samp>\"Ra\"</samp> from the list of\n    consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If none of the above conditions is true, stop.\n  - The consonant stopped at will be the base consonant.\n\n> Note: The algorithm is designed to work for all Indic\n> scripts. 
However, Oriya does not utilize pre-base-reordering <samp>\"Ra\"</samp>.\n\nOriya includes two consonants that can take on post-base forms, `Ya` and `Yya`.\n\n:::{figure-md}\n![Post-base consonant Ya](/images/oriya/oriya-pstf-ya.svg \"Post-base consonant Ya\"){.shaping-demo .inline-svg .greyscale-svg #oriya-pstf-ya}\n\nPost-base consonant Ya\n:::\n\n```{svg-color-toggle-button} oriya-pstf-ya\n```\n\n\n\n:::{figure-md}\n![Post-base consonant Yya](/images/oriya/oriya-pstf-yya.svg \"Post-base consonant Yya\"){.shaping-demo .inline-svg .greyscale-svg #oriya-pstf-yya}\n\nPost-base consonant Yya\n:::\n\n```{svg-color-toggle-button} oriya-pstf-yya\n```\n\n\nOriya includes one consonant that can take on a special below-base form:\n\n  - <samp>\"Halant,Ra\"</samp> (in a syllable-final position) takes on the <samp>\"Raphala\"</samp>\n    form. \n\n:::{figure-md}\n![Raphala composition](/images/oriya/oriya-blwf-ra.svg \"Raphala composition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-blwf-ra}\n\nRaphala composition\n:::\n\n```{svg-color-toggle-button} oriya-blwf-ra\n```\n\n\n<!---In addition, all consonants in Oriya can take on subjoined forms.--->\n\n> Note: Because Oriya employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search, only the <samp>\"Halant,_consonant_\"</samp> \n> pattern following the syllable base for these below-base forms will\n> be encountered. Stage 2, step 5 below ensures that the <samp>\"_consonant_,Halant\"</samp>\n> pattern preceding the syllable base for these below-base forms will\n> also be tagged correctly.\n\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any multi-part dependent vowels (matras) must be decomposed\ninto their individual components. Oriya has three\nmulti-part dependent vowels, \"Ai\" (`U+0B48`), \"O\" (`U+0B4B`), and \"Au\" (`U+0B4C`). 
Each\nhas a canonical decomposition, so this step is unambiguous. \n\n> \"Ai\" (`U+0B48`) decomposes to \"`U+0B47`,`U+0B56`\"\n>\n> \"O\" (`U+0B4B`) decomposes to \"`U+0B47`,`U+0B3E`\"\n>\n> \"Au\" (`U+0B4C`) decomposes to \"`U+0B47`,`U+0B57`\"\n\n> Note: \"Au Length Mark\" (`U+0B57`) is categorized in Unicode as being a\n> top-and-right matra, a combination that would normally decompose\n> into one TOP_POSITION mark and one RIGHT_POSITION mark\n> (`U+0B3E`,`U+0B56`). In \"Au Length Mark\", however, the `U+0B3E`\n> component is intended to be positioned over the `U+0B56` component,\n> not above the base.\n>\n> Consequently, the two decomposed components should both be tagged\n> for the `POS_AFTER_POST` sorting position, and neither will need to\n> be reordered.\n>\n> In addition, the decomposition is not canonical in\n> Unicode, so performing the decomposition may trigger unknown\n> behavior from other components of the software stack. Consequently,\n> shaping engines may choose to skip it. \n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. 
However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n:::{figure-md}\n![Two-part matra decomposition](/images/oriya/oriya-matra-decompose.svg \"Two-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-matra-decompose}\n\nTwo-part matra decomposition\n:::\n\n```{svg-color-toggle-button} oriya-matra-decompose\n```\n\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs, including those that\nresulted from the preceding decomposition step, must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAll above-base dependent-vowel (matra) signs are tagged `POS_AFTER_MAIN`.\n\nAll right-side dependent-vowel (matra) signs are tagged\n`POS_AFTER_POST`.\n\nAll below-base dependent-vowel (matra) signs are tagged\n`POS_AFTER_SUBJOINED`.\n\nFor simplicity, shaping engines may choose to tag single-part matras\nin an earlier text-processing step, using the information in the\n_Mark-placement subclass_ column of the character tables. It is\ncritical at this step, however, that all decomposed matras are also\ncorrectly tagged before proceeding to the next step.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. 
No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`. Excluding initial <samp>\"Ra,Halant\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. \n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nOriya includes one consonant that can take on a special below-base form:\n\n  - <samp>\"Halant,Ra\"</samp> (in a non-initial position) takes on the <samp>\"Raphala\"</samp>\n    form. 
\n\n:::{figure-md}\n![Raphala composition](/images/oriya/oriya-blwf-ra-1.svg \"Raphala composition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-blwf-ra-1}\n\nRaphala composition\n:::\n\n```{svg-color-toggle-button} oriya-blwf-ra-1\n```\n\n\n> Note: Because Oriya employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search in stage 2, step 1, any instances of the\n> <samp>\"Halant,_consonant_\"</samp>  pattern following the syllable base for these\n> below-base forms will be encountered. The tagging in this step\n> ensures that the <samp>\"_consonant_,Halant\"</samp> pattern preceding the syllable\n> base for these below-base forms will also be tagged correctly.\n\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: an initial <samp>\"Ra,Halant\"</samp> sequence will always become a <samp>\"Reph\"</samp>\n> unless the <samp>\"Ra\"</samp> is the only consonant in the syllable.\n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with  `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<ory2>` text runs. This step is included here to\n> maintain compatibility across Indic scripts.\n\n\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. 
\n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nOriya includes one exception to the above general rule. The\n<samp>\"Candrabindu\"</samp> (`U+0B01`) must be tagged with `POS_BEFORE_SUBJOINED`.\n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. 
Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!--matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. --->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore, it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp> and <samp>\"Ja,Halant,Nya\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant\"</samp> sequences but _not_ match\n            initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `blwf` should match <samp>\"Halant,_Consonant_\"</samp> in post-base positions and\n            <samp>\"Ra,Halant\"</samp> in non-initial pre-base positions\n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `pstf` should match <samp>\"Halant,Ya\"</samp> and <samp>\"Halant,Yya\"</samp> in post-base position\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf (not used in Oriya)\n\tpref (not used in Oriya)\n\tblwf \n\tabvf\n\thalf\n\tpstf\n\tvatu\n\tcjct\n\tcfar (not used in Oriya)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. \n\n  - The context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n    \n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n:::\n\n\n:::{figure-md}\n![Nukta composition](/images/oriya/oriya-nukt.svg \"Nukta composition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-nukt}\n\nNukta composition\n:::\n\n```{svg-color-toggle-button} oriya-nukt\n```\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces two specific sequences with required ligatures. \n\n  - <samp>\"Ka,Halant,Ssa\"</samp> is substituted with the <samp>\"KSsa\"</samp> ligature. 
\n  - <samp>\"Ja,Halant,Nya\"</samp> is substituted with the <samp>\"JNya\"</samp> ligature. \n  \nThese sequences can occur anywhere in a syllable. The <samp>\"KSsa\"</samp> and\n<samp>\"JNya\"</samp> characters have orthographic status equivalent to full\nconsonants in some languages, and fonts may have `cjct` substitution\nrules designed to match them in subsequences. Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\n  - The context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n    \n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n\n:::{figure-md}\n![KSsa ligation](/images/oriya/oriya-akhn-kssa.svg \"KSsa ligation\"){.shaping-demo .inline-svg .greyscale-svg #oriya-akhn-kssa}\n\nKSsa ligation\n:::\n\n```{svg-color-toggle-button} oriya-akhn-kssa\n```\n\n\n:::{figure-md}\n![JNya ligation](/images/oriya/oriya-akhn-jnya.svg \"JNya ligation\"){.shaping-demo .inline-svg .greyscale-svg #oriya-akhn-jnya}\n\nJNya ligation\n:::\n\n```{svg-color-toggle-button} oriya-akhn-jnya\n```\n\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence, however, must not be flagged for\n    the `rphf` substitution.\n\t\n\n  - The context defined for a `rphf` feature is:\n\n:::{table} `rphf` feature context\n    \n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`   | _none_        |\n:::\n\n\n:::{figure-md}\n![Reph composition](/images/oriya/oriya-rphf.svg \"Reph composition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-rphf}\n\nReph composition\n:::\n\n```{svg-color-toggle-button} 
oriya-rphf\n```\n\n\n#### Stage 3, step 5: rkrf ####\n\n> This feature is not used in Oriya.\n\n#### Stage 3, step 6: pref ####\n\n> This feature is not used in Oriya.\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. Oriya includes one special below-base consonant\nform:\n\n  - <samp>\"Halant,Ra\"</samp> (in a non-initial position) takes on the <samp>\"Raphala\"</samp>\n    form. \n\n:::{figure-md}\n![Raphala composition](/images/oriya/oriya-blwf-ra-2.svg \"Raphala composition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-blwf-ra-2}\n\nRaphala composition\n:::\n\n```{svg-color-toggle-button} oriya-blwf-ra-2\n```\n\n\n<!---In addition, all consonants in Oriya can take on subjoined forms.--->\n\nBecause Oriya incorporates the `BLWF_MODE_PRE_AND_POST` shaping\ncharacteristic, any pre-base consonants and any post-base consonants\nmay potentially match a `blwf` substitution; therefore, both cases must\nbe flagged for comparison. Note that this is not necessarily the case in other\nIndic scripts that use a different `BLWF_MODE_` shaping\ncharacteristic. 
\n\n  - The context defined for a `blwf` feature is:\n\n:::{table} `blwf` feature context\n    \n| Backtrack     | Matching sequence        | Lookahead     |\n|:--------------|:-------------------------|:--------------|\n| `_consonant_` | `_halant_`,`_consonant_` | _none_        |\n:::\n\n\n:::{figure-md}\n![Below-base consonant composition](/images/oriya/oriya-blwf.svg \"Below-base consonant composition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-blwf}\n\nBelow-base consonant composition\n:::\n\n```{svg-color-toggle-button} oriya-blwf\n```\n\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Oriya.\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n_Consonant_.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are two exceptions to the default behavior, for which the\nshaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - A sequence 
matching <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n\n#### Stage 3, step 10: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any special forms.\n\n\n:::{figure-md}\n![Post-base form Ya composition](/images/oriya/oriya-pstf-ya-1.svg \"Post-base form Ya composition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-pstf-ya-1}\n\nPost-base form Ya composition\n:::\n\n```{svg-color-toggle-button} oriya-pstf-ya-1\n```\n\n\n:::{figure-md}\n![Post-base form Yya composition](/images/oriya/oriya-pstf-yya-1.svg \"Post-base form Yya composition\"){.shaping-demo .inline-svg .greyscale-svg #oriya-pstf-yya-1}\n\nPost-base form Yya composition\n:::\n\n```{svg-color-toggle-button} oriya-pstf-yya-1\n```\n\n\n#### Stage 3, step 11: vatu ####\n\n> This feature is not used in Oriya.\n\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<ory2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will 
automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between the two consonants.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n\n:::{figure-md}\n![Conjunct ligation](/images/oriya/oriya-cjct.svg \"Conjunct ligation\"){.shaping-demo .inline-svg .greyscale-svg #oriya-cjct}\n\nConjunct ligation\n:::\n\n```{svg-color-toggle-button} oriya-cjct\n```\n\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Oriya.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. 
Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. 
The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\n:::{figure-md}\n![Pre-base matra position](/images/oriya/oriya-matra-position.svg \"Pre-base matra position\"){.shaping-demo .inline-svg .greyscale-svg #oriya-matra-position}\n\nPre-base matra position\n:::\n\n```{svg-color-toggle-button} oriya-matra-position\n```\n\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. 
If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Oriya incorporates the `REPH_POS_AFTER_MAIN`\nshaping characteristic, this final position is immediately after the\nsyllable base.\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is\n\n  - Move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. 
\n\t> Note: Because Oriya incorporates the\n    > `REPH_POS_AFTER_MAIN` shaping characteristic, this means\n    > any positioning tag of `POS_ABOVEBASE_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_ABOVEBASE_CONSONANT`.\n  - If no other location has been located in the previous step, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\nFinally, if the final position of <samp>\"Reph\"</samp> or <samp>\"Repha\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp>/<samp>\"Repha\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n:::{figure-md}\n![Reph position](/images/oriya/oriya-reph-position.svg \"Reph position\"){.shaping-demo .inline-svg .greyscale-svg #oriya-reph-position}\n\nReph position\n:::\n\n```{svg-color-toggle-button} oriya-reph-position\n```\n\n\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n  \nOriya does not use pre-base-reordering consonants, so this step will\ninvolve no work when processing `<ory2>` text. It is included here in order\nto maintain compatibility with the other Indic scripts.\n\n\n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nOriya does not use the `init` feature, so this step will\ninvolve no work when processing `<ory2>` text. 
It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. In preparation for this stage, glyph sequences should be\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit (not used in Oriya)\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature is not used in Oriya.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\n:::{figure-md}\n![Pre-base form substitution](/images/oriya/oriya-pres.svg \"Pre-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #oriya-pres}\n\nPre-base form substitution\n:::\n\n```{svg-color-toggle-button} oriya-pres\n```\n\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Above-base form substitution](/images/oriya/oriya-abvs.svg \"Above-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #oriya-abvs}\n\nAbove-base form substitution\n:::\n\n```{svg-color-toggle-button} oriya-abvs\n```\n\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. 
This usually includes replacing base consonants or\nsyllable bases that\nare adjacent to below-base-consonant forms like <samp>\"Raphala\"</samp> with\ncontextual ligatures.\n\n:::{figure-md}\n![Below-base form substitution](/images/oriya/oriya-blws.svg \"Below-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #oriya-blws}\n\nBelow-base form substitution\n:::\n\n```{svg-color-toggle-button} oriya-blws\n```\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures. \n\n:::{figure-md}\n![Post-base form substitution](/images/oriya/oriya-psts.svg \"Post-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #oriya-psts}\n\nPost-base form substitution\n:::\n\n```{svg-color-toggle-button} oriya-psts\n```\n\n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic. \n\n:::{figure-md}\n![Halant form substitution](/images/oriya/oriya-haln.svg \"Halant form substitution\"){.shaping-demo .inline-svg .greyscale-svg #oriya-haln}\n\nHalant form substitution\n:::\n\n```{svg-color-toggle-button} oriya-haln\n```\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. 
However, `calt` is not mandatory for correct Oriya shaping\n> and may be disabled in the application by user preference.\n\n\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Oriya text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\n:::{figure-md}\n![Distance positioning](/images/oriya/oriya-dist.svg \"Distance positioning\"){.shaping-demo .inline-svg .greyscale-svg #oriya-dist}\n\nDistance positioning\n:::\n\n```{svg-color-toggle-button} oriya-dist\n```\n\n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Oriya, this includes <samp>\"Reph\"</samp> in addition to the\nabove-base dependent vowels (matras), diacritical marks and Vedic signs. \n\n:::{figure-md}\n![Above-base mark position](/images/oriya/oriya-abvm.svg \"Above-base mark position\"){.shaping-demo .inline-svg .greyscale-svg #oriya-abvm}\n\nAbove-base mark position\n:::\n\n```{svg-color-toggle-button} oriya-abvm\n```\n\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. 
In Oriya, this includes below-base dependent vowels\n(matras) as well as the below-base consonant form <samp>\"Raphala\"</samp>.\n\n:::{figure-md}\n![Below-base mark position](/images/oriya/oriya-blwm.svg \"Below-base mark position\"){.shaping-demo .inline-svg .greyscale-svg #oriya-blwm}\n\nBelow-base mark position\n:::\n\n```{svg-color-toggle-button} oriya-blwm\n```\n\n\n## The `<orya>` shaping model ##\n\nThe older Oriya script tag, `<orya>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<orya>` and some users may still have documents that were written to\ntake advantage of `<orya>` shaping.\n\n### Distinctions from `<ory2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs used to trigger shaping\nfeatures was altered when migrating from `<orya>` to\n`<ory2>`. \n\nSpecifically, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<orya>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<ory2>` text, as described above in this document, there is no\nsuch reordering. The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nThe old Indic shaping model also did not recognize the\n`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, `<orya>`\nwas treated as if it followed the `BLWF_MODE_POST_ONLY`\ncharacteristic. 
In other words, below-base form substitutions were\nonly applied to consonants after the base consonant or syllable base.\n\nIn addition, for some scripts, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<orya>` text, the left-side matra was always positioned\nat the beginning of the syllable.\n\n\n### Advice for handling fonts with `<orya>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<orya>` shaping model.\n\n### Advice for handling text runs composed in `<orya>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<orya>` script tag and it is known that the font in use supports\nonly the `<ory2>` shaping model.\n\nShaping engines may also choose to apply `blwf` substitutions to\nbelow-base consonants occurring before the base consonant or syllable base when it is\nknown that the font in use supports an applicable substitution lookup.\n\nShaping engines may also choose to position left-side matras according\nto the `<orya>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n"
  },
  {
    "path": "opentype-shaping-sinhala.md",
    "content": "```{include} /_global.md\n```\n\n# Sinhala shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Sinhala script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Sinhala character tables](#sinhala-character-tables)\n  - [The `<sinh>` shaping model](#the-sinh-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n\n## General information ##\n\nThe Sinhala script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the South Indic subgroup.\n\nThe Sinhala script is used to write multiple languages, most commonly\nSinhalese and Pali. In addition, Sanskrit may be written\nin Sinhala, so Sinhala script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nUnlike many other Indic scripts, there is only one extant Sinhala\nscript tag defined in OpenType, `<sinh>`.\n\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  
The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Halant** and **Virama** are both standard terms for the below-base \"vowel-killer\"\nsign. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. In the\nSinhalese language, this sign is known as the _al-lakuna_ or _hal kirīma_.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. \n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. The **Reph** form of the consonant \"Ra\" is an\nexample. In the Sinhalese language, the Reph form is known as _repaya_.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Sinhala text depends on the shaping engine correctly\nclassifying each glyph in the run. 
As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark.\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nSinhala glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. 
For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are also categorized as `Mark [Mn]`).\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). 
Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Sinhala character tables ###\n\nSeparate character tables are provided for the Sinhala, Sinhala\nArchaic Numbers, and Vedic Extensions blocks as well as for other\nmiscellaneous characters that are used in `<sinh>` text runs:\n\n  - [Sinhala character table](character-tables/character-tables-sinhala.md#sinhala-character-table)\n  - [Sinhala Archaic Numbers character table](character-tables/character-tables-sinhala.md#sinhala-archaic-numbers-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-sinhala.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-sinhala.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0D82`   | Mark [Mn]        | BINDU             | RIGHT_POSITION             | &#x0D82; Anusvara            |\n| | | | |\n|`U+0D9A`   | Letter           | CONSONANT         | _null_                     | &#x0D9A; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are designated as _unassigned_ in\nthe _Unicode category_ column.\n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. 
Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Sinhala text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nIn other Indic scripts, the zero-width joiner (<abbr>ZWJ</abbr>) is used to prevent\nthe formation of conjuncts and to suppress the formation of <samp>\"Reph\"</samp>.\n\nSinhala, however, differs considerably in its use of <samp>\"ZWJ\"</samp>.\n\n  - In `<sinh>` text, <samp>\"Reph\"</samp> is only formed by the use of an explicit\n    <samp>\"Ra,Halant,ZWJ\"</samp> sequence.\n  - In `<sinh>` text, the sequence\n    <samp>\"Consonant_1,Halant,ZWJ,Consonant_2\"</samp> is used to specify the\n    subjoined form of <samp>\"Consonant_2\"</samp>.\n \n:::{figure-md}\n![Reph formation](/images/sinhala/sinhala-rphf.svg \"Reph formation\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-rphf}\n\nReph formation\n:::\n\n```{svg-color-toggle-button} sinhala-rphf\n```\n\n\nThe zero-width non-joiner (<abbr>ZWNJ</abbr>) is not used in shaping runs of\nSinhala text. The <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> is referenced below in various regular\nexpressions and shaping rules, however, because it is used by other\nIndic scripts.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\n\n\n\n## The `<sinh>` shaping model ##\n\nProcessing a run of `<sinh>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. \n\nWith regard to these common variations, Sinhala's specific shaping\ncharacteristics include: \n\n  - `BASE_POS_LAST_SINHALA` = The base consonant of a syllable is the last\n     consonant, not counting any special final-consonant\n     forms. 
However, the algorithm used for locating the base\n     consonant in `<sinh>` text differs from that used by other\n     `BASE_POS_LAST` scripts.\n\n  - `REPH_POS_AFTER_POST` = <samp>\"Reph\"</samp> is ordered after the last post-base\n     consonant form.\n\n  - `REPH_MODE_EXPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence.\n\n  - `BLWF_MODE_PRE_AND_POST` = The below-base forms feature is applied both to\n     pre-base consonants and to post-base consonants.\n\n  - `MATRA_POS_TOP` = `POS_AFTER_SUBJOINED` = Above-base matras are\n     ordered after subjoined (i.e., below-base) consonant forms. \n\n  - `MATRA_POS_RIGHT` = `POS_AFTER_SUBJOINED` = Right-side matras are\n     ordered after subjoined (i.e., below-base) consonant forms. \n\n  - `MATRA_POS_BOTTOM` = `POS_AFTER_SUBJOINED` = Below-base matras are\n     ordered after all subjoined (i.e., below-base) consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Sinhala consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Sinhala Unicode block enumerates two modifier signs,\n> \"Anusvara\" (`U+0D82`) and \"Visarga\" (`U+0D83`). In addition,\n> Sanskrit text written in Sinhala may include additional signs from the\n> Vedic Extensions block. \n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that independent vowel\nis the syllable's only vowel sound and serves as the \"base\". 
\n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nGenerally speaking, the base consonant is the final consonant of the\nsyllable that does not take on a subjoined form, and its vowel sound\ndesignates the end of the syllable. This rule is synonymous with the\n`BASE_POS_LAST_SINHALA` characteristic mentioned earlier. \n\nValid consonant-based syllables may include one or more additional \nconsonants that precede the base consonant. 
Each of these\nother, pre-base consonants will be followed by the <samp>\"Halant\"</samp> mark, which\nindicates that they carry no vowel. They affect pronunciation by\ncombining with the base consonant (e.g., \"_str_\", \"_pl_\") but they\ndo not add a vowel sound.\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment; in many circumstances it is replaced by a combining\nmark-like form. \n\n  - A <samp>\"Ra,Halant,ZWJ\"</samp> sequence at the beginning of a syllable\n    is replaced with an above-base mark called <samp>\"Reph\"</samp>. \n    This rule is synonymous with the `REPH_MODE_EXPLICIT`\n    characteristic mentioned earlier.\n\nIn addition, the subjoined form of a post-base-consonant <samp>\"Ra\"</samp> can be\nexplicitly requested with a <samp>\"Halant,ZWJ,Ra\"</samp> sequence. This form is called\n<samp>\"Rakaaraansaya\"</samp>.\n\n<samp>\"Reph\"</samp> characters must be reordered after the syllable-identification\nstage is complete. <samp>\"Rakaaraansaya\"</samp> is not reordered.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Sinhala script, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Sinhala syllables. 
\n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. \n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER` \n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER`| `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaaraansaya\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. 
Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the basis of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? 
(CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? 
(HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Ra,Halant,ZWJ\"</samp> glyph sequences, and other consonants that take special\n> treatment in some circumstances. 
<samp>\"Ya\"</samp> may take on special forms,\n> depending on its position in the syllable. \n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nSinhala. \n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. 
`POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. 
That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\nDue to the different usage of <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> characters in `<sinh>` text runs, a\ndifferent algorithm is required for the shaper to identify the base\nconsonant of a syllable. The algorithm for determining the base\nconsonant in Sinhala is:\n\n  - If the syllable starts with <samp>\"Ra,Halant,ZWJ\"</samp>, exclude the starting\n    <samp>\"Ra\"</samp> from the list of consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is immediately preceded by a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>, move to the\n        previous consonant. If the consonant is not immediately\n        preceded by a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>, stop.\n      * If the consonant is the first consonant, stop.\n  - The consonant stopped at will be the base consonant.\n\n\n> Note: Unlike with many other Indic scripts, it is not necessary for\n> the shaping engine to independently determine if any consonant has a\n> post-base or below-base form in the active font. The use of a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>\n> character before a consonant in the search explicitly designates\n> such a special form.\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any multi-part dependent vowels (matras) must be decomposed\ninto their individual components. \n\nSinhala has four multi-part dependent vowels, \"Ee\" (`U+0DDA`), \"O\"\n(`U+0DDC`), \"Oo\" (`U+0DDD`), and \"Au\" (`U+0DDE`). Each\nhas a canonical decomposition, so this step is unambiguous. 
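
As a rough illustration of this step, the following Python sketch uses the standard library's `unicodedata` module to apply full canonical decomposition (NFD) to the four multi-part matras. The names `MULTI_PART_MATRAS` and `decompose_matra` are hypothetical and are not taken from any shaping engine. A real shaper would decompose entries in its glyph buffer rather than standalone strings.

```python
import unicodedata

# The four multi-part Sinhala matras named in this step.
MULTI_PART_MATRAS = (0x0DDA, 0x0DDC, 0x0DDD, 0x0DDE)


def decompose_matra(codepoint):
    '''Return the fully decomposed components of a codepoint as a list of ints.'''
    # NFD applies canonical decomposition recursively, so a codepoint whose
    # decomposition itself contains a decomposable codepoint is expanded
    # all the way down to its individual components.
    return [ord(ch) for ch in unicodedata.normalize('NFD', chr(codepoint))]


for cp in MULTI_PART_MATRAS:
    parts = ','.join('U+%04X' % p for p in decompose_matra(cp))
    print('U+%04X decomposes to %s' % (cp, parts))
```

Relying on a recursive normalization form rather than a single-level lookup is one way to avoid stopping at an intermediate codepoint that still needs further decomposition.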
\n\n> \"Ee\" (`U+0DDA`) decomposes to \"`U+0DD9`,`U+0DCA`\"\n>\n> \"O\" (`U+0DDC`)  decomposes to \"`U+0DD9`,`U+0DCF`\"\n>\n> \"Oo\" (`U+0DDD`) decomposes to \"`U+0DD9`,`U+0DCF`, `U+0DCA`\"\n>\n> \"Au\" (`U+0DDE`) decomposes to \"`U+0DD9`,`U+0DDF`\"\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n> Note: The decomposition of \"Oo\" (`U+0DDD`) is atypical; Unicode\n> specifies that the codepoint decomposes to \"O\" (`U+0DDC`) followed\n> by `U+0DCA`; the \"O\" codepoint is then decomposed to\n> \"`U+0DD9`,`U+0DCF`\". Shaping engines must take care not to miss this\n> second decomposition.\n\n> Note: For Sinhala, the `pstf` substitution feature of <abbr title=\"Glyph Substitution table\">GSUB</abbr> is\n> defined as replacing the entire multi-part matra with its right-side\n> component. \n>\n> The Microsoft Uniscribe shaping engine historically\n> supported this behavior -- in a sense, decomposing each matra into\n> its left-side component followed by a duplicate of the original\n> matra, then substituting the duplicated matra with the right-side\n> matra component in [stage 3, step 10](#stage-3-step-10-pstf), when the `pstf`\n> feature is applied. \n>\n> Fonts that were engineered to support this behavior might not\n> include <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning rules for the right-side matra components,\n> relying instead on the `pstf` substitution to provide a suitable\n> replacement. 
Shaping engines should do their best to deal gracefully\n> with fonts that were developed only with this behavior in mind.\n\n:::{figure-md}\n![Multi-part matra decomposition](/images/sinhala/sinhala-matra-decompose.svg \"Multi-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-matra-decompose}\n\nMulti-part matra decomposition\n:::\n\n```{svg-color-toggle-button} sinhala-matra-decompose\n```\n\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAbove-base, right-side, and below-base dependent-vowel (matra) signs\nmust be tagged with `POS_AFTER_SUBJOINED`.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. 
No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n> Note: Nukta usage in Sinhala is rare.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the base consonant or syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`.\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: an initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence will always become a <samp>\"Reph\"</samp>.\n\n#### Stage 2, step 7: Post-base consonants ####\n\nSeventh, any non-base consonants that occur after a dependent vowel\n(matra) sign must be tagged with `POS_POSTBASE_CONSONANT`. \n\nIn Sinhala, the only consonants that can appear in this position are\n<samp>\"Ra\"</samp> and <samp>\"Ya\"</samp>. A <samp>\"Halant,ZWJ,Ya\"</samp> sequence after the base consonant or syllable base will take on\nthe <samp>\"Yansaya\"</samp> form when the `vatu` feature is applied. 
A\n<samp>\"Halant,ZWJ,Ra\"</samp> sequence after the base consonant or syllable base will take on \nthe <samp>\"Rakaaraansaya\"</samp> form when the `vatu` feature is applied.\n\n:::{figure-md}\n![Yansaya ligation](/images/sinhala/sinhala-vatu-va.svg \"Yansaya ligation\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-vatu-va}\n\nYansaya ligation\n:::\n\n```{svg-color-toggle-button} sinhala-vatu-va\n```\n\n:::{figure-md}\n![Rakaaraansaya ligation](/images/sinhala/sinhala-vatu-ra.svg \"Rakaaraansaya ligation\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-vatu-ra}\n\nRakaaraansaya ligation\n:::\n\n```{svg-color-toggle-button} sinhala-vatu-ra\n```\n\n\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. 
For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!--matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. --->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. 
\n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. Therefore it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `akhn` should match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> and\n           <samp>\"_Consonant_,ZWJ,Halant,_Consonant_\"</samp> sequences\n  - `rphf` should match initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `pstf` should match <samp>\"_Matra_\"</samp> in post-base position\n  - `vatu` should match <samp>\"Halant,ZWJ,Ra\"</samp> and <samp>\"Halant,ZWJ,Ya\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt (not used in Sinhala)\n\takhn\n\trphf \n\trkrf (not used in Sinhala)\n\tpref (not used in Sinhala)\n\tblwf (not used in Sinhala)\n\tabvf (not used in Sinhala)\n\thalf (not used in Sinhala)\n\tpstf\n\tvatu\n\tcjct (not used in Sinhala)\n\tcfar (not used in Sinhala)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\n> This feature is not used in Sinhala.\n\n\n#### Stage 3, step 3: akhn ####\n\nIn Sinhala, the `akhn` feature provides two substitution types.\n\n  - <samp>\"Consonant,Halant,ZWJ,Consonant\"</samp> sequences are used to specify a ligature. \n  - <samp>\"Consonant,ZWJ,Halant,Consonant\"</samp> sequences are used to specify\n    \"touching consonant\" substitutions used in Pali and Sanskrit. 
\n  \n\n:::{figure-md}\n![Ligature substitution](/images/sinhala/sinhala-akhn-ligature.svg \"Ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-akhn-ligature}\n\nLigature substitution\n:::\n\n```{svg-color-toggle-button} sinhala-akhn-ligature\n```\n\n:::{figure-md}\n![Touching consonant substitution](/images/sinhala/sinhala-akhn-touching.svg \"Touching consonant substitution\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-akhn-touching}\n\nTouching consonant substitution\n:::\n\n```{svg-color-toggle-button} sinhala-akhn-touching\n```\n\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\t\n\n:::{figure-md}\n![Reph composition](/images/sinhala/sinhala-rphf-1.svg \"Reph composition\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-rphf-1}\n\nReph composition\n:::\n\n```{svg-color-toggle-button} sinhala-rphf-1\n```\n\t\n#### Stage 3, step 5: rkrf ####\n\n> This feature is not used in Sinhala.\n\n\n#### Stage 3, step 6: pref ####\n\n> This feature is not used in Sinhala.\n\n\n#### Stage 3, step 7: blwf ####\n\n> This feature is not used in Sinhala.\n\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Sinhala.\n\n\n#### Stage 3, step 9: half ####\n\n> This feature is not used in Sinhala.\n\n\n#### Stage 3, step 10: pstf ####\n\nIn Sinhala, the `pstf` feature replaces multi-part dependent vowels\n(matras) with the right-side matra component of the canonical\ndecomposition.\n\n> Note: This substitution is possible because all multi-part dependent\n> vowels in Sinhala use the same left-side matra component, `U+0DD9`.\n>\n> The Microsoft Uniscribe shaping engine historically\n> supported this behavior by handling the decomposition of multi-part\n> dependent vowels in [stage 2, step 2](#stage-2-step-2-matra-decomposition)\n> differently for Sinhala -- in a sense, decomposing each matra into\n> its left-side component 
followed by a duplicate of the original\n> matra, then substituting the duplicated matra with the right-side\n> matra component when the `pstf` feature is applied. \n> \n> Shaping engines may, optionally, decompose multi-part dependent\n> vowels in [stage 2, step 2](#stage-2-step-2-matra-decomposition) into their\n> canonical Unicode decompositions, as is done in other scripts, and\n> substitute the decomposed right-side matra components at that point.\n> \n> Doing so will negate the need to apply the `pstf` substitution.\n> However, fonts that were engineered to support the\n> Uniscribe-supported behavior might not include <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning\n> rules for the right-side matra components, relying instead on the\n> `pstf` substitution to provide a suitable replacement. Shaping\n> engines should do their best to deal gracefully with fonts that were\n> developed only with this behavior in mind.\n\n:::{figure-md}\n![Post-base form substitution](/images/sinhala/sinhala-pstf.svg \"Post-base form substitution\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-pstf}\n\nPost-base form substitution\n:::\n\n```{svg-color-toggle-button} sinhala-pstf\n```\n\n\n#### Stage 3, step 11: vatu ####\n\nIn Sinhala, the `vatu` feature replaces certain sequences with\nligatures using the subjoined forms of <samp>\"Ra\"</samp> or <samp>\"Ya\"</samp>.\n\n  - The sequence <samp>\"Consonant,Halant,ZWJ,Ra\"</samp> triggers the\n    <samp>\"Rakaaraansaya\"</samp> form of the consonant.\n  - The sequence <samp>\"Consonant,Halant,ZWJ,Ya\"</samp> triggers the <samp>\"Yansaya\"</samp> form\n    of the consonant.\n  \n\n:::{figure-md}\n![Rakaaraansaya ligation](/images/sinhala/sinhala-vatu-ra-1.svg \"Rakaaraansaya ligation\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-vatu-ra-1}\n\nRakaaraansaya ligation\n:::\n\n```{svg-color-toggle-button} sinhala-vatu-ra-1\n```\n\n:::{figure-md}\n![Yansaya ligation](/images/sinhala/sinhala-vatu-va-1.svg 
\"Yansaya ligation\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-vatu-va-1}\n\nYansaya ligation\n:::\n\n```{svg-color-toggle-button} sinhala-vatu-va-1\n```\n\n\n#### Stage 3, step 12: cjct ####\n\n> This feature is not used in Sinhala.\n\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Sinhala.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. 
The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\n:::{figure-md}\n![Pre-base matra positioning](/images/sinhala/sinhala-matra-position.svg \"Pre-base matra positioning\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-matra-position}\n\nPre-base matra positioning\n:::\n\n```{svg-color-toggle-button} sinhala-matra-position\n```\n\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. 
If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Sinhala incorporates the `REPH_POS_AFTER_POST`\nshaping characteristic, this final position is defined to be\nimmediately after any post-base consonant forms.\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is\n\n  - Move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. 
\n\t> Note: Because Sinhala incorporates the\n    > `REPH_POS_AFTER_POST` shaping characteristic, this means\n    > any positioning tag of `POS_FINAL_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_FINAL_CONSONANT`.\n  - If no other location has been located in the previous step, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\nFinally, if the final position of <samp>\"Reph\"</samp> or <samp>\"Repha\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp>/<samp>\"Repha\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n\n:::{figure-md}\n![Reph positioning](/images/sinhala/sinhala-reph-position.svg \"Reph positioning\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-reph-position}\n\nReph positioning\n:::\n\n```{svg-color-toggle-button} sinhala-reph-position\n```\n\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n\nSinhala does not use pre-base-reordering consonants, so this step will\ninvolve no work when processing `<sinh>` text. It is included here in order\nto maintain compatibility with the other Indic scripts.\n  \n  \n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nSinhala does not use the `init` feature, so this step will\ninvolve no work when processing `<sinh>` text. 
It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n   \n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. In preparation for this stage, glyph sequences should be\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit (not used in Sinhala)\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln (not used in Sinhala)\n\nThe `init` feature is not used in Sinhala.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include ligatures, \"touching consonant\" forms,\nand stylistic variants of left-side dependent vowels (matras). \n\n:::{figure-md}\n![Pre-base substitutions](/images/sinhala/sinhala-pres.svg \"Pre-base substitutions\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-pres}\n\nPre-base substitutions\n:::\n\n```{svg-color-toggle-button} sinhala-pres\n```\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Above-base substitutions](/images/sinhala/sinhala-abvs.svg \"Above-base substitutions\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-abvs}\n\nAbove-base substitutions\n:::\n\n```{svg-color-toggle-button} sinhala-abvs\n```\n\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. 
This usually includes replacing base consonants or\nsyllable bases\nand attached below-base marks with contextual ligatures.\n\n:::{figure-md}\n![Below-base substitutions](/images/sinhala/sinhala-blws.svg \"Below-base substitutions\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-blws}\n\nBelow-base substitutions\n:::\n\n```{svg-color-toggle-button} sinhala-blws\n```\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\nbase-consonant/matra pairs with contextual ligatures. \n\n:::{figure-md}\n![Post-base substitutions](/images/sinhala/sinhala-psts.svg \"Post-base substitutions\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-psts}\n\nPost-base substitutions\n:::\n\n```{svg-color-toggle-button} sinhala-psts\n```\n\n\nThe `haln` feature is not used in Sinhala.\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. However, `calt` is not mandatory for correct Sinhala shaping\n> and may be disabled in the application by user preference.\n\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Sinhala text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. 
Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\n:::{figure-md}\n![Distance positioning](/images/sinhala/sinhala-dist.svg \"Distance positioning\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-dist}\n\nDistance positioning\n:::\n\n```{svg-color-toggle-button} sinhala-dist\n```\n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Sinhala, this includes <samp>\"Reph\"</samp> in addition to\nabove-base dependent vowels (matras), diacritical marks, and Vedic signs. \n\n:::{figure-md}\n![Above-base mark positioning](/images/sinhala/sinhala-abvm.svg \"Above-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-abvm}\n\nAbove-base mark positioning\n:::\n\n```{svg-color-toggle-button} sinhala-abvm\n```\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. In Sinhala, this includes below-base dependent vowels\n(matras) and diacritical marks.\n\n:::{figure-md}\n![Below-base mark positioning](/images/sinhala/sinhala-blwm.svg \"Below-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #sinhala-blwm}\n\nBelow-base mark positioning\n:::\n\n```{svg-color-toggle-button} sinhala-blwm\n```\n"
  },
  {
    "path": "opentype-shaping-syriac.md",
    "content": "```{include} /_global.md\n```\n\n# Syriac script shaping in OpenType #\n\nThis document details the general shaping procedure shared by all\nSyriac script styles, and defines the common pieces that style-specific\nimplementations share. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Joining properties](#joining-properties)\n\t  - [Mark classification](#mark-classification)\n\t  - [Character tables](#character-tables)\n  - [The `<syrc>` shaping model](#the-syrc-shaping-model)\n      - [Stage 1: Transient reordering of modifier combining marks](#stage-1-transient-reordering-of-modifier-combining-marks)\n      - [Stage 2: Compound character composition and decomposition](#stage-2-compound-character-composition-and-decomposition)\n      - [Stage 3: Computing letter joining states](#stage-3-computing-letter-joining-states)\n      - [Stage 4: Applying the `stch` feature](#stage-4-applying-the-stch-feature)\n      - [Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr>](#stage-5-applying-the-language-form-substitution-features-from-gsub)\n      - [Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr>](#stage-6-applying-the-typographic-form-substitution-features-from-gsub)\n      - [Stage 7: Applying the positioning features from <abbr>GPOS</abbr>](#stage-7-applying-the-positioning-features-from-gpos)\n  \n\n\n## General information ##\n\nThe Syriac script is used to write multiple languages, most commonly\nClassical Syriac and multiple dialects of Aramaic. 
In addition,\nhistorical texts use Syriac to write Arabic, Malayalam, Turkish,\nKurdish, and Armenian.\n\nThe Syriac script encompasses multiple distinct styles, including\nʾEsṭrangēlā (classical), Maḏnḥāyā (Eastern), and Serṭā (Western), that\nshare a number of common features and rules, but that differ\nconsiderably in their final appearance. Due to the common features\nfound between the styles, a shaping engine can support all styles of\nSyriac with a single shaping model.\n\nIn OpenType, Syriac shaping shares most of the same features that are\ndefined for [Arabic](opentype-shaping-arabic.md) and related scripts, but with a few\nSyriac-specific additions. Therefore, shaping engines are advised to\nsupport Syriac and Arabic using the [same shaping model](opentype-shaping-arabic-general.md).\n\nSyriac is a joining script that uses inter-word spaces, so each\ncodepoint in a text run may be substituted with one of several\ncontextual forms corresponding to what, if any, characters appear\nbefore and after the codepoint. Most, but not all, letter sequences\njoin; shaping engines must track which positions trigger joining\nbehavior for each letter. \n\n:::{figure-md}\n![Isolated, initial, medial, and final contextual forms of a letter](/images/syriac/syriac-joining.svg \"Isolated, initial, medial, and final contextual forms of a letter\"){.shaping-demo .inline-svg .greyscale-svg #syriac-joining}\n\nIsolated, initial, medial, and final contextual forms of a letter\n:::\n\n```{svg-color-toggle-button} syriac-joining\n```\n\nSyriac is written (and, therefore, rendered) from right to\nleft. Shaping engines must track the directionality of the text run\nwhen scripts of different direction are mixed.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for elements of the\nSyriac script. 
The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Base** glyph or character is the standard term for a Syriac\ncharacter that is capable of taking a diacritical mark. \n\nAll of the base characters in Syriac are consonants by definition, but\nseveral of these consonants are also used to represent vowels as base\ncharacters in certain circumstances.\n\nVowels that are not base characters are frequently omitted from the\ntext run entirely. Alternatively, such a vowel may appear as a\ndiacritical mark in the Maḏnḥāyā and Serṭā script styles. The standard\nterm for these marks is vowel **points**.\n\n**Kashida** (or **tatweel**) is the term for a glyph inserted into a\nsequence for the purpose of elongating the baseline stroke of a\nletter. Unicode documents use the term \"tatweel\" most frequently,\nwhile OpenType documents use the term \"kashida\" most\nfrequently. Kashidas are typically inserted in order to justify lines\nof text. \n\n**Majlīyānā** is the name for the diacritical mark that is attached to\na native Syriac letter in order to change it to a foreign loan letter.\n\n**Syāmē** is the name for the diacritical mark that is used to\nindicate the pluralization of a word.\n\nThe **Syriac Abbreviation Mark** is a Unicode control character used\nto trigger the addition of an overline glyph that may span the length\nof multiple letters. 
The Syriac Abbreviation Mark is often used to\ndenote the elision of letters from a word; it can also be used to\ndenote that a sequence of letters represents a number rather than a\nword.\n\n\n## Glyph classification ##\n\nBecause Syriac is a joining (or cursive) script, proper shaping of\ntext runs involves identifying the joining behavior of each character,\nthen combining that information with any preceding or subsequent\ncharacters to determine the contextually correct form for display.\n\n### Joining properties ###\n\nSyriac characters are assigned a `JOINING_TYPE` property in the\nUnicode standard that indicates how they join to adjacent\ncharacters. There are six possible values: \n\n  - `JOINING_TYPE_LEFT` indicates that a character joins with\n    the subsequent character, but does not join with the preceding\n    character. \n\t\n  - `JOINING_TYPE_RIGHT` indicates that a character joins with the\n    preceding character, but does not join with the subsequent character.\t\n\n  - `JOINING_TYPE_DUAL` indicates that a character joins with the\n    preceding character and joins with the subsequent character.\n\t\n  - `JOINING_TYPE_NON_JOINING` indicates that a character does not\n    join with the preceding or with the subsequent character.\n\t\n  - `JOINING_TYPE_TRANSPARENT` indicates that the character does not\n    join with adjacent characters _and_ that the character must be\n    skipped over when the shaping engine is evaluating the joining\n    positions in a sequence of characters. When a\n    `JOINING_TYPE_TRANSPARENT` character is encountered in a sequence,\n    the `JOINING_TYPE` of the preceding character passes\n    through. Diacritical marks are frequently assigned this value. \n\t\n  - `JOINING_TYPE_JOIN_CAUSING` indicates that the character forces\n    the use of joining forms with the preceding and subsequent\n    characters. 
Kashidas and the Zero Width Joiner (`U+200D`) are both\n    `JOIN_CAUSING` characters.\n  \n\nSyriac letters are also assigned to a `JOINING_GROUP` that indicates\nwhich fundamental character they behave like with regard to joining\nbehavior. Each of the basic letters in the Syriac block tends to\nbelong to its own `JOINING_GROUP`, while extended letters are often\nassigned to the `JOINING_GROUP` that corresponds to the character's\nbase letter. \n\nFor example, the letter \"Persian Bheth\" is rendered as the base Syriac\n\"Beth\" with an additional stroke at the top. Therefore, it is assigned\nto the `BETH` joining group.\n\nIn addition to the standard joining types, `<syrc>` text features two\n`JOINING_GROUP`s that trigger special behavior: `ALAPH` and\n`DALATH_RISH`.\n\nThe `fin2`, `fin3`, and `med2` <abbr title=\"Glyph Substitution table\">GSUB</abbr> features implement Syriac-specific\nshaping rules that affect glyphs in the `ALAPH` joining group, based\non the preceding glyph.\n\n  - `fin2` and `fin3` substitute special terminal forms of `ALAPH`\n    glyphs, depending on whether or not the preceding character\n    belongs to the `DALATH_RISH` joining group.\n  - `med2` substitutes special medial forms of `ALAPH` glyphs,\n    depending on whether or not the preceding character is\n    left-joining (that is, belonging to the `DUAL`, `LEFT`, or\n    `JOIN_CAUSING` `JOINING_TYPE`s).\n\nThe `DALATH_RISH` joining group includes the standard letters \"Dalath\"\nand \"Rish\" as well as the \"Dotless Dalath-Rish\", an ambiguous letter\nthat is used in Old Syriac text, when neither the \"Dalath\" nor the \"Rish\"\nletter featured a dot, and may also be used in transcribing\nhistorical documents where it is impossible to distinguish whether the\nletter in the source text is \"Dalath\" or \"Rish\".\n\n:::{figure-md}\n![Dalath, Rish, Dotless Dalath-Rish](/images/syriac/syriac-dalath-rish.svg \"Dalath, Rish, Dotless Dalath-Rish\"){.shaping-demo .inline-svg .greyscale-svg 
#syriac-dalath-rish}\n\nDalath, Rish, Dotless Dalath-Rish\n:::\n\n```{svg-color-toggle-button} syriac-dalath-rish\n```\n\n\nShaping engines may choose to define pseudo-`JOINING_TYPE`s\ncorresponding to the `ALAPH` and `DALATH_RISH` joining groups, or may\ntrack the appropriate `JOINING_GROUP` properties by any other means\npreferred.\n\n\n\n### Mark classification ###\n\nThe Unicode standard defines a _canonical combining class_ for each\ncodepoint that is used whenever a sequence needs to be sorted into\ncanonical order. \n\nSeveral of the Syriac marks belong to standard combining\nclasses:\n\n:::{table} Mark-classification table\n\n| Codepoint | Combining class | Glyph                              |\n|:----------|:----------------|:-----------------------------------|\n|`U+0711`   | 36              | &#x0711; Superscript Alaph         |\n|           | 220             | Other below-base combining marks   |\n|           | 230             | Other above-base combining marks   |\n:::\n\n\nThe numeric values of these combining classes are used during Unicode\nnormalization.\n\n\nThese classifications are used in the [mark-transient-reordering\nstage](#stage-1-transient-reordering-of-modifier-combining-marks).\n\n\t\t\t\n### Character tables ###\n\nSeparate character tables are provided for the Syriac and Syriac\nSupplement Unicode blocks, as well as for other miscellaneous\ncharacters that are used in `<syrc>` text runs:\n\n  - [Syriac character table](character-tables/character-tables-syriac.md#syriac-character-table)\n  - [Syriac Supplement character table](character-tables/character-tables-syriac.md#syriac-supplement-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-syriac.md#miscellaneous-character-table)\n\n\nThe tables list each codepoint along with its Unicode general\ncategory and its joining type. For letters, the table lists the\ncodepoint's joining group. 
For diacritical marks, the table lists the\ncodepoint's mark combining class. The codepoint's Unicode name and an example\nglyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph                        |\n|:----------|:-----------------|:-------------|:--------------|:-----------|:-----------------------------|\n|`U+0712`   | Letter           | DUAL         | BETH          | _null_     | &#x0712; Beth                |\n| | | | | |\n|`U+0737`   | Mark [Mn]        | TRANSPARENT  | _null_        | 220        | &#x0737; Rbasa Below         |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Syriac text include the dotted-circle placeholder (`U+25CC`), the\ncombining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)\nand zero-width non-joiner (`U+200C`), the left-to-right text marker\n(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break\nspace (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ncombining mark in isolation. Real-world text syllables may also use\nother characters, such as hyphens or dashes, in a similar placeholder\nfashion; shaping engines should cope with this situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. 
<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (which is not common in Syriac but can occur in\n    other scripts)\n\n\nIn addition, Syriac text runs may include the \"tatweel\" or kashida\n(`U+0640`) and \"shadda\" (`U+0651`) codepoints from the Arabic block,\nbecause the Syriac block does not encode a separate kashida or shadda\ncharacter. \n\nModern texts may also make use of Arabic punctuation marks, and texts\nusing Syriac to write Arabic (called \"Garshuni\") may also employ\nArabic ḥarakah (vowel) marks.\n\nThe combining grapheme joiner (<abbr>CGJ</abbr>) is primarily used to alter the\norder in which adjacent marks are positioned during the\nmark-reordering stage, in order to adhere to the needs of a\nnon-default language orthography.\n\nBy default, OpenType shaping reorders sequences of adjacent marks by\nsorting the sequence on the marks' Canonical_Combining_Class (<abbr>Ccc</abbr>)\nvalues. 
The presence of a <abbr title=\"Combining Grapheme Joiner\">CGJ</abbr> character within a sequence of marks has\nthe effect of splitting the sequence into two sequences of marks and,\ntherefore, halting any mark-reordering that would have occurred\nbetween the marks on either side of the <abbr title=\"Combining Grapheme Joiner\">CGJ</abbr>.\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to force the usage of the\ncursive connecting form of a letter even when the context of the\nadjoining letters would not trigger the connecting form. \n\nFor example, to show the initial form of a letter in isolation (such\nas for displaying it in a table of forms), the sequence <samp>\"_Letter_,ZWJ\"</samp>\nwould be used. To show the medial form of a letter in isolation, the\nsequence <samp>\"ZWJ,_Letter_,ZWJ\"</samp> would be used.\n\nThe zero-width non-joiner (<abbr>ZWNJ</abbr>) is primarily used to prevent a\ncursive connection between two adjacent characters that would, under\nnormal circumstances, form a join. \n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\n\nThe right-to-left mark (<abbr>RLM</abbr>) and left-to-right mark (<abbr>LRM</abbr>) are used by\nthe Unicode bidirectionality algorithm (BiDi) to indicate the points\nin a text run at which the writing direction changes. 
Generally\nspeaking <abbr title=\"Right-to-Left Mark\">RLM</abbr> and <abbr title=\"Left-to-Right Mark\">LRM</abbr> codepoints do not interact with shaping.\n\nThe no-break space is primarily used to display those codepoints that\nare defined as non-spacing (such as vowel points or diacritical marks) in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder.\n\n\n\n## The `<syrc>` shaping model ##\n\nProcessing a run of `<syrc>` text involves seven top-level stages:\n\n1. Transient reordering of modifier combining marks\n2. Compound character composition and decomposition\n3. Computing letter joining states\n4. Applying the `stch` feature\n5. Applying the language-form substitution features from <abbr>GSUB</abbr>\n6. Applying the typographic-form substitution features from <abbr>GSUB</abbr>\n7. Applying the positioning features from <abbr>GPOS</abbr>\n\n\n### Stage 1: Transient reordering of modifier combining marks ###\n\n<!--- http://www.unicode.org/reports/tr53/tr53-1.pdf --->\n\n> Note: The following algorithm contains steps specific to reordering\n> Arabic marks. Since Garshuni text, which uses the Syriac script to\n> write the Arabic language, employs Arabic marks, shaping engines\n> should not omit the mark-reordering logic. \n\nSequences of adjacent marks must be reordered so that they appear in\nthe appropriate visual order before the mark-to-base and mark-to-mark\npositioning features from <abbr title=\"Glyph Positioning table\">GPOS</abbr> can be correctly applied.\n\nIn particular, those marks that have strong affinity to the base\ncharacter must be placed closest to the base.\n\nThis mark-reordering operation is distinct from the standard,\ncross-script mark-reordering performed during Unicode\nnormalization. 
The standard Unicode mark-reordering algorithm is based\non comparing the _Canonical_Combining_Class_ (<abbr>Ccc</abbr>) properties of mark\ncodepoints, whereas this script-specific reordering utilizes the\n_Modifier_Combining_Mark_ (<abbr>MCM</abbr>) subclasses specified in the\ncharacter tables.\n\nThe algorithm for reordering a sequence of marks is:\n\n  - First, move any <samp>\"Shadda\"</samp> (combining class `33`) characters to the\n    beginning of the mark sequence.\n\t\n  -\tSecond, move any subsequence of combining-class-`230` characters that begins\n       with a `230_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters. The subsequence must be moved\n       as a group.\n\n  - Finally, move any subsequence of combining-class-`220` characters that begins\n       with a `220_MCM` character to the beginning of the sequence,\n       before all <samp>\"Shadda\"</samp> characters and before all class-`230`\n       characters. The subsequence must be moved as a group.\n\n> Note: Unicode describes this mark-reordering operation, the Arabic\n> Mark Transient Reordering Algorithm (<abbr>AMTRA</abbr>), in Technical Report 53,\n> which describes it in terms that are distinct from standard,\n> <abbr>Ccc</abbr>-based mark reordering.\n>\n> Specifically, <abbr title=\"Arabic Mark Transient Reordering Algorithm\">AMTRA</abbr> is designated as an operation performed during\n> text rendering only, which therefore does not impact other\n> Unicode-compliance issues such as allowable input sequences or text\n> encoding.\n>\n> However, shaping engines may choose to perform the reordering of\n> modifier combining marks in conjunction with their Unicode\n> normalization functionality for increased efficiency.\n\n### Stage 2: Compound character composition and decomposition ###\n\nThe `ccmp` feature allows a font to substitute\n\n - mark-and-base sequences with a pre-composed glyph including both\n    the mark and the base (as 
is done with a ligature substitution)\n\t\n  - individual compound glyphs with the equivalent sequence of\n    decomposed glyphs (such as decomposing a letter with Majlīyānā or\n    other marks into a separate fundamental-letter glyph followed by a\n    mark-only glyph, to permit more precise positioning)\n \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. \n\n:::{figure-md}\n![`ccmp` feature application](/images/syriac/syriac-ccmp.svg \"`ccmp` feature application\"){.shaping-demo .inline-svg .greyscale-svg #syriac-ccmp}\n\n`ccmp` feature application\n:::\n\n```{svg-color-toggle-button} syriac-ccmp\n```\n\n\n### Stage 3: Computing letter joining states ###\n\nIn order to correctly apply the initial, medial, and final form\nsubstitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr> during stage 5, the shaping engine must\ntag every letter for possible application of the appropriate feature.\n\nTo determine which feature is appropriate, the shaping engine must\nexamine each word in turn and compute each letter's joining state from\nthe letter's `JOINING_TYPE` and the `JOINING_TYPE` of the\npreceding character (if any).\n\n> Note: Although Syriac uses inter-word spaces, the `init` feature\n> does _not_ refer to word-initial letters only and the `fina` feature\n> does _not_ refer to word-final letters only.\n>\n> Rather, both of these terms are defined with respect to whether or\n> not the preceding and subsequent letters form joins with the current\n> letter. 
The letters at word boundaries will, naturally, take on\n> initial and final forms, but initial and final forms of letters also\n> occur regularly within words, when the letter in question is\n> adjacent to a letter that does not form joins.\n\nThis computation starts from the first letter of the word, temporarily\ntagging the letter for `isol` substitution. If the first\nletter is the only letter in the word, the `isol` tag will remain unchanged.\n\nFrom here, the algorithm consumes each character in the string, one at\na time, keeping track of the JOINING_TYPE of the previous character. \n\nIf the current character is JOINING_TYPE_TRANSPARENT, move on to the next\ncharacter but preserve the currently-tracked JOINING_TYPE at its previous state.\n\nIf the preceding character's JOINING_TYPE is LEFT, DUAL, or\nJOIN_CAUSING:\n  - In `<syrc>` text, if the current character is <samp>\"Alaph\"</samp>, tag the\n    current character for `med2`, then update the tag for the\n    preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n  - If the current character's JOINING_TYPE is RIGHT, DUAL, or\n    JOIN_CAUSING, tag the current character for `fina`, then update\n    the tag for the preceding character:\n\t  - `isol` becomes `init`\n\t  - `fina` becomes `medi`\n\t  - `init` remains `init`\n\t  - `medi` remains `medi`\n\nOtherwise, tag the current character for `isol`.\n\nAfter testing the final character of the word, if the text is in `<syrc>` and\nif the last character that is not JOINING_TYPE_TRANSPARENT or\nJOINING_TYPE_NON_JOINING is <samp>\"Alaph\"</samp>, perform an additional test:\n  - If the preceding character is JOINING_TYPE_LEFT, tag the current character\n    for `fina`\n  - If the preceding character's JOINING_GROUP is DALATH_RISH, tag the current\n    character for `fin3`\n  - Otherwise, tag the current character for `fin2`\n\n\nOnce the last character of the word has been 
processed, proceed to the\nnext word and repeat the algorithm, starting at the beginning of the\nnext word.\n\n> Note: Because the processing of the characters in the algorithm\n> described above is deterministic, shaping engines may choose to\n> implement the joining-state computation as a state machine, in a lookup\n> table, or by any other means desirable.\n\nAt the end of this process, all letters should be tagged for possible\nsubstitution by one of the `isol`, `init`, `medi`, `med2`, `fina`, `fin2`, or\n`fin3` features.\n\n### Stage 4: Applying the `stch` feature ###\n\nThe `stch` feature decomposes and stretches special marks that are\nmeant to extend to the full width of words to which they are\nattached. It was defined for use in `<syrc>` text runs for the <samp>\"Syriac\nAbbreviation Mark\"</samp> (`U+070F`) but it can be used with similar marks in\nother scripts.\n\nTo apply the `stch` feature, the shaping engine should first decompose the\n`U+070F` glyph into components, which results in beginning-point,\nmidpoint, and endpoint glyphs plus one (or more) extension glyphs: at\nleast one extension between the beginning and midpoint glyphs and at\nleast one extension between the midpoint and endpoint glyphs. \n\nThe shaping engine must then calculate the total length of the word to\nwhich the mark applies. That length, minus the advance widths of the\nbeginning-point, midpoint, and endpoint glyphs of the mark, must be divided by\ntwo. 
\n\nThe result, divided by the advance width of the extension glyph\nand rounded up to the next integer, tells the shaping engine how many\ncopies of the extension glyph must be placed between the midpoint and\neach end of the mark.\n\nFollowing this procedure ensures that the same number of extensions is\nused on each side of the mark so that it remains symmetrical.\n\nFinally, the decomposed mark must be reordered as follows: \n\n  - All of the glyphs in the sequence for the mark, _except_ for\n    the final glyph, are repositioned as a group so that they precede\n    the word to which the mark is attached.\n  - The final glyph in the mark sequence is repositioned to the end of\n    the word.\n\n:::{figure-md}\n![Application of Syriac Abbreviation Mark stretching feature](/images/syriac/syriac-stch.svg \"Application of Syriac Abbreviation Mark stretching feature\"){.shaping-demo .inline-svg .greyscale-svg #syriac-stch}\n\nApplication of Syriac Abbreviation Mark stretching feature\n:::\n\n```{svg-color-toggle-button} syriac-stch\n```\n\n\n### Stage 5: Applying the language-form substitution features from <abbr>GSUB</abbr> ###\n\nThe language-substitution phase applies mandatory substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for\nthis stage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Arabic shaping model:\n\n\tlocl\n\tisol\n\tfina\n\tfin2\n\tfin3\n\tmedi\n\tmed2\n\tinit\n\trlig\n\trclt (not used in Syriac)\n\tcalt\n\t\n\n#### Stage 5, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n<!--- ![Localized form substitution](/images/syriac/syriac-locl.svg) --->\n\n\n#### Stage 5, step 2: isol ####\n\nThe `isol` feature substitutes the default glyph for a codepoint with\nthe isolated form of the letter.\n\n> Note: It is common for a font to use the isolated form of a letter\n> as the default, in which case the `isol` feature would apply no\n> substitutions. 
However, this is only a convention, and the active\n> font may use other forms as the default glyphs for any or all\n> codepoints.\n\n<!--- ![Isolated form substitution](/images/syriac/syriac-isol.svg) --->\n\n\n#### Stage 5, step 3: fina ####\n\nThe `fina` feature substitutes the default glyph for a codepoint with\nthe terminal (or final) form of the letter.\n\n:::{figure-md}\n![Final form substitution](/images/syriac/syriac-fina.svg \"Final form substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-fina}\n\nFinal form substitution\n:::\n\n```{svg-color-toggle-button} syriac-fina\n```\n\n\n#### Stage 5, step 4: fin2 ####\n\nThe `fin2` feature replaces word-final Alaph glyphs that are not\npreceded by Dalath, Rish, or dotless Dalath-Rish with a special\nterminal-form Alaph glyph.\n\n:::{figure-md}\n![Final form-2 substitution](/images/syriac/syriac-fin2.svg \"Final form-2 substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-fin2}\n\nFinal form-2 substitution\n:::\n\n```{svg-color-toggle-button} syriac-fin2\n```\n\n\n#### Stage 5, step 5: fin3 ####\n\nThe `fin3` feature replaces word-final Alaph glyphs that are\npreceded by Dalath, Rish, or dotless Dalath-Rish with a special\nterminal-form Alaph glyph.\n\n:::{figure-md}\n![Final form-3 substitution](/images/syriac/syriac-fin3.svg \"Final form-3 substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-fin3}\n\nFinal form-3 substitution\n:::\n\n```{svg-color-toggle-button} syriac-fin3\n```\n\n\n#### Stage 5, step 6: medi ####\n\nThe `medi` feature substitutes the default glyph for a codepoint with\nthe medial form of the letter.\n\n:::{figure-md}\n![Medial form substitution](/images/syriac/syriac-medi.svg \"Medial form substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-medi}\n\nMedial form substitution\n:::\n\n```{svg-color-toggle-button} syriac-medi\n```\n\n\n#### Stage 5, step 7: med2 ####\n\nThe `med2` feature replaces Alaph glyphs in the middle of a\nword that are 
preceded by a base character which can form a right-side\njoin with a special medial-form Alaph glyph.\n\n:::{figure-md}\n![Medial form-2 substitution](/images/syriac/syriac-med2.svg \"Medial form-2 substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-med2}\n\nMedial form-2 substitution\n:::\n\n```{svg-color-toggle-button} syriac-med2\n```\n\n\n#### Stage 5, step 8: init ####\n\nThe `init` feature substitutes the default glyph for a codepoint with\nthe initial form of the letter.\n\n:::{figure-md}\n![Initial form substitution](/images/syriac/syriac-init.svg \"Initial form substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-init}\n\nInitial form substitution\n:::\n\n```{svg-color-toggle-button} syriac-init\n```\n\n\n#### Stage 5, step 9: rlig ####\n\nThe `rlig` feature substitutes glyph sequences with mandatory\nligatures. Substitutions made by `rlig` cannot be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Required ligature substitution](/images/syriac/syriac-rlig.svg \"Required ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-rlig}\n\nRequired ligature substitution\n:::\n\n```{svg-color-toggle-button} syriac-rlig\n```\n\n\n#### Stage 5, step 10: rclt ####\n\nThis feature is not used in `<syrc>` text.\n\n\n#### Stage 5, step 11: calt ####\n\nThe `calt` feature substitutes glyphs with contextual alternate\nforms. 
In general, this involves replacing the default form of a\nconnecting glyph with an alternate that provides a preferable\nconnection to an adjacent glyph.\n\nThe substitutions made by `calt`\ncan be disabled by application-level user interfaces.\n\n:::{figure-md}\n![Contextual alternate substitution](/images/syriac/syriac-calt.svg \"Contextual alternate substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-calt}\n\nContextual alternate substitution\n:::\n\n```{svg-color-toggle-button} syriac-calt\n```\n\n\n### Stage 6: Applying the typographic-form substitution features from <abbr>GSUB</abbr> ###\n\nThe typographic-substitution phase applies optional substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table.\n\nThe order in which these substitutions must be performed is fixed for\nall scripts implemented in the Arabic shaping model:\n\n    liga\n\tdlig\n\tcswh (not used in Syriac)\n\tmset (not used in Syriac)\n\t\n\n#### Stage 6, step 1: liga ####\n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Standard ligature substitution](/images/syriac/syriac-liga.svg \"Standard ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-liga}\n\nStandard ligature substitution\n:::\n\n```{svg-color-toggle-button} syriac-liga\n```\n\n\n#### Stage 6, step 2: dlig ####\n\nThe `dlig` feature substitutes additional optional ligatures that are\noff by default. 
Substitutions made by `dlig` may be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Discretionary ligature substitution](/images/syriac/syriac-dlig.svg \"Discretionary ligature substitution\"){.shaping-demo .inline-svg .greyscale-svg #syriac-dlig}\n\nDiscretionary ligature substitution\n:::\n\n```{svg-color-toggle-button} syriac-dlig\n```\n\n\n#### Stage 6, step 3: cswh ####\n\nThis feature is not used in `<syrc>` text.\n\n\n#### Stage 6, step 4: mset ####\n\nThis feature is not used in `<syrc>` text.\n\n\n### Stage 7: Applying the positioning features from <abbr>GPOS</abbr> ###\n\nThe positioning stage adjusts the positions of mark and base\nglyphs.\n\nThe order in which these features are applied is fixed for\nall scripts implemented in the Arabic shaping model:\n\n    curs (not used in Syriac)\n\tkern\n\tmark\n\tmkmk\n\n#### Stage 7, step 1: curs ####\n\nThis feature is not used in `<syrc>` text.\n\n\n#### Stage 7, step 2: kern ####\n\nThe `kern` feature adjusts glyph spacing between pairs of adjacent glyphs.\n\n:::{figure-md}\n![Kerning positioning](/images/syriac/syriac-kern.svg \"Kerning positioning\"){.shaping-demo .inline-svg .greyscale-svg #syriac-kern}\n\nKerning positioning\n:::\n\n```{svg-color-toggle-button} syriac-kern\n```\n\n\n\n#### Stage 7, step 3: mark ####\n\nThe `mark` feature positions marks with respect to base glyphs.\n\n:::{figure-md}\n![Mark positioning](/images/syriac/syriac-mark.svg \"Mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #syriac-mark}\n\nMark positioning\n:::\n\n```{svg-color-toggle-button} syriac-mark\n```\n\n\n\n#### Stage 7, step 4: mkmk ####\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n\n:::{figure-md}\n![Mark-to-mark positioning](/images/syriac/syriac-mkmk.svg \"Mark-to-mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #syriac-mkmk}\n\nMark-to-mark 
positioning\n:::\n\n```{svg-color-toggle-button} syriac-mkmk\n```\n\n\n"
  },
  {
    "path": "opentype-shaping-tamil.md",
    "content": "```{include} /_global.md\n```\n\n# Tamil shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Tamil script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Tamil character tables](#tamil-character-tables)\n  - [The `<tml2>` shaping model](#the-tml2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<taml>` shaping model](#the-taml-shaping-model)\n      - [Distinctions from `<tml2>`](#distinctions-from-tml2)\n      - [Advice for handling fonts with `<taml>` features only](#advice-for-handling-fonts-with-taml-features-only)\n      - [Advice for handling text runs composed in `<taml>` format](#advice-for-handling-text-runs-composed-in-taml-format)\n\n\n## General information ##\n\nThe Tamil script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the South Indic subgroup.\n\nThe Tamil script is used to write multiple languages, most commonly\nTamil, Irula, and Saurashtra. 
In addition, Sanskrit may be written\nin Tamil, so Tamil script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nThere are two extant Tamil script tags defined in OpenType, `<taml>`\nand `<tml2>`. The older script tag, `<taml>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<tml2>`\nshaping model. However, if a font is encountered that supports only\n`<taml>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. \n\n**Halant** and **Virama** are both standard terms for the above-base \"vowel-killer\"\nsign. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. In the Tamil\nlanguage, this sign is known as _pulli_.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. Tamil does\nnot include a \"chandrabindu\" character, but the term is still found in\nmultiple places in OpenType shaping documents.\n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. 
The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Tamil text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark.\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nTamil glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. 
The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are categorized as `Mark [Mn]`).\n\nTamil includes one special class of letter, `MODIFYING_LETTER`, which\nis used only for \"Visarga\" (`U+0B83`). This denotes the character's\nusage in the Tamil language, which treats \"Visarga\" differently than\nother Indic scripts. In older Tamil texts, \"Visarga\" may indicate the\npresence of a silent letter; in recent Tamil texts, \"Visarga\" is used\nto modify the following letter in order to denote a foreign phoneme,\nsuch as \"f\". In shaping, \"Visarga\" should match tests for letters, but\nit is neither a consonant nor a vowel.\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. 
The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. 
\n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. \n\n### Tamil character tables ###\n\nSeparate character tables are provided for the Tamil, Tamil\nSupplement, Grantha marks, and Vedic Extensions blocks, as well as for\nother miscellaneous characters that are used in `<tml2>` text runs:\n\n  - [Tamil character table](character-tables/character-tables-tamil.md#tamil-character-table)\n  - [Tamil Supplement character table](character-tables/character-tables-tamil.md#tamil-supplement-character-table)\n  - [Grantha marks character table](character-tables/character-tables-tamil.md#grantha-marks-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-tamil.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-tamil.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. 
The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0B82`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0B82; Anusvara            |\n| | | | |\n|`U+0B95`   | Letter           | CONSONANT         | _null_                     | &#x0B95; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are designated as _unassigned_ in\nthe _Unicode category_ column.\n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\nIn addition to the marks in the Tamil Unicode block, Tamil text can\nalso include several diacritical marks from the Grantha Unicode block,\nsuch as Grantha Candrabindu (`U+11301`), Grantha Visarga (`U+11303`),\nand Grantha Nukta (`U+1133C`).\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Tamil text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>. \n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>.\n\n  - An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\n    even where an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width\n    joiner would otherwise produce a <samp>\"Reph\"</samp>.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. 
As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. 
These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\nTamil text sometimes uses the Latin numerals 2, 3, and 4 in\nsuperscript or subscript positions to annotate Sanskrit. When used in\nthis fashion, the superscripts and subscripts are treated as\n`SYLLABLE_MODIFIER` signs for shaping purposes.\n\n\n\n## The `<tml2>` shaping model ##\n\nProcessing a run of `<tml2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. 
\n\nWith regard to these common variations, Tamil's specific shaping\ncharacteristics include: \n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any special final-consonant forms.\n\n  - `REPH_POS_AFTER_POST` = <samp>\"Reph\"</samp> is ordered after all post-base consonant forms.\n\n  - `REPH_MODE_IMPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant\"</samp> sequence.\n\n  - `BLWF_MODE_PRE_AND_POST` = The below-forms feature is applied both to\n     pre-base consonants and to post-base consonants.\n\n  - `MATRA_POS_TOP` = `POS_AFTER_SUBJOINED` = Above-base matras are\n     ordered after subjoined (i.e., below-base) consonant forms. \n\n  - `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are\n     ordered after all post-base consonant forms. \n\n  - `MATRA_POS_BOTTOM` = `POS_AFTER_POST` = Below-base matras are\n     ordered after all post-base consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Tamil consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Tamil Unicode block enumerates one modifier sign,\n> \"Anusvara\" (`U+0B82`). Tamil text can also include several modifier\n> signs from the Grantha Unicode block, such as Grantha Candrabindu\n> (`U+11301`), Grantha Visarga (`U+11303`), and Grantha Nukta\n> (`U+1133C`). In addition, Sanskrit text written in Tamil \n> may include additional signs from the Vedic Extensions block. \n>\n> Note: Unlike many other Indic scripts, the Tamil Unicode block\n> categorizes \"Visarga\" (`U+0B83`) as a letter, not as a modifier sign.\n\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. 
\n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that independent vowel\nis the syllable's only vowel sound and serves as the \"base\". \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nGenerally speaking, the base consonant is the final consonant of the\nsyllable and its vowel sound designates the end of the syllable. 
This\nrule is synonymous with the `BASE_POS_LAST` characteristic mentioned\nearlier. \n\nValid consonant-based syllables may include one or more additional \nconsonants that precede the base consonant. Each of these\nother, pre-base consonants will be followed by the <samp>\"Halant\"</samp> mark, which\nindicates that they carry no vowel. They affect pronunciation by\ncombining with the base consonant (e.g., \"_str_\", \"_pl_\") but they\ndo not add a vowel sound.\n\nUnlike many other Indic scripts, the consonant <samp>\"Ra\"</samp> does not receive special\ntreatment; <samp>\"Ra,Halant\"</samp> sequences are not replaced with <samp>\"Reph\"</samp>.\n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Tamil script, may\n> not adhere to the syllable-formation rules described above. 
In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Tamil syllables.\n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. \n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. 
The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the bases of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? 
(_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}`.\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}`.\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}`.\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? 
(HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}`.\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}`.\n\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. 
This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? 
= optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs.\n>\n> These reordering moves are mandatory. The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nTamil. 
\n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. 
The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\n\n<!--- > Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). 
\n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is\n\n  - If the syllable starts with <samp>\"Ra,Halant\"</samp> and the syllable contains\n    more than one consonant, exclude the starting <samp>\"Ra\"</samp> from the list of\n    consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. 
\n      * If none of the above conditions is true, stop.\n  - The consonant stopped at will be the base consonant.\n\n> Note: The algorithm is designed to work for all Indic\n> scripts. However, Tamil does not utilize pre-base-reordering <samp>\"Ra\"</samp>.\n\nTamil does not usually incorporate post-base or below-base\nconsonant forms. However, it is possible for a font to incorporate\nthem for typographic variation.\n\n> Note: Because Tamil employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. \n> \n> During the base-consonant search, only the <samp>\"Halant,_consonant_\"</samp> \n> pattern following the syllable base for these below-base forms will\n> be encountered. Stage 2, step 5 below ensures that the <samp>\"_consonant_,Halant\"</samp>\n> pattern preceding the syllable base for these below-base forms will\n> also be tagged correctly.\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any multi-part dependent vowels (matras) must be decomposed\ninto their components. Tamil has three multi-part dependent vowels,\n\"O\" (`U+0BCA`), \"Oo\" (`U+0BCB`), and \"Au\" (`U+0BCC`). Each\nhas a canonical decomposition, so this step is unambiguous. \n\n\n> \"O\" (`U+0BCA`) decomposes to \"`U+0BC6`,`U+0BBE`\"\n>\n> \"Oo\" (`U+0BCB`) decomposes to \"`U+0BC7`,`U+0BBE`\"\n> \n> \"Au\" (`U+0BCC`) decomposes to \"`U+0BC6`,`U+0BD7`\"\n\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. 
However, all such decompositions must be\ncompleted before the shaping engine begins step 3, below.\n\n:::{figure-md}\n![Two-part matra decomposition](/images/tamil/tamil-matra-decompose.svg \"Two-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #tamil-matra-decompose}\n\nTwo-part matra decomposition\n:::\n\n```{svg-color-toggle-button} tamil-matra-decompose\n```\n\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all left-side dependent-vowel (matra) signs must be tagged to be\nmoved to the beginning of the syllable, with `POS_PREBASE_MATRA`.\n\nAbove-base dependent-vowel (matra) signs must be tagged with `POS_AFTER_SUBJOINED`.\n\nRight-side dependent-vowel (matra) signs must be tagged with `POS_AFTER_POST`.\n\nBelow-base dependent-vowel (matra) signs must be tagged with `POS_AFTER_POST`.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n> Note: The Tamil Unicode block does not include a \"Nukta\"\n> codepoint. 
However, Tamil text may include \"Grantha Nukta\" (`U+1133C`)\n> and other modifier signs from the Grantha Unicode block.\n>\n> In addition, `<tml2>` text runs in minority languages that\n> use the Tamil script may incorporate nukta characters from other\n> blocks. Therefore shaping engines must apply the appropriate\n> mark-reordering move if a character matching the NUKTA shaping class\n> is encountered.\n\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`. Excluding initial <samp>\"Ra,Halant\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. \n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nTamil does not usually incorporate post-base or below-base\nconsonant forms. However, it is possible for a font to incorporate\nthem for typographic variation.\n\n> Note: Because Tamil employs the `BLWF_MODE_PRE_AND_POST` shaping\n> characteristic, consonants with below-base special forms may occur\n> before or after the syllable base. 
\n> \n> During the base-consonant search in stage 2, step 1, any instances of the\n> <samp>\"Halant,_consonant_\"</samp> pattern following the syllable base for these\n> below-base forms will be encountered. The tagging in this step\n> ensures that the <samp>\"_consonant_,Halant\"</samp> pattern preceding the syllable\n> base for these below-base forms will also be tagged correctly.\n\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\nTamil does not use <samp>\"Reph\"</samp>, so this step will\ninvolve no work when shaping `<tml2>` text. It is included here in order\nto maintain compatibility with the other Indic scripts.\n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<tml2>` text runs. This step is included here to\n> maintain compatibility across Indic scripts.\n\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. 
\n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. 
This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!--matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. --->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp>\n  - `pref` should match <samp>\"_Consonant_,Halant\"</samp> sequences in\n            pre-base position but _not_ match <samp>\"Ra,Halant\"</samp> sequences\n            flagged for `rphf`\n  - `blwf` should match <samp>\"Halant,_Consonant_\"</samp> in\n            post-base positions and <samp>\"_Consonant_,Halant\"</samp> in\n            non-initial pre-base positions \n  - `abvf` should match initial <samp>\"_Consonant_,Halant\"</samp> sequences but _not_ match\n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"Ra,Halant\"</samp> sequences flagged for `rphf` and\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `pstf` should match <samp>\"Halant,_Consonant_\"</samp> in post-base position\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt \n\takhn\n\trphf (not used in Tamil) \n\trkrf (not used in Tamil)\n\tpref \n\tblwf \n\tabvf \n\thalf\n\tpstf \n\tvatu (not used in Tamil)\n\tcjct \n\tcfar (not used in Tamil)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. \n\n> Note: The Tamil Unicode block does not include a \"Nukta\"\n> codepoint. However, Tamil text may include \"Grantha Nukta\" (`U+1133C`)\n> from the Grantha Unicode block.\n>\n> In addition, `<tml2>` text runs in minority languages that\n> use the Tamil script may incorporate nukta characters from other\n> blocks. 
Therefore shaping engines must apply the `nukt` feature if\n> it is used in the active font.\n\n  - The context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n    \n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n:::\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces one specific sequence with a required ligature. \n\n  - <samp>\"Ka,Halant,Ssa\"</samp> is substituted with the <samp>\"KSsa\"</samp> ligature. \n  \nThis sequence can occur anywhere in a syllable. The <samp>\"KSsa\"</samp> \ncharacter has orthographic status equivalent to full\nconsonants in some languages, and fonts may have `cjct` substitution\nrules designed to match it in subsequences. Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\n  - The context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n    \n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n\n:::{figure-md}\n![Akhand KSsa formation](/images/tamil/tamil-akhn-kssa.svg \"Akhand KSsa formation\"){.shaping-demo .inline-svg .greyscale-svg #tamil-akhn-kssa}\n\nAkhand KSsa formation\n:::\n\n```{svg-color-toggle-button} tamil-akhn-kssa\n```\n\n\n#### Stage 3, step 4: rphf ####\n\n> This feature is not used in Tamil.\n\n  - The context defined for a `rphf` feature is:\n\n:::{table} `rphf` feature context\n    \n| Backtrack        | Matching sequence       | Lookahead     |\n|:-----------------|:------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`   | _none_        |\n:::\n\n\t\n#### Stage 3, step 5: rkrf ####\n\n> This feature is not used in Tamil.\n\n\n#### Stage 3, step 6: pref ####\n\nThe `pref` feature 
replaces pre-base-reordering consonant glyphs with\nany special forms.\n\nThe substitution of the nominal glyph for its special form takes place\nat this stage. However, the actual reordering move is performed later,\nin stage 4, step 4.\n\n> Note: Tamil does not usually incorporate pre-base-consonant forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. \n\n> Note: Tamil does not usually incorporate below-base-consonant forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n\n\nBecause Tamil incorporates the `BLWF_MODE_PRE_AND_POST` shaping\ncharacteristic, any pre-base consonants and any post-base consonants\nmay potentially match a `blwf` substitution; therefore, both cases must\nbe flagged for comparison. Note that this is not necessarily the case in other\nIndic scripts that use a different `BLWF_MODE_` shaping\ncharacteristic. \n\n\n#### Stage 3, step 8: abvf ####\n\nThe `abvf` feature replaces above-base-consonant glyphs with any\nspecial forms. 
\n\n> Note: Tamil does not usually incorporate above-base-consonant forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n<samp>\"_Consonant_\"</samp>.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are two exceptions to the default behavior, for which the\nshaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - A sequence matching <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n\n> Note: Tamil does not usually incorporate half forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation. 
For example, a font may substitute a\n> ligature of the <samp>\"_Consonant_\"</samp> and <samp>\"Halant\"</samp> glyphs.\n\n:::{figure-md}\n![half-form feature application](/images/tamil/tamil-half.svg \"half-form feature application\"){.shaping-demo .inline-svg .greyscale-svg #tamil-half}\n\nhalf-form feature application\n:::\n\n```{svg-color-toggle-button} tamil-half\n```\n\n\n#### Stage 3, step 10: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any\nspecial forms. \n\n> Note: Tamil does not usually incorporate post-base-consonant forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n\n\n#### Stage 3, step 11: vatu ####\n\n> This feature is not used in Tamil.\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<tml2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between 
the two consonants.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n\n> Note: Tamil does not usually incorporate conjunct forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n\n\n:::{figure-md}\n![Conjunct formation](/images/tamil/tamil-cjct.svg \"Conjunct formation\"){.shaping-demo .inline-svg .greyscale-svg #tamil-cjct}\n\nConjunct formation\n:::\n\n```{svg-color-toggle-button} tamil-cjct\n```\n\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Tamil.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. 
Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. 
The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. This\nposition is defined as:\n\n   - after any ligature glyphs that resulted from the substitution of\n     a <samp>\"_Consonant_,Halant,ZWJ\"</samp> subsequence\n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences and all glyphs that resulted from a\nsubstitution on a <samp>\"_Consonant_,Halant,ZWJ\"</samp> subsequence, but will stop\nto the left of the base consonant or syllable base, and all conjuncts\nor ligatures that contain the base consonant or syllable base.\n\n:::{figure-md}\n![Pre-base matra positioning](/images/tamil/tamil-matra-position.svg \"Pre-base matra positioning\"){.shaping-demo .inline-svg .greyscale-svg #tamil-matra-position}\n\nPre-base matra positioning\n:::\n\n```{svg-color-toggle-button} tamil-matra-position\n```\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position 
before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Tamil incorporates the `REPH_POS_AFTER_POST`\nshaping characteristic, this final position is immediately after\nany post-base consonant forms.\n\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is\n\n  - Move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. 
\n\t> Note: Because Tamil incorporates the\n    > `REPH_POS_AFTER_POST` shaping characteristic, this means\n    > any positioning tag of `POS_FINAL_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_FINAL_CONSONANT`.\n  - If no other location has been located in the previous step, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\n\nFinally, if the final position of <samp>\"Reph\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nTamil does not use <samp>\"Reph\"</samp>, so this step will involve no work when\nprocessing `<tml2>` text. It is included here in order to maintain\ncompatibility with the other Indic scripts. \n\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to immediately before\nthe base consonant or syllable base.\n  \nTamil does not use pre-base-reordering consonants, so this step will\ninvolve no work when processing `<tml2>` text. It is included here in order\nto maintain compatibility with the other Indic scripts.\n  \n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nTamil does not use the `init` feature, so this step will\ninvolve no work when processing `<tml2>` text. 
It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. In preparation for this stage, glyph sequences should be\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit (not used in Tamil)\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature is not used in Tamil.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\n:::{figure-md}\n![pres feature application](/images/tamil/tamil-pres.svg \"pres feature application\"){.shaping-demo .inline-svg .greyscale-svg #tamil-pres}\n\npres feature application\n:::\n\n```{svg-color-toggle-button} tamil-pres\n```\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![abvs feature application](/images/tamil/tamil-abvs.svg \"abvs feature application\"){.shaping-demo .inline-svg .greyscale-svg #tamil-abvs}\n\nabvs feature application\n:::\n\n```{svg-color-toggle-button} tamil-abvs\n```\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. 
This usually includes replacing base consonants or\nsyllable bases that\nare adjacent to the below-base marks with contextually appropriate\nligatures.\n\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures. \n\n:::{figure-md}\n![psts feature application](/images/tamil/tamil-psts.svg \"psts feature application\"){.shaping-demo .inline-svg .greyscale-svg #tamil-psts}\n\npsts feature application\n:::\n\n```{svg-color-toggle-button} tamil-psts\n```\n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic. \n\n:::{figure-md}\n![haln feature application](/images/tamil/tamil-haln.svg \"haln feature application\"){.shaping-demo .inline-svg .greyscale-svg #tamil-haln}\n\nhaln feature application\n:::\n\n```{svg-color-toggle-button} tamil-haln\n```\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. 
However, `calt` is not mandatory for correct Tamil shaping\n> and may be disabled in the application by user preference.\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Tamil text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\n:::{figure-md}\n![Distance application](/images/tamil/tamil-dist.svg \"Distance application\"){.shaping-demo .inline-svg .greyscale-svg #tamil-dist}\n\nDistance application\n:::\n\n```{svg-color-toggle-button} tamil-dist\n```\n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Tamil, this includes above-base dependent vowels\n(matras), diacritical marks, and Vedic signs.\n\n:::{figure-md}\n![abvm feature application](/images/tamil/tamil-abvm.svg \"abvm feature application\"){.shaping-demo .inline-svg .greyscale-svg #tamil-abvm}\n\nabvm feature application\n:::\n\n```{svg-color-toggle-button} tamil-abvm\n```\n\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. In Tamil, this includes below-base diacritical marks.\n\n\n\n## The `<taml>` shaping model ##\n\nThe older Tamil script tag, `<taml>`, has been deprecated. 
However,\nshaping engines may still encounter fonts that were built to work with\n`<taml>` and some users may still have documents that were written to\ntake advantage of `<taml>` shaping.\n\n### Distinctions from `<tml2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs (used to trigger shaping\nfeatures) was altered when migrating from `<taml>` to\n`<tml2>`. \n\nSpecifically, under `<taml>`, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<taml>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<tml2>` text, as described above in this document, there is no\nsuch reordering. The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nThe old Indic shaping model also did not recognize the\n`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, `<taml>`\nwas treated as if it followed the `BLWF_MODE_POST_ONLY`\ncharacteristic. 
In other words, below-base form substitutions were\nonly applied to consonants after the base consonant or syllable base.\n\n\n### Advice for handling fonts with `<taml>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<taml>` shaping model.\n\n### Advice for handling text runs composed in `<taml>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<taml>` script tag and it is known that the font in use supports\nonly the `<tml2>` shaping model.\n\nShaping engines may also choose to apply `blwf` substitutions to\nbelow-base consonants occurring before the base consonant or syllable base when it is\nknown that the font in use supports an applicable substitution lookup.\n\nShaping engines may also choose to position left-side matras according\nto the `<taml>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n\n"
  },
  {
    "path": "opentype-shaping-telugu.md",
    "content": "```{include} /_global.md\n```\n\n# Telugu shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Telugu script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Telugu character tables](#telugu-character-tables)\n  - [The `<tel2>` shaping model](#the-tel2-shaping-model)\n      - [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)\n      - [Stage 2: Initial reordering](#stage-2-initial-reordering)\n      - [Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr>](#stage-3-applying-the-basic-substitution-features-from-gsub)\n      - [Stage 4: Final reordering](#stage-4-final-reordering)\n      - [Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr>](#stage-5-applying-all-remaining-substitution-features-from-gsub)\n      - [Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-6-applying-remaining-positioning-features-from-gpos)\n  - [The `<telu>` shaping model](#the-telu-shaping-model)\n      - [Distinctions from `<tel2>`](#distinctions-from-tel2)\n      - [Advice for handling fonts with `<telu>` features only](#advice-for-handling-fonts-with-telu-features-only)\n      - [Advice for handling text runs composed in `<telu>` format](#advice-for-handling-text-runs-composed-in-telu-format)\n\n\n## General information ##\n\nThe Telugu script belongs to the Indic family, and follows\nthe same general patterns as the other Indic scripts. More\nspecifically, it belongs to the South Indic subgroup, in which\nsequences of adjacent consonants are often represented as below-base forms.\n\nThe Telugu script is used to write multiple languages, most commonly\nTelugu and Gondi. 
In addition, Sanskrit may be written\nin Telugu, so Telugu script runs may include glyphs from the Vedic\nExtensions block of Unicode. \n\nThere are two extant Telugu script tags defined in OpenType, `<telu>`\nand `<tel2>`. The older script tag, `<telu>`, was deprecated in 2005.\nTherefore, new fonts should be engineered to work with the `<tel2>`\nshaping model. However, if a font is encountered that supports only\n`<telu>`, the shaping engine should deal with it gracefully.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Indic scripts.  The\nterms used colloquially in any particular language may vary, however,\npotentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. \n\nThe term \"matra\" is also used to refer to the headline in other Indic\nscripts, and may be used to describe the distinctive up-tick stroke above most\nTelugu letters by comparison. To avoid ambiguity, the term **headline** is\nused in most Unicode and OpenType shaping documents.\n\n**Halant** and **Virama** are both standard terms for the below-base \"vowel-killer\"\nsign. Unicode documents use the term \"virama\" most frequently, while\nOpenType documents use the term \"halant\" most frequently. In the Telugu\nlanguage, this sign is known as the _halantamu_.\n\n**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark\nindicating that the preceding vowel should be nasalized. In the Telugu\nlanguage, this mark is known as the _candrabindu_.\n\nThe term **base consonant** is also critical to Indic shaping. The\nbase consonant of a syllable is the consonant that carries the\nsyllable's vowel sound, either the inherent vowel (for an unmarked\nbase consonant) or a dependent vowel (with the addition of a matra).\n\nA syllable's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\nsyllable frequently take on secondary forms. 
Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nSyllables may also begin with an **independent vowel** instead of a\nconsonant. In these syllables, the independent vowel is rendered in\nfull-letter form, not as a matra, and the independent vowel serves as the\nsyllable base, similar to a base consonant.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Telugu text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark. \n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). Therefore,\nTelugu glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Indic scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU`, `VISARGA`, `AVAGRAHA`, and `VIRAMA`. 
Some\nless-common marks behave according to rules that are similar to these\ncommon marks, and are therefore classified with the corresponding\ncommon mark. The Vedic Extensions also include a `CANTILLATION`\nclass for tone marks.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (some of which are also categorized as `Mark [Mn]`).\n\n\nOther characters, such as symbols and miscellaneous letters (for\nexample, letter-like symbols that only occur as standalone entities\nand do not occur within syllables), need no special attention from the\nshaping engine, so they are not assigned a shaping class.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks and dependent vowels are further labeled with a mark-placement\nsubclass, which indicates where the glyph will be placed with respect\nto the base character to which it is attached. The actual position of\nthe glyphs is determined by the lookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr>\ntable; however, the shaping rules for Indic scripts require that the\nshaping engine be able to identify marks by their general\nposition. \n\nFor example, left-side dependent vowels (matras), classified\nwith `LEFT_POSITION`, must frequently be reordered, with the final\nposition determined by whether or not other letters in the syllable\nhave formed ligatures or combined into conjunct forms. 
Therefore, the\n`LEFT_POSITION` subclass of the character must be tracked throughout\nthe shaping process.\n\nThere are four basic _mark-placement subclasses_ for dependent vowels\n(matras). Each corresponds to the visual position of the matra with\nrespect to the syllable base to which it is attached:\n\n  - `LEFT_POSITION` matras are positioned to the left of the syllable base.\n  - `RIGHT_POSITION` matras are positioned to the right of the syllable base.\n  - `TOP_POSITION` matras are positioned above the syllable base.\n  - `BOTTOM_POSITION` matras are positioned below the syllable base.\n  \nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Pre-base_ matras\n  - _Post-base_ matras\n  - _Above-base_ matras\n  - _Below-base_ matras\n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nIn addition, dependent-vowel codepoints that are composed of multiple\ncomponents will be designated in character tables as having a compound\n_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`. \n\nHowever, these multi-part matras are decomposed into separate matra\ncomponents during the shaping process. After the decomposition, each\nmatra component will belong to exactly one of the four basic\n_mark-placement subclasses_.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. 
\n\n### Telugu character tables ###\n\nSeparate character tables are provided for the Telugu and Vedic\nExtensions blocks as well as for other miscellaneous characters that\nare used in `<tel2>` text runs:\n\n  - [Telugu character table](character-tables/character-tables-telugu.md#telugu-character-table)\n  - [Vedic Extensions character table](character-tables/character-tables-telugu.md#vedic-extensions-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-telugu.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0C01`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0C01; Candrabindu         |\n| | | | |\n|`U+0C15`   | Letter           | CONSONANT         | _null_                     | &#x0C15; Ka                  |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\n\n#### Special-function codepoints ####\n\nOther important characters that may be encountered when shaping runs\nof Telugu text include the dotted-circle placeholder (`U+25CC`), the\nzero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and\nthe no-break space (`U+00A0`).\n\nEach of these is of particular importance to shaping engines, because\nthese codepoints interact with the shaping engine, the text run, and\nthe active font, either to mediate non-default shaping behavior or to\nrelay information about the current shaping process.\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\nDotted-circle placeholder characters (like any Unicode codepoint) can\nappear anywhere in text input sequences and should be rendered\nnormally. <abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning lookups should attach mark glyphs to dotted\ncircles as they would to other non-mark characters. As visible glyphs,\ndotted circles can also be involved in <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions.\n\nIn addition to the default input-text handling process, shaping\nengines may also insert dotted-circle placeholders into the text\nsequence. 
Dotted-circle insertions are required when a non-spacing\nmark or dependent sign is formed with no base character present.\n\nThis requirement covers:\n\n  - Dependent signs that are assigned their own individual Unicode\n    codepoints (such as most dependent-vowel marks or matras)\n  \n  - Dependent signs that are formed only by specific sequences of\n    other codepoints (such as <samp>\"Reph\"</samp>)\n\n\nThe zero-width joiner (<abbr>ZWJ</abbr>) is primarily used to prevent the formation\nof a conjunct from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the\n    formation of a conjunct between the two consonants. \n\nNote, however, that the <samp>\"_Consonant_,Halant\"</samp> subsequence in the above\nexample may still trigger a half-forms feature. To prevent the\napplication of the half-forms feature in addition to preventing the\nconjunct, the zero-width non-joiner (<abbr>ZWNJ</abbr>) must be used instead.\n\n  - The sequence <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> should produce\n    the first consonant in its standard form, followed by an explicit\n    <samp>\"Halant\"</samp>. \n\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp> in some scripts, or to explicitly request a <samp>\"Reph\"</samp> form in\nother scripts.\n\n  - In Telugu, the default behavior for a syllable beginning with\n    <samp>\"Ra,Halant\"</samp> is for the <samp>\"Ra\"</samp> to be displayed in full form. 
An\n    explicit <samp>\"Ra,Halant,ZWJ\"</samp> sequence is required to produce a <samp>\"Reph\"</samp>\n    instead of this default behavior.\n\t\n  - In Telugu, a <samp>\"Ra,ZWJ,Halant\"</samp> sequence will prevent the formation\n    of a <samp>\"Reph\"</samp> form.\n\nThe <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> characters are, by definition, non-printing control\ncharacters and have the _Default_Ignorable_ property in the Unicode\nCharacter Database. In standard text-display scenarios, their function\nis to signal a request from the user to the shaping engine for some\nparticular non-default behavior. As such, they are not rendered\nvisually.\n\n> Note: Naturally, there are special circumstances where a user or\n> document might need to request that a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> or <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> be rendered\n> visually, such as when illustrating the OpenType shaping process, or\n> displaying Unicode tables.\n\nBecause the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are non-printing control characters, they can\nbe ignored by any portion of a software text-handling stack not\ninvolved in the shaping operations that the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> are designed\nto interface with. 
For example, spell-checking or collation functions\nwill typically ignore <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr>.\n\nSimilarly, the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> and <abbr title=\"Zero-Width Non Joiner\">ZWNJ</abbr> should be ignored by the shaping engine\nwhen matching sequences of codepoints against the backtrack and\nlookahead sequences of a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> lookups.\n\nFor example:\n\n  - A lookup that substitutes an alternate version of a\n    dependent-vowel (matra) glyph when it is preceded by <samp>\"Ka,Halant,Tta\"</samp>\n    should still be applied if the dependent-vowel codepoint is preceded\n    by <samp>\"Ka,Halant,ZWJ,Tta\"</samp> in the text run.\n\nThe no-break space (<abbr>NBSP</abbr>) is primarily used to display those\ncodepoints that are defined as non-spacing (marks, dependent vowels\n(matras), below-base consonant forms, and post-base consonant forms)\nin an isolated context, as an alternative to displaying them\nsuperimposed on the dotted-circle placeholder. These sequences will\nmatch <samp>\"NBSP,ZWJ,Halant,_Consonant_\"</samp>, <samp>\"NBSP,_mark_\"</samp>, or <samp>\"NBSP,_matra_\"</samp>.\n\nIn addition to general punctuation, runs of Telugu text often use the\ndanda (`U+0964`) and double danda (`U+0965`) punctuation marks from\nthe Devanagari block.\n\n\n\n## The `<tel2>` shaping model ##\n\nProcessing a run of `<tel2>` text involves six top-level stages:\n\n1. Identifying syllables and other sequences\n2. Initial reordering\n3. Applying the basic substitution features from <abbr>GSUB</abbr>\n4. Final reordering\n5. Applying all remaining substitution features from <abbr>GSUB</abbr>\n6. 
Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Indic scripts, the initial reordering stage and the\nfinal reordering stage each involve applying a set of several\nscript-specific rules. The basic substitution features must be applied\nto the run in a specific order. The remaining substitution features in\nstage five, however, do not have a mandatory order.\n\nIndic scripts follow many of the same shaping patterns, but they\ndiffer in a few critical characteristics that the shaping engine must\ntrack. These include:\n\n  - The position of the base consonant in a syllable.\n  \n  - The final position of <samp>\"Reph\"</samp>.\n  \n  - Whether <samp>\"Reph\"</samp> must be requested explicitly or if it is formed by\n    a specific, implicit sequence.\n\t\n  - Whether the below-base forms feature is applied only to consonants\n    before the syllable base, only to consonants after the base\n    consonant, or to both.\n\t\n  - The ordering positions for dependent vowels\n    (matras). Specifically, right-side, above-base, and below-base\n    matras follow different rules in different scripts. \n\tAll Indic scripts position left-side matras in the same\n    manner, in the ordering position `POS_PREBASE_MATRA`. \n\nWith regard to these common variations, Telugu's specific shaping\ncharacteristics include:\n\n  - `BASE_POS_LAST` = The base consonant of a syllable is the last\n     consonant, not counting any consonants with post-base forms.\n\t \n\t - Telugu differs somewhat from other `BASE_POS_LAST` scripts in\n       that all consonants can use post-base forms. Therefore, the\n       general base-consonant search algorithm should identify the first\n       non-<samp>\"Reph\"</samp> consonant as the base. 
This is the expected\n       behavior, as it allows the same search algorithm to be used\n       with all `BASE_POS_LAST` scripts.\n\n  - `REPH_POS_AFTER_POST` = <samp>\"Reph\"</samp> is ordered after the last post-base\n     consonant form.\n\n  - `REPH_MODE_EXPLICIT` = <samp>\"Reph\"</samp> is formed by an initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence.\n\n  - `BLWF_MODE_POST_ONLY` = The below-base forms feature is applied only to\n     post-base consonants.\n\n  - `MATRA_POS_TOP` = `POS_BEFORE_SUBJOINED` = Above-base matras are\n    ordered before any subjoined (i.e., below-base) consonant forms.\n\n  - `MATRA_POS_RIGHT` = Telugu includes right-side matras that follow two\n     different reordering rules. \n\t \n\t - Matras \"Sign Vocalic R\" (`0C43`) and \"Sign Vocalic Rr\" (`0C44`)\n       use `POS_AFTER_SUBJOINED` = These right-side matras are ordered\n       after all subjoined (i.e., below-base) consonant forms. \n\t   \n\t - Matras \"Sign U\" (`0C41`) and \"Sign Uu\" (`0C42`) use\n       `POS_BEFORE_SUBJOINED` = These right-side matras are ordered before\n       all subjoined (i.e., below-base) consonant forms.\n\n  - `MATRA_POS_BOTTOM` = `POS_BEFORE_SUBJOINED` = Below-base matras are\n     ordered before any subjoined (i.e., below-base) consonant forms.\n\nThese characteristics determine how the shaping engine must reorder\ncertain glyphs, how base consonants are determined, and how <samp>\"Reph\"</samp>\nshould be encoded within a run of text.\n\n\n### Stage 1: Identifying syllables and other sequences ###\n\nA syllable in Telugu consists of a valid orthographic sequence\nthat may be followed by a \"tail\" of modifier signs. \n\n> Note: The Telugu Unicode block enumerates five modifier signs,\n> \"Combining Candrabindu Above\" (`U+0C00`), \"Candrabindu\" (`U+0C01`),\n> \"Anusvara\" (`U+0C02`), \"Visarga\" (`U+0C03`), and \"Avagraha\"\n> (`U+0C3D`). In addition, Sanskrit text written in Telugu may include\n> additional signs from the Vedic Extensions block. 
\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIf the syllable begins with a consonant, then the consonant that\nprovides the vowel sound is referred to as the \"base\" consonant. If\nthe syllable begins with an independent vowel, that independent vowel\nis the syllable's only vowel sound and serves as the \"base\". \n\n> Note: A consonant that is not accompanied by a dependent vowel (matra) sign\n> carries the script's inherent vowel sound. This vowel sound is changed\n> by a dependent vowel (matra) sign following the consonant.\n\nFrom the shaping engine's perspective, the main distinction between a\nsyllable with a base consonant and a syllable with an\nindependent-vowel base is that a syllable with an independent-vowel\nbase is less likely to include additional consonants in special forms\nand less likely to include dependent vowel signs\n(matras). Therefore, in the common case, vowel-based syllables may\ninvolve less reordering, substitution feature applications, and other\nprocessing than consonant-based syllables.\n\nIn some languages and orthographies, vowel-based syllables are\nnot permitted to include additional consonants or matras, and certain\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> substitution features do not occur. However, there are often\nknown exceptions, and real-world text makes no such guarantees. \n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\n\nTelugu uses the `BASE_POS_LAST` characteristic mentioned\nearlier. 
However, because all consonants in the script can potentially\ntake on post-base consonant forms, the outcome of the shaping\ncharacteristic may be counterintuitive.\n\nGenerally speaking, the base consonant is the first logical consonant of the\nsyllable, which is rendered in full form, and any subsequent\nconsonants are rendered in special post-base forms. \n\nEach of these post-base consonants will be preceded by the <samp>\"Halant\"</samp> mark, which\nindicates that they carry no vowel. They affect pronunciation by\ncombining with the base consonant (e.g., \"_str_\", \"_pl_\") but they\ndo not add a vowel sound.\n\nAs with other Indic scripts, the consonant <samp>\"Ra\"</samp> receives special\ntreatment; in many circumstances it is replaced by a combining\nmark-like form. \n\n  - A <samp>\"Ra,Halant,ZWJ\"</samp> sequence at the beginning of a syllable is replaced\n    with a right-side mark called <samp>\"Reph\"</samp>. This rule is synonymous with the\n    `REPH_MODE_EXPLICIT` characteristic mentioned earlier.\n  - A post-base <samp>\"Ra\"</samp> is reordered to before the base consonant or\n    syllable base during the final-reordering stage of the shaping\n    process.\n\n<samp>\"Reph\"</samp> characters must be reordered after the syllable-identification\nstage is complete.\n\n> Note: Generally speaking, OpenType fonts will implement support for\n> any below-base, post-base, and pre-base-reordering consonant forms\n> by including the necessary substitution rules in their `blwf`,\n> `pstf`, and `pref` lookups in <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n>\n> Consequently, whenever shaping engines need to determine whether or \n> not a given consonant can take on such a special form, the most\n> appropriate test is to check if the consonant is included in the\n> relevant <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookup. 
Other implementations are possible, such as\n> maintaining static tables of consonants, but checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> support ensures that the expected behavior is implemented in the\n> active font, and is therefore the most reliable approach.\n\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Telugu script, may\n> not adhere to the syllable-formation rules described above. In\n> particular, it is not uncommon to encounter foreign loanwords that\n> contain a word-final suffix of consonants.\n>\n> Nevertheless, such word-final suffixes will be correctly matched by\n> the regular expressions listed below. These loanwords are pronounced\n> differently, which raises issues for potential readers, but the\n> character sequences do not affect the shaping process.\n\n\n\nSyllables should be identified by examining the run and matching\nglyphs, based on their categorization, using regular expressions. \n\nThe following general-purpose Indic-shaping regular expressions can be\nused to match Telugu syllables.\n\nThe regular expressions utilize the shaping classes from the tables\nabove. For the purpose of syllable identification, more general\nclasses can be used, as defined in the following table. This\nsimplifies the resulting expressions. 
\n\n```markdown\n_ra_\t\t= The consonant \"Ra\" \n_consonant_\t= ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_\n_vowel_\t\t= `VOWEL_INDEPENDENT`\n_nukta_\t  \t= `NUKTA`\n_halant_\t= `VIRAMA`\n_zwj_\t\t= `JOINER`\n_zwnj_\t\t= `NON_JOINER`\n_matra_\t\t= `VOWEL_DEPENDENT` | `PURE_KILLER`\n_syllablemodifier_\t= `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`\n_vedicsign_\t= `CANTILLATION`\n_placeholder_\t= `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`\n_dottedcircle_\t= `DOTTED_CIRCLE`\n_repha_\t\t= `CONSONANT_PRE_REPHA`\n_consonantmedial_\t= `CONSONANT_MEDIAL`\n_symbol_\t= `SYMBOL` | `AVAGRAHA`\n_consonantwithstacker_\t= `CONSONANT_WITH_STACKER`\n_other_\t\t= `OTHER` | `MODIFYING_LETTER`\n```\n\n\n> Note: the _ra_ identification class is mutually exclusive with \n> the _consonant_ class. The union of the _consonant_ and _ra_ classes\n> is used in the regular expression elements below in order to\n> correctly identify <samp>\"Ra\"</samp> characters that do not trigger <samp>\"Reph\"</samp> or\n> <samp>\"Rakaar\"</samp> shaping behavior.\n>\n> Note, also, that the cantillation mark \"combining Ra\" in the\n> Devanagari Extended block does _not_ belong to the _ra_\n> identification class, and that the other \"combining consonant\"\n> cantillation marks in the Devanagari Extended block do not belong to\n> the _consonant_ identification class.\n\n> Note: The _placeholder_ identification class includes codepoints\n> that are often used in place of vowels or consonants when a document\n> needs to display a matra, mark, or special form in isolation or\n> in another context beyond a standard syllable. Examples of\n> _placeholder_ codepoints include hyphens and non-breaking\n> spaces. Sequences that utilize this approach should be identified as\n> \"standalone\" syllables.\n>\n> The _placeholder_ identification class also includes numerals, which\n> are commonly used as word substitutes within normal text. 
Examples\n> include ordinals (e.g., \"4th\").\n\n> Note: The _other_ identification class includes codepoints that\n> do not interact with adjacent characters for shaping purposes. Even\n> though some of these codepoints (such as `MODIFYING_LETTER`) can\n> occur within words, they evoke no behavior from the shaping\n> engine and do not factor into the regular expressions that\n> follow. Therefore, the shaping engine may choose to ignore them\n> during syllable identification; they are listed here for completeness.\n\nThese identification classes form the basis of the following regular\nexpression elements:\n\n```markdown\nC\t= _consonant_ | _ra_\nZ\t= _zwj_ | _zwnj_\nREPH\t= (_ra_ _halant_) | _repha_\nCN\t\t= C _zwj_? _nukta_?\nFORCED_RAKAR\t= _zwj_ _halant_ _zwj_ _ra_\nS\t= _symbol_ _nukta_?\nMATRA_GROUP\t= Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?\nSYLLABLE_TAIL\t= (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}\nHALANT_GROUP\t= Z? _halant_ (_zwj_ _nukta_?)?\nFINAL_HALANT_GROUP\t= HALANT_GROUP | (_halant_ _zwnj_)\nMEDIAL_GROUP\t= _consonantmedial_?\nHALANT_OR_MATRA_GROUP\t= (FINAL_HALANT_GROUP | MATRA_GROUP*)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(MATRA_GROUP)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(MATRA_GROUP){0,4}` .\n\n\nUsing the above elements, the following regular expressions define the\npossible syllable types:\n\nA consonant-based syllable will match the expression:\n```markdown\n(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(CN HALANT_GROUP)`\n> instances in any real-world syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(CN HALANT_GROUP){0,4}` .\n\nA vowel-based syllable will match the expression:\n```markdown\nREPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\nA standalone syllable will match the expression:\n```markdown\n((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n> Note: Although they are labeled as \"standalone syllables\" here,\n> many sequences that match the standalone regular expression above\n> are instances where a document needs to display a matra, combining\n> mark, or special form in isolation. Such sequences might not have\n> any significance with regard to the definition of syllables used in\n> the language or orthography of the text.\n\nA symbol-based syllable will match the expression:\n```markdown\nS SYLLABLE_TAIL\n```\n\nA broken syllable will match the expression:\n```markdown\nREPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than 4 sequential `(HALANT_GROUP CN)`\n> instances in any real-world syllables. 
Thus, implementations may\n> choose to limit occurrences by limiting the above expressions to a\n> finite length, such as `(HALANT_GROUP CN){0,4}` .\n\n\nThe primary problem involved in shaping broken syllables is the lack\nof a syllable base (either a base consonant or an independent\nvowel). Without a syllable base, the shaping engine cannot perform\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> positioning and other contextual operations that are required\nlater in the shaping process.\n\nTo make up for this limitation, shaping engines should insert a\ndotted-circle placeholder (`U+25CC`) character into the text stream\nwhere the missing syllable base was expected to occur. This\nplaceholder allows the shaping process to proceed on a best-effort\nbasis at handling the broken-syllable sequence, but making guarantees\nabout the orthographic correctness or preferred appearance of the\nfinal result is out of scope for this document.\n\nShaping engines can perform this dotted-circle insertion at any point\nafter the broken syllable has been recognized and before <abbr title=\"Glyph Substitution table\">GSUB</abbr> features\nare applied. However, the best results will likely be attained by\nperforming the insertion immediately, before proceeding to\nstage 2. 
This will enable the maximum number of <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features\nin the active font to be correctly applied to the text run by ensuring\nthat all reordering, tagging, and sorting algorithms are executed as\nusual.\n\n> Note: In software stacks where other text-handling operations, such\n> as Unicode normalization and localization, are performed before the\n> text run is passed to the shaping engine, there is a potential for\n> the dotted-circle insertion to cause unexpected effects.\n>\n> For example, if a `ccmp` or `locl` feature substitutes the default\n> dotted-circle placeholder glyph with a variant glyph of a different\n> size or weight for the (`U+25CC`) codepoint, then any shaping engine\n> which relies on another software component to handle that\n> functionality must take additional care to ensure consistency.\n\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? = optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\n\n\nAfter the syllables have been identified, each of the subsequent \nshaping stages occurs on a per-syllable basis.\n\n### Stage 2: Initial reordering ###\n\nThe initial reordering stage is used to relocate glyphs from the\nphonetic order in which they occur in a run of text to the\northographic order in which they are presented visually.\n\n> Note: Primarily, this means moving dependent-vowel (matra) glyphs, \n> <samp>\"Ra,Halant,ZWJ\"</samp> glyph sequences, and other consonants that take special\n> treatment in some circumstances. \n>\n> These reordering moves are mandatory. 
The final-reordering stage\n> may make additional moves, depending on the text and on the features\n> implemented in the active font.\n\nThe syllable should be processed by tagging each glyph with its\nintended position based on its ordering category. After all glyphs\nhave been tagged, the entire syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\nThe final sort order of the ordering categories should be:\n\n\n\tPOS_RA_TO_BECOME_REPH\n\tPOS_PREBASE_MATRA\n\tPOS_PREBASE_CONSONANT\n\n\tPOS_SYLLABLE_BASE\n\tPOS_AFTER_MAIN\n\n\tPOS_ABOVEBASE_CONSONANT\n\n\tPOS_BEFORE_SUBJOINED\n\tPOS_BELOWBASE_CONSONANT\n\tPOS_AFTER_SUBJOINED\n\n\tPOS_BEFORE_POST\n\tPOS_POSTBASE_CONSONANT\n\tPOS_AFTER_POST\n\n\tPOS_FINAL_CONSONANT\n\tPOS_SMVD\n\n\nThis sort order enumerates all of the possible final positions to\nwhich a codepoint might be reordered, across all of the Indic\nscripts. It includes some ordering categories not utilized in\nTelugu. \n\nThe basic positions (left to right) are <samp>\"Reph\"</samp>\n(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants\npositioned before the base consonant or syllable base\n(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant\nor syllable base (`POS_SYLLABLE_BASE`), above-base consonants\n(`POS_ABOVEBASE_CONSONANT`), below-base consonants\n(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base\nconsonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final\nconsonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic\nsigns (`POS_SMVD`).\n\nIn addition, several secondary positions are defined to handle various\nreordering rules that deal with relative, rather than absolute,\npositioning. `POS_AFTER_MAIN` means that a character must be\npositioned immediately after the syllable base. 
`POS_BEFORE_SUBJOINED`\nand `POS_AFTER_SUBJOINED` mean that a character must be positioned\nbefore or after any below-base consonants, respectively. Similarly,\n`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be\npositioned before or after any post-base consonants, respectively. \n\nFor shaping-engine implementers, the names used for the ordering\ncategories matter only in that they are unambiguous. \n\nFor a definition of the \"base\" consonant, refer to stage 2, step 1, which\nfollows.\n\n#### Stage 2, step 1: Base consonant ####\n\nThe first step is to determine the base consonant of the syllable, if\nthere is one, and tag it as `POS_SYLLABLE_BASE`.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base, and it should be tagged\nas `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a standalone sequence or other syllable that begins with a placeholder\nor dotted circle, the placeholder or dotted circle will always serve\nas the syllable base, and it should be tagged as\n`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.\n\nIn a syllable that begins with a consonant, the shaping engine must\ndetermine the base consonant by a script-specific algorithm.\n\n> Note: Shaping engines may choose to treat independent-vowel bases \n> like base consonants for the sake of simplicity or code\n> reuse.\n>\n> However, implementations that take this approach should note\n> that removing the distinction between base consonants and\n> independent-vowel bases entirely may have unintended\n> consequences. Making guarantees about the correctness of the results\n> or about language-specific tests is out of scope for this document.\n\nThe base consonant is defined as the consonant in a consonant-based\nsyllable that carries the syllable's vowel sound. 
That vowel sound\nwill either be provided by the script's inherent vowel (in which case\nit is not written with a separate character) or the sound will be designated\nby the addition of a dependent-vowel (matra) sign.\n\n\n<!--- > Because vowel-based syllables will not include consonants and\n> because independent vowels do not take on special forms or require\n> reordering, many of the steps that follow will involve no\n> work for a vowel-based syllable. However, vowel-based syllables must\n> still be sorted and their marks handled correctly, and <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\n> lookups must be applied. These steps of the shaping process follow\n> the same rules that are employed for consonant-based syllables.\n--->\n\nWhile performing the base-consonant search, shaping engines may\nalso encounter special-form consonants, including below-base\nconsonants and post-base consonants. Each of these special-form\nconsonants must also be tagged (`POS_BELOWBASE_CONSONANT`,\n`POS_POSTBASE_CONSONANT`, respectively). \n\nAny pre-base-reordering consonant (such as a pre-base-reordering <samp>\"Ra\"</samp>)\nencountered during the base-consonant search must be tagged\n`POS_POSTBASE_CONSONANT`. \n \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. 
Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\n\nThe algorithm for determining the base consonant is:\n\n  - If the syllable starts with <samp>\"Ra,Halant,ZWJ\"</samp>, exclude the starting\n    <samp>\"Ra\"</samp> from the list of consonants to be considered. \n  - Starting from the end of the syllable, move backwards until a consonant is found.\n      * If the consonant is the first consonant, stop.\n      * If the consonant is preceded by the sequence <samp>\"Halant,ZWJ\"</samp>, stop.\n      * If the consonant has a below-base form, tag it as\n        `POS_BELOWBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant has a post-base form, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If the consonant is a pre-base-reordering <samp>\"Ra\"</samp>, tag it as\n        `POS_POSTBASE_CONSONANT`, then move to the previous consonant. \n      * If none of the above conditions is true, stop.\n  - The consonant stopped at will be the base consonant.\n\nTelugu includes a pre-base-reordering <samp>\"Ra\"</samp>.  A <samp>\"Halant,Ra\"</samp> sequence\nafter the base consonant or syllable base will be reordered to a pre-base position\nduring the final-reordering stage.\n\n> Note: All consonants in Telugu have a\n> post-base form; therefore, the backwards-search step will\n> automatically move past them until it reaches either a <samp>\"Ra,Halant\"</samp>\n> sequence or the first consonant. 
However, this condition is not the\n> same as the shaping characteristic `BASE_POS_FIRST`, which does not\n> use the above search algorithm at all.\n\n> Note: Because Telugu employs the `BLWF_MODE_POST_ONLY` shaping\n> characteristic, consonants with below-base special forms will occur\n> only after the base consonant or syllable base. \n> \n> During the base-consonant search, therefore, all of these below-base\n> form sequences will be encountered and tagged correctly as\n> <samp>\"Halant,_consonant_\"</samp> patterns. Stage 2, step 5 below exists to ensure that\n> the <samp>\"_consonant_,Halant\"</samp> pattern preceding the base consonant or syllable base\n> for below-base forms in other Indic scripts will also be tagged correctly.\n\n\n#### Stage 2, step 2: Matra decomposition ####\n\nSecond, any multi-part dependent vowels (matras) must be decomposed\ninto their independent components. Telugu has one\nmulti-part dependent vowel, \"Ai\" (`U+0C48`). It has a canonical\ndecomposition, so this step is unambiguous.\n\n> \"Ai\" (`U+0C48`) decomposes to \"`U+0C46`,`U+0C56`\"\n\nBecause this decomposition is a character-level operation, the shaping\nengine may choose to perform it earlier, such as during an initial\nUnicode-normalization stage. 
However, all such decompositions must be\ncompleted before the shaping engine begins step three, below.\n\n:::{figure-md}\n![Two-part matra decomposition](/images/telugu/telugu-matra-decompose.svg \"Two-part matra decomposition\"){.shaping-demo .inline-svg .greyscale-svg #telugu-matra-decompose}\n\nTwo-part matra decomposition\n:::\n\n```{svg-color-toggle-button} telugu-matra-decompose\n```\n\n\n#### Stage 2, step 3: Tag matras ####\n\nThird, all dependent-vowel (matra) signs, including those that\nresulted from the preceding decomposition step, must be tagged to be\nmoved to the correct position in the syllable.\n\nLeft-side matras should be tagged with `POS_PREBASE_MATRA`.\n\nAbove-base matras should be tagged with `POS_BEFORE_SUBJOINED`.\n\nRight-side matras should be tagged according to two rules.\n\n  - Matras <samp>\"Sign U\"</samp> (`U+0C41`) and <samp>\"Sign Uu\"</samp> (`U+0C42`) should be\n       tagged with `POS_BEFORE_SUBJOINED`.\n\n  - Matras <samp>\"Sign Vocalic R\"</samp> (`U+0C43`) and <samp>\"Sign Vocalic Rr\"</samp>\n       (`U+0C44`) should be tagged with `POS_AFTER_SUBJOINED`.\n\nBelow-base matras should be tagged with `POS_BEFORE_SUBJOINED`.\n\nFor simplicity, shaping engines may choose to tag single-part matras\nin an earlier text-processing step, using the information in the\n_Mark-placement subclass_ column of the character tables. 
It is\ncritical at this step, however, that all decomposed matras are also\ncorrectly tagged before proceeding to the next step.\n\n#### Stage 2, step 4: Adjacent marks ####\n\nFourth, any subsequences of marks that include a <samp>\"Nukta\"</samp> and a\n<samp>\"Halant\"</samp> or Vedic sign must be reordered so that the <samp>\"Nukta\"</samp> appears\nfirst.\n\nThis means that the subsequence <samp>\"Halant,Nukta\"</samp> is reordered to\n<samp>\"Nukta,Halant\"</samp> and that the subsequence <samp>\"_Vedic_sign_,Nukta\"</samp> is\nreordered to <samp>\"Nukta,_Vedic_sign_\"</samp>.\n\nFor subsequences of affected marks that are longer than two, the\nreordering operation must be repeated until the <samp>\"Nukta\"</samp> is the first\ncharacter in the subsequence. No other marks in the subsequence\nshould be reordered.\n\nThis order is canonical in Unicode and is required so that\n<samp>\"_consonant_,Nukta\"</samp> substitution rules from <abbr title=\"Glyph Substitution table\">GSUB</abbr> will be correctly\nmatched later in the shaping process.\n\n> Note: Prior to Unicode version 14, the Telugu block did not include\n> a \"Nukta\" mark. However, there are reports of users using the\n> \"Nukta\" from other Indic blocks, so shaping engines may encounter a\n> \"Nukta\" from other scripts in text runs, and should handle the\n> situation gracefully.\n\n#### Stage 2, step 5: Pre-base consonants ####\n\nFifth, consonants that occur before the syllable base must be tagged\nwith `POS_PREBASE_CONSONANT`. Excluding initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\nthat will become <samp>\"Reph\"</samp>s: \n\n  - If the consonant has a below-base form, tag it as\n          `POS_BELOWBASE_CONSONANT`. \n  - Otherwise, tag it as `POS_PREBASE_CONSONANT`.\n  \n> Note: Shaping engines may choose any method to identify consonants that\n> have below-base, post-base, or pre-base-reordering forms while\n> executing the above algorithm. 
For example, one implementation may\n> choose to maintain a static table of special-form consonants to\n> compare against the text run. Another implementation might examine\n> the active font to see if it includes a `blwf`, `pstf`, or `pref`\n> lookup in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table that affects the consonants encountered in\n> the syllable.\n>\n> However, checking for <abbr title=\"Glyph Substitution table\">GSUB</abbr> support ensures that the expected\n> behavior is implemented in the active font, and is therefore the\n> most reliable approach.\n\nTelugu does not use any pre-base consonants; this step is listed here\nbecause it is part of the general processing scheme for shaping Indic scripts.\n\n> Note: Because Telugu employs the `BLWF_MODE_POST_ONLY` shaping\n> characteristic, consonants with below-base special forms will occur\n> only after the base consonant or syllable base. \n> \n> During the base-consonant search in stage 2, step 1, therefore, all of these below-base\n> form sequences will be encountered and tagged correctly as\n> <samp>\"Halant,_consonant_\"</samp> patterns. The tagging in this step ensures that\n> the <samp>\"_consonant_,Halant\"</samp> pattern preceding the base consonant or syllable base\n> for below-base forms in other Indic scripts will also be tagged correctly.\n\n#### Stage 2, step 6: Reph ####\n\nSixth, initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences that will become <samp>\"Reph\"</samp>s must be tagged with\n`POS_RA_TO_BECOME_REPH`.\n\n> Note: An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence will always become a <samp>\"Reph\"</samp>.\n\n#### Stage 2, step 7: Final consonants ####\n\nSeventh, all final consonants must be tagged. Consonants that occur\nafter the syllable base _and_ after a dependent vowel (matra) sign\nmust be tagged with `POS_FINAL_CONSONANT`.\n\n> Note: Final consonants occur only in Sinhala and should not be\n> expected in `<tel2>` text runs. 
This step is included here to\n> maintain compatibility across Indic scripts.\n\n\n#### Stage 2, step 8: Mark tagging ####\n\nEighth, all marks must be tagged. \n\n> Note: In this step, joiner and non-joiner characters must also be\n> tagged according to the same rules given for marks, even though\n> these characters are not categorized as marks in Unicode.\n\nMarks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,\n`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should\nbe tagged with `POS_SMVD`. \n\nAll <samp>\"Nukta\"</samp>s must be tagged with the same positioning tag as the\npreceding consonant, independent vowel, placeholder, or dotted circle.\n\nAll remaining marks (not in the `POS_SMVD` category and not <samp>\"Nukta\"</samp>s)\nmust be tagged with the same positioning tag as the closest non-mark\ncharacter the mark has affinity with, so that they move together \nduring the sorting step.\n\nThere are two possible cases: those marks before the syllable base\nand those marks after the syllable base. In addition, an exception is\nmade for <samp>\"Halant\"</samp> marks that follow a left-side (pre-base) matra.\n\n  1. Initially, all remaining marks should be tagged with the same\n\t positioning tag as the closest preceding consonant.\n\n  2. For each consonant after the syllable base (such as post-base\n\t consonants, below-base consonants, or final consonants), all\n\t remaining marks located between that current consonant and any\n\t previous consonant should be tagged with the same positioning tag as\n\t the current (later) consonant.\n  \n     In other words, all consonants preceding the syllable base \"own\" the\n\t marks that follow them, while all consonants after the syllable base\n\t \"own\" the marks that come before them. When a syllable does not have\n\t any consonants after the syllable base, the syllable base should\n\t \"own\" all the marks that follow it.\n  \n  3. 
Finally, <samp>\"Halant\"</samp> marks that follow a left-side dependent vowel\n     (matra) should _not_ be tagged with the left-side matra's\n     positioning tag. Instead, the <samp>\"Halant\"</samp> should be tagged with the\n     positioning tag of the non-mark character preceding the left-side\n     matra. This prevents the <samp>\"Halant\"</samp> mark from being moved with the\n     left-side matra when the syllable is sorted.\n\n\n<!--- HarfBuzz also tags everything between a post-base consonant or -->\n<!--matra and another post-base consonant as belonging to the latter -->\n<!--post-base consonant. --->\n\n\n#### Stage 2, step 9: Sort syllable ####\n\nWith these steps completed, the syllable can be sorted into the final\nsort order as listed at the beginning of stage 2.\n\nThe glyphs in the syllable should be sorted in stable order,\nso that glyphs of the same ordering category remain in the same\nrelative position with respect to each other.\n\n\n#### Stage 2, step 10: Flag sequences for possible feature applications ####\n\nWith the initial reordering complete, those glyphs in the syllable that\nmay have <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features applied in stages 3, 5, and 6 should be\nflagged for each potential feature. \n\nThis flagging is preliminary; the set of potential features varies\nbetween different scripts and which features are supported varies\nbetween fonts. It is also possible that the application of\none feature on a glyph sequence will perform a substitution that makes\na later feature no longer applicable to the updated sequence.\n\nConsequently, the flagging must be completed before shaping proceeds\nto the stages during which features are applied.\n\nSome shaping features, such as `locl`, can potentially apply to any\nglyphs. 
Therefore, it is not necessary to maintain a separate flag for\nthese features in the bitmask (or other data structure) used to track\nthe flags -- although shaping engines may do so if desired.\n\nThe sequences to flag are summarized in the list below; a full\ndescription of each feature's function and interpretation is provided\nin the <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> application stages that follow.\n\n  - `nukt` should match <samp>\"_Consonant_,Nukta\"</samp> sequences\n  - `akhn` should match <samp>\"Ka,Halant,Ssa\"</samp>\n  - `rphf` should match initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences\n  - `pref` should match <samp>\"_Consonant_,Ra\"</samp> sequences in\n            post-base position\n  - `blwf` should match <samp>\"Halant,_Consonant_\"</samp> in post-base position\n  - `half` should match <samp>\"_Consonant_,Halant\"</samp> in pre-base position but\n           _not_ match <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequences\n  - `pstf` should match <samp>\"Halant,_Consonant_\"</samp> in post-base position\n  - `cjct` should match <samp>\"_Consonant_,Halant,_Consonant_\"</samp> but _not_\n            match <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n            <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp>\n\n\n\n\n### Stage 3: Applying the basic substitution features from <abbr>GSUB</abbr> ###\n\nThe basic-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be flagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2, step 10.\n\nThe order in which these substitutions must be performed is fixed for\nall Indic scripts:\n\n\tlocl\n\tnukt\n\takhn\n\trphf \n\trkrf (not used in Telugu)\n\tpref\n\tblwf \n\tabvf (not used in Telugu)\n\thalf\n\tpstf\n\tvatu (not used in Telugu)\n\tcjct\n\tcfar (not used in Telugu)\n\n#### Stage 3, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n#### Stage 3, step 2: nukt ####\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. \n\n  - The context defined for a `nukt` feature is:\n\n:::{table} `nukt` feature context\n    \n| Backtrack     | Matching sequence             | Lookahead     |\n|:--------------|:------------------------------|:--------------|\n| _none_        | `_consonant_`(full),`_nukta_` | _none_        |\n:::\n\n\n:::{figure-md}\n![Nukta form ligation](/images/telugu/telugu-nukt.svg \"Nukta form ligation\"){.shaping-demo .inline-svg .greyscale-svg #telugu-nukt}\n\nNukta form ligation\n:::\n\n```{svg-color-toggle-button} telugu-nukt\n```\n\n\n#### Stage 3, step 3: akhn ####\n\nThe `akhn` feature replaces specific sequences with required ligatures. \n\n  - <samp>\"Ka,Halant,Ssa\"</samp> is substituted with the <samp>\"KSsa\"</samp> ligature. 
\n  \nThese sequences can occur anywhere in a syllable. The characters have\northographic status equivalent to full consonants in some languages,\nand fonts may have `cjct` substitution rules designed to match them in\nsubsequences. Therefore, this feature must be applied before all other\nmany-to-one substitutions. \n\n  - The context defined for an `akhn` feature is:\n\n:::{table} `akhn` feature context\n    \n| Backtrack     | Matching sequence           | Lookahead     |\n|:--------------|:----------------------------|:--------------|\n| _none_        | `AKHAND_CONSONANT_SEQUENCE` | _none_        |\n:::\n\n\n:::{figure-md}\n![KSsa ligation](/images/telugu/telugu-akhn-kssa.svg \"KSsa ligation\"){.shaping-demo .inline-svg .greyscale-svg #telugu-akhn-kssa}\n\nKSsa ligation\n:::\n\n```{svg-color-toggle-button} telugu-akhn-kssa\n```\n\n\n#### Stage 3, step 4: rphf ####\n\nThe `rphf` feature replaces initial <samp>\"Ra,Halant,ZWJ\"</samp> sequences with the\n<samp>\"Reph\"</samp> glyph.\n\t\n\n  - The context defined for a `rphf` feature is:\n\n:::{table} `rphf` feature context\n    \n| Backtrack        | Matching sequence           | Lookahead     |\n|:-----------------|:----------------------------|:--------------|\n| `SYLLABLE_START` | \"Ra\"(full),`_halant_`,\"ZWJ\" | _none_        |\n:::\n\n\n:::{figure-md}\n![Reph formation](/images/telugu/telugu-rphf.svg \"Reph formation\"){.shaping-demo .inline-svg .greyscale-svg #telugu-rphf}\n\nReph formation\n:::\n\n```{svg-color-toggle-button} telugu-rphf\n```\n\n\n\n#### Stage 3, step 5: rkrf ####\n\n> This feature is not used in Telugu.\n\n#### Stage 3, step 6: pref ####\n\nThe `pref` feature replaces pre-base-reordering consonant glyphs with\nany special forms. Telugu includes one such reordering consonant,\n<samp>\"Ra\"</samp> when it occurs in post-base position.\n\nThe substitution of the nominal glyph for its special form takes place\nat this stage. 
However, the actual reordering move is performed later,\nin stage 4, step 4.\n\n#### Stage 3, step 7: blwf ####\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms. All consonants in Telugu can take on a below-base consonant\nform.\n\n:::{figure-md}\n![Below-base form composition](/images/telugu/telugu-blwf.svg \"Below-base form composition\"){.shaping-demo .inline-svg .greyscale-svg #telugu-blwf}\n\nBelow-base form composition\n:::\n\n```{svg-color-toggle-button} telugu-blwf\n```\n\n\n#### Stage 3, step 8: abvf ####\n\n> This feature is not used in Telugu.\n\n#### Stage 3, step 9: half ####\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant or syllable base with \"half forms\" of the consonant\nglyphs.\n\nIn the most common case, this substitution applies to\n<samp>\"_Consonant_,Halant\"</samp> sequences that are followed by another\n_Consonant_.\n\nIn addition, a sequence matching <samp>\"_Consonant_,Halant,ZWJ\"</samp> must also be\nflagged for potential `half` substitutions.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> at the end of the sequence means\n> that the sequence may match the regular-expression test in stage 1\n> as the end of a syllable, even without being followed by a base\n> consonant or syllable base.\n>\n> The fact that the regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n\nThere are two exceptions to the default behavior, for which the\nshaping engine must test:\n\n  - Initial <samp>\"Ra,Halant\"</samp> sequences, which should have been flagged for\n    the `rphf` feature earlier, must not be flagged for potential\n    `half` substitutions.\n\n  - A sequence matching 
<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be\n    flagged for potential `half` substitutions.\n\n> Note: Telugu does not usually incorporate half forms, but it is\n> possible for a font to implement them in order to provide for\n> desired typographic variation.\n\n:::{figure-md}\n![Half form composition](/images/telugu/telugu-half.svg \"Half form composition\"){.shaping-demo .inline-svg .greyscale-svg #telugu-half}\n\nHalf form composition\n:::\n\n```{svg-color-toggle-button} telugu-half\n```\n\n\n#### Stage 3, step 10: pstf ####\n\nThe `pstf` feature replaces post-base-consonant glyphs with any special forms.\n\n\n#### Stage 3, step 11: vatu ####\n\n> This feature is not used in Telugu.\n\n#### Stage 3, step 12: cjct ####\n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\nA sequence matching <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> or\n<samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> must not be flagged to form a conjunct.\n\n> Note: The presence of the <samp>\"ZWJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> sequence should automatically\n> inhibit any `cjct` feature rules from matching the sequence as valid\n> input, and thus prevent the `cjct` substitution from being applied.\n\n> Note: The presence of the <samp>\"ZWNJ\"</samp> in a\n> <samp>\"_Consonant_,Halant,ZWNJ,_Consonant_\"</samp> sequence means that the\n> <samp>\"_Consonant_,Halant,ZWNJ\"</samp> subsequence will match the\n> regular-expression test in stage 1 as the end of a syllable.\n> \n> Because OpenType shaping features in `<tel2>` are defined as\n> applying only within an individual syllable, this means that the\n> presence of the <samp>\"ZWNJ\"</samp> will automatically prevent the application of\n> a `cjct` feature by triggering the identification of a syllable\n> break between the two consonants.\n>\n> The fact that the 
regular-expression tests identify a syllable break\n> after the <samp>\"_Consonant_,Halant,ZWNJ\"</samp> is a byproduct of OpenType\n> shaping and Unicode encoding, however, and might not have any\n> significance with regard to the definition of syllables used in the\n> language or orthography of the text.\n>\n> Note, also: The presence of the <samp>\"ZWJ\"</samp> means that a\n> <samp>\"_Consonant_,Halant,ZWJ\"</samp> sequence may match the regular-expression\n> test in stage 1 as the end of a syllable, even without being\n> followed by a base consonant or syllable base. By definition,\n> however, a <samp>\"_Consonant_,Halant,ZWJ\"</samp> syllable identified in stage 1\n> cannot also include a <samp>\"_Consonant_\"</samp> after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n\nThe font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> rules might be implemented so that `cjct`\nsubstitutions apply to half-form consonants; therefore, this feature\nmust be applied after the `half` feature. \n\n> Note: Telugu does not usually incorporate conjuncts, but it is\n> possible for a font to implement the `cjct` feature in order to\n> provide for desired typographic variation.\n\n\n#### Stage 3, step 13: cfar ####\n\n> This feature is not used in Telugu.\n\n\n### Stage 4: Final reordering ###\n\nThe final reordering stage repositions marks, dependent-vowel (matra)\nsigns, and <samp>\"Reph\"</samp> glyphs to the appropriate location with respect to\nthe base consonant or syllable base. Because multiple substitutions\nmay have occurred during the application of the basic-shaping features\nin the preceding stage, these repositioning moves could not be\nperformed during the initial reordering stage.\n\nLike the initial reordering stage, the steps involved in this stage\noccur on a per-syllable basis.\n\n<!--- Check that classifications have not been mangled. 
If the -->\n<!--character is a Halant AND a ligature was formed AND a multiple\nsubstitution was performed, restore the classification to VIRAMA\nbecause it was almost certainly lost in the preceding <abbr title=\"Glyph Substitution table\">GSUB</abbr> stage.\n--->\n\n#### Stage 4, step 1: Base consonant ####\n\nThe final reordering stage, like the initial reordering stage, begins\nwith determining the syllable base of each syllable, following the\nsame algorithm used in stage 2, step 1.\n\nIn a syllable that begins with an independent vowel, the independent\nvowel will always serve as the syllable base. In a standalone sequence or\nother syllable that begins with a placeholder or a dotted circle, the\nplaceholder or dotted circle will always serve as the syllable base.\n\nIn a syllable that begins with a consonant, the shaping engine must\nrepeat the base-consonant search algorithm used in stage 2, step 1.\n\nThe codepoint of the underlying base consonant or syllable base will\nnot change between the search performed in stage 2, step 1, and the\nsearch repeated here. However, the application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> shaping\nfeatures in stage 3 means that several ligation and many-to-one\nsubstitutions may have taken place. The final glyph produced by that\nprocess may, therefore, be a conjunct or ligature form — in most\ncases, such a glyph will not have an assigned Unicode codepoint.\n   \n#### Stage 4, step 2: Pre-base matras ####\n\nPre-base dependent vowels (matras) that were reordered during the\ninitial reordering stage must be moved to their final position. 
This\nposition is defined as:\n   \n   - after the last standalone <samp>\"Halant\"</samp> glyph that comes after the\n     matra's starting position and also comes before the main\n     consonant.\n   - If a zero-width joiner follows this last standalone <samp>\"Halant\"</samp>, the\n     final matra position is moved to after the joiner.\n\nThis means that the matra will move to the right of all explicit\n<samp>\"consonant,Halant\"</samp> subsequences, but will stop to the left of the base\nconsonant or syllable base, all conjuncts or ligatures that contain\nthe base consonant or syllable base, and all half forms.\n\n> Note: OpenType and Unicode both state that if the syllable includes\n> a <abbr title=\"Zero-Width Joiner\">ZWJ</abbr> immediately after the last <samp>\"Halant\"</samp>, then the final matra\n> position should be after the <abbr title=\"Zero-Width Joiner\">ZWJ</abbr>.\n>\n> However, there are several test sequences indicating that\n> Microsoft's Uniscribe shaping engine did not follow this rule (in,\n> at least, Devanagari and Bengali text), and in these circumstances\n> Uniscribe instead makes the final matra position before the final\n> <samp>\"Consonant,Halant,ZWJ\"</samp>.\n>\n> Subsequently, the HarfBuzz shaping engine has also followed the same\n> pattern. If other shaping engine implementations prefer to maintain\n> maximum compatibility with Uniscribe and HarfBuzz, then they should\n> also follow suit.\n\n> Note: The Microsoft script-development specifications for OpenType\n> shaping also state that if a zero-width non-joiner follows the last\n> standalone <samp>\"Halant\"</samp>, the final matra position is moved to after the\n> non-joiner. However, it is unnecessary to test for this condition,\n> because a <samp>\"Halant,ZWNJ\"</samp> subsequence is, by definition, the end of a\n> syllable. 
Consequently, a <samp>\"Halant,ZWNJ\"</samp> cannot be followed by a\n> pre-base dependent vowel.\n\n\n\n#### Stage 4, step 3: Reph ####\n\n<samp>\"Reph\"</samp> must be moved from the beginning of the syllable to its final\nposition. Because Telugu incorporates the `REPH_POS_AFTER_POST`\nshaping characteristic, this final position is immediately after\nany post-base consonant forms.\n\n\nThe algorithm for finding the final <samp>\"Reph\"</samp> position is:\n\n  - Move the <samp>\"Reph\"</samp> to the position immediately before\n    the first post-base matra, syllable modifier, or Vedic sign that\n    has a positioning tag after the script's <samp>\"Reph\"</samp> position in the\n    syllable sort order (as listed in [stage\n    2](#stage-2-initial-reordering)). This will be the final <samp>\"Reph\"</samp>\n    position. \n    > Note: Because Telugu incorporates the\n    > `REPH_POS_AFTER_POST` shaping characteristic, this means\n    > any positioning tag of `POS_FINAL_CONSONANT` or later,\n    > although a post-base matra, syllable modifier, or Vedic sign\n    > would not typically be tagged with `POS_FINAL_CONSONANT`.\n  - If the previous step found no such position, move\n    the <samp>\"Reph\"</samp> to the end of the syllable.\n\n\nFinally, if the final position of <samp>\"Reph\"</samp> occurs after a\n<samp>\"_matra_,Halant\"</samp> subsequence, then <samp>\"Reph\"</samp> must be repositioned to the\nleft of <samp>\"Halant\"</samp>, to allow for potential matching with `abvs` or\n`psts` substitutions from <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\n#### Stage 4, step 4: Pre-base-reordering consonants ####\n\nAny pre-base-reordering consonants must be moved to a position before\nthe base consonant or syllable base.\n  \nTelugu includes one such reordering consonant. 
<samp>\"Ra\"</samp> occurring in the\npost-base position is reordered to a pre-base position at this step.\n\nThe algorithm for reordering <samp>\"Ra\"</samp> in this circumstance is:\n\n  - Only reorder the <samp>\"Ra\"</samp> if the current glyph was substituted using\n    the `pref` feature in stage 3, step 6.\n  - Select the final position using [the same method](#stage-4-step-2-pre-base-matras) as used for\n    reordering a pre-base matra.\n  - If the pre-base matra positioning algorithm cannot determine the final\n    position, place the <samp>\"Ra\"</samp> immediately before the base consonant or syllable base.\n\n\n#### Stage 4, step 5: Initial matras ####\n\nAny left-side dependent vowels (matras) that are at the start of a\nword must be flagged for potential substitution by the `init` feature\nof <abbr title=\"Glyph Substitution table\">GSUB</abbr>.\n\nTelugu does not use the `init` feature, so this step will\ninvolve no work when processing `<tel2>` text. It is included here in\norder to maintain compatibility with the other Indic scripts.\n\n\n### Stage 5: Applying all remaining substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the remaining substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. In preparation for this stage, glyph sequences should be\nflagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features in stage 2,\nstep 10.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tinit (not used in Telugu)\n\tpres\n\tabvs\n\tblws\n\tpsts\n\thaln\n\nThe `init` feature is not used in Telugu.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. 
This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Above-base form ligation](/images/telugu/telugu-abvs.svg \"Above-base form ligation\"){.shaping-demo .inline-svg .greyscale-svg #telugu-abvs}\n\nAbove-base form ligation\n:::\n\n```{svg-color-toggle-button} telugu-abvs\n```\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. This usually involves replacing multiple\nbelow-base glyphs (substituted earlier with the `blwf` feature) with\nligatures or conjunct forms.\n\n:::{figure-md}\n![Below-base form ligation](/images/telugu/telugu-blws.svg \"Below-base form ligation\"){.shaping-demo .inline-svg .greyscale-svg #telugu-blws}\n\nBelow-base form ligation\n:::\n\n```{svg-color-toggle-button} telugu-blws\n```\n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures. \n\n:::{figure-md}\n![Post-base form ligation](/images/telugu/telugu-psts.svg \"Post-base form ligation\"){.shaping-demo .inline-svg .greyscale-svg #telugu-psts}\n\nPost-base form ligation\n:::\n\n```{svg-color-toggle-button} telugu-psts\n```\n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic. 
\n\n:::{figure-md}\n![Halant form ligation](/images/telugu/telugu-haln.svg \"Halant form ligation\"){.shaping-demo .inline-svg .greyscale-svg #telugu-haln}\n\nHalant form ligation\n:::\n\n```{svg-color-toggle-button} telugu-haln\n```\n\n> Note: The `calt` feature, which allows for generalized application\n> of contextual alternate substitutions, is usually applied at this\n> point. However, `calt` is not mandatory for correct Telugu shaping\n> and may be disabled in the application by user preference.\n\n\n\n### Stage 6: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied.\n\nAs with the preceding stage, the order in which these features are\napplied is not canonical; they should be applied in the order in which\nthey appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n        dist\n        abvm\n        blwm\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` (like `calt`, above) is not\n> mandatory for shaping Telugu text and may be disabled by user preference.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. \n\n:::{figure-md}\n![Distance positioning](/images/telugu/telugu-dist.svg \"Distance positioning\"){.shaping-demo .inline-svg .greyscale-svg #telugu-dist}\n\nDistance positioning\n:::\n\n```{svg-color-toggle-button} telugu-dist\n```\n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. In Telugu, this includes above-base dependent vowels (matras),\ndiacritical marks, and Vedic signs. 
\n\n:::{figure-md}\n![Above-base mark positioning](/images/telugu/telugu-abvm.svg \"Above-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #telugu-abvm}\n\nAbove-base mark positioning\n:::\n\n```{svg-color-toggle-button} telugu-abvm\n```\n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. In Telugu, this includes below-base dependent vowels\n(matras) as well as below-base diacritical marks.\n\n:::{figure-md}\n![Below-base mark positioning](/images/telugu/telugu-blwm.svg \"Below-base mark positioning\"){.shaping-demo .inline-svg .greyscale-svg #telugu-blwm}\n\nBelow-base mark positioning\n:::\n\n```{svg-color-toggle-button} telugu-blwm\n```\n\n\n## The `<telu>` shaping model ##\n\nThe older Telugu script tag, `<telu>`, has been deprecated. However,\nshaping engines may still encounter fonts that were built to work with\n`<telu>` and some users may still have documents that were written to\ntake advantage of `<telu>` shaping.\n\n### Distinctions from `<tel2>` ###\n\nThe most significant distinction between the shaping models is that the\nsequence of <samp>\"Halant\"</samp> and consonant glyphs used to trigger shaping\nfeatures was altered when migrating from `<telu>` to\n`<tel2>`. \n\nSpecifically, shaping engines were expected to reorder post-base\n<samp>\"Halant,_Consonant_\"</samp> sequences to <samp>\"_Consonant_,Halant\"</samp>.\n\nAs a result, a font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions would be written to match\n<samp>\"_Consonant_,Halant\"</samp> sequences in all pre-base and post-base positions.\n\n\nThe `<telu>` syllable\n\n\tPre-baseC Halant BaseC Halant Post-baseC\n\nwould be reordered to\n\n\tPre-baseC Halant BaseC Post-baseC Halant\n\nbefore features are applied.\n\nIn `<tel2>` text, as described above in this document, there is no\nsuch reordering. 
The correct sequence to match for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions is\n<samp>\"_Consonant_,Halant\"</samp> for pre-base consonants, but <samp>\"Halant,_Consonant_\"</samp>\nfor post-base consonants.\n\nIn addition, for some scripts, left-side dependent vowel marks\n(matras) were not repositioned during the final reordering\nstage. For `<telu>` text, the left-side matra was always positioned\nat the beginning of the syllable.\n\n\n### Advice for handling fonts with `<telu>` features only ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences in order to apply <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions when it is known that\nthe font in use supports only the `<telu>` shaping model.\n\n### Advice for handling text runs composed in `<telu>` format ###\n\nShaping engines may choose to match post-base <samp>\"_Consonant_,Halant\"</samp>\nsequences for <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions or to reorder them to\n<samp>\"Halant,_Consonant_\"</samp> when processing text runs that are tagged with\nthe `<telu>` script tag and it is known that the font in use supports\nonly the `<tel2>` shaping model.\n\nShaping engines may also choose to position left-side matras according\nto the `<telu>` ordering scheme; however, doing so might interfere\nwith matching <abbr title=\"Glyph Substitution table\">GSUB</abbr> or <abbr title=\"Glyph Positioning table\">GPOS</abbr> features.\n"
  },
  {
    "path": "opentype-shaping-thai-lao.md",
    "content": "```{include} /_global.md\n```\n\n# Thai and Lao shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Thai and Lao scripts.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Mark combining classes](#mark-combining-classes)\n      - [<abbr>PUA</abbr> fallback classifications](#pua-fallback-classifications)\n      - [Thai and Lao character tables](#thai-and-lao-character-tables)\n  - [The `<thai>`/`<lao >` shaping model](#the-thailao-shaping-model)\n      - [Stage 1: Applying the language substitution features from <abbr>GSUB</abbr>](#stage-1-applying-the-language-substitution-features-from-gsub)\n      - [Stage 2: Decomposing all Am vowel signs](#stage-2-decomposing-all-am-vowel-signs)\n      - [Stage 3: Reordering sequences of marks](#stage-3-reordering-sequences-of-marks)\n      - [Stage 4: Applying all positioning features from <abbr>GPOS</abbr>](#stage-4-applying-all-positioning-features-from-gpos)\n  - [The <abbr>PUA</abbr> fallback shaping model](#the-pua-fallback-shaping-model)\n      - [Contextual replacement rules](#contextual-replacement-rules)\n\t    - [Stage 1: Decomposing all Am vowel signs](#stage-1-decomposing-all-am-vowel-signs)\n      - [Stage 2: Reordering sequences of marks](#stage-2-reordering-sequences-of-marks)\n      - [Stage 3: Remapping codepoints to the appropriate <abbr>PUA</abbr> alternates](#stage-3-remapping-codepoints-to-the-appropriate-pua-alternates)\n\n\n\n## General information ##\n\nThe Thai and Lao scripts are both descendants of the Brahmi script,\nand follow many of the same general patterns found in [Indic\nscripts](opentype-shaping-indic-general.md). 
They are distinct enough\nfrom Indic scripts that they should not be supported by a\ngeneral-purpose Indic shaping engine.\n\nThai and Lao use different alphabets but are historically\nrelated. They share common orthographic conventions and shaping\ncharacteristics, which enables shaping engines to support both scripts\nin a single implementation.\n\nThe Thai script is used to write multiple languages, most commonly\nThai, Pak Thai (or Southern Thai), Kuy, Isan, Lanna (or Northern\nThai), and Kelantan-Pattani Malay. In addition, the Thai script is\nused to write Sanskrit and Pali. However, the Thai script is not used\nfor Vedic texts; therefore, Thai and Lao text runs are not expected to\ninclude any glyphs from the Vedic Extensions block of Unicode.\n\nThe Lao script is used to write multiple languages, most commonly\nLao, Khmu', Hmong, and Isan. \n\nThe Thai script tag defined in OpenType is `<thai>`. The Lao script\ntag defined in OpenType is `<lao >`. Because OpenType script tags must\nbe exactly four characters long, the `<lao >` tag includes a trailing\nspace. \n\nA significant number of older Thai fonts that do not use the OpenType\nshaping model are still in use; these fonts employ the Unicode\n\"Private Use Area\" (<abbr>PUA</abbr>) to store contextual forms of\ncharacters. Shaping engines may implement this <abbr title=\"Private Use Area\">PUA</abbr>-based shaping model\nas a fallback mechanism when such fonts are encountered.\n\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Brahmi-derived and\nIndic scripts. The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\nBoth Thai and Lao feature inherent vowels for every consonant, and\nemploy **dependent vowel** signs to replace the inherent vowel with a\ndifferent vowel sound.\n\nThe Thai term for a dependent vowel sign is **sara**. The Lao term for\na vowel sign is **sala**. 
The official names of the Thai vowel signs\nin the Unicode standard include \"sara\" (for example, <samp>\"Sara Am\"</samp>),\nwhile the official names of the Lao vowel signs use \"sign\" (for\nexample, <samp>\"Sign Am\"</samp>).\n\nSome of these dependent-vowel signs are encoded as marks that attach\nto the consonant in **above-base** or **below-base** position. Others\nare encoded as full letters that may appear in **pre-base**\n(left-side) or **post-base** (right-side) position.\n\nThai and Lao differ from Indic scripts in that these pre-base dependent\nvowels are entered before typing the consonant to which they\napply. Therefore, pre-base dependent vowels do not need to be\nreordered by the shaping engine.\n\n**Phinthu** is the term used for the Thai equivalent of the \"halant\"\nor \"virama\" mark that suppresses the inherent vowel of a consonant. It\nis used only when writing Sanskrit or Pali text in the Thai script.\n\n**Nikhahit** is the term for the Thai equivalent of \"anusvara\". It\nis used only when writing Sanskrit or Pali text in the Thai\nscript. The equivalent mark in Lao is called **niggahita**.\n\nBoth Thai and Lao include several **tone markers** as combining marks\nthat are positioned with respect to the consonant and, possibly, to\nany corresponding dependent-vowel marks.\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n\n## Glyph classification ##\n\nShaping Thai and Lao text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical marks. 
\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not always sufficient to fully capture the\nexpected shaping behavior. Therefore, Thai and Lao glyphs may\nadditionally be classified by how they are treated when shaping a run\nof text.\n\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Thai and Lao scripts. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks, including diacritics, tone markers, and dependent vowels, are further labeled\nwith a mark-placement subclass, which indicates where the glyph will\nbe placed with respect to the base character to which it is\nattached. The actual position of the glyphs is determined by the\nlookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr> table.\n\nThere are three basic _mark-placement subclasses_ for marks\nin Thai and Lao. Each corresponds to the visual position of the mark with\nrespect to the consonant to which it is attached:\n\n  - `TOP_POSITION` marks are positioned above the consonant.\n  - `BOTTOM_POSITION` marks are positioned below the consonant.\n  - `RIGHT_POSITION` marks are positioned to the right of the consonant.\n  \nThai and Lao vowel marks can also appear to the left of the consonant\nto which they are attached. 
However, in Thai and Lao text runs, these\nvowels exist _before_ the consonant; that is, to the left of the\nconsonant in the character sequence. Thus, no reordering of these\nvowels (as is done in several other Brahmi-derived scripts) is\nrequired for Thai or Lao.\n\nIn order to unambiguously distinguish between this non-reordering\nconvention and the reordering conventions of other scripts, the\nleft-side vowels are not designated `LEFT_POSITION` in their\nmark-placement subclass. Instead, these vowels are classified as `VISUAL_ORDER_LEFT`.\n\nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Above-base_ (`TOP_POSITION`)\n  - _Below-base_ (`BOTTOM_POSITION`)\n  - _Post-base_ (`RIGHT_POSITION`)\n  - _Pre-base_ (`VISUAL_ORDER_LEFT`)\n\nThe `VISUAL_ORDER_LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there may be some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. \n\n\n### Mark combining classes ###\n\nThe Unicode standard defines a _canonical combining class_ for each mark\ncodepoint that is used whenever a sequence of marks needs to be sorted\ninto canonical order. \n\nThe numeric values of these combining classes are used during Unicode\nnormalization. \n\nAll Thai and Lao marks belong to standard combining classes. 
However,\nfor script-shaping purposes, some marks need to be reassigned to a\nmodified class in order to ensure that certain sequences of\nconsecutive marks are reordered correctly.\n\nIn particular, the Thai <samp>\"Sara U\"</samp> (`U+0E38`) and <samp>\"Sara Uu\"</samp> (`U+0E39`)\nmarks are reassigned from canonical combining class 103 to class 3\n(which is an unused class in Unicode's set of canonical classes).\n\nBecause class 3 sorts before the class 9 assigned to <samp>\"Phinthu\"</samp>\n(`U+0E3A`), this ensures that <samp>\"Sara U\"</samp> or <samp>\"Sara Uu\"</samp> codepoints\nadjacent to <samp>\"Phinthu\"</samp> are not reordered to a position after the\n<samp>\"Phinthu\"</samp> mark.\n\n\n:::{table} Mark-classification table\n\n| Codepoint | Combining class | Glyph                              |\n|:----------|:----------------|:-----------------------------------|\n|`U+0E38`   | 3               | &#x0E38; Sara U                    |\n|`U+0E47`   | _0_             | &#x0E47; Maitaikhu                 |\n|`U+0E4A`   | 107             | &#x0E4A; Mai Tri                   |\n|`U+0EB9`   | 118             | &#x0EB9; Sign Uu                   |\n|`U+0EBC`   | _0_             | &#x0EBC; Semivowel Sign Lo         |\n|`U+0ECB`   | 122             | &#x0ECB; Tone Mai Catawa           |\n:::\n\n> Note: Reassigning marks to modified classes in this manner should\n> not produce any unwanted side effects, because the reassigned class\n> is unused. 
However, any implementations that need to maintain strict\n> adherence to Unicode's canonical combining classes may choose to\n> handle the Phinthu-reordering issue in a different manner.\n\n\n### <abbr>PUA</abbr> fallback classifications ###\n\nOlder Thai fonts that implement the <abbr title=\"Private Use Area\">PUA</abbr>-substitution fallback method\nrather than modern OpenType script shaping rules incorporate\nsubclasses for consonants that indicate whether or not the consonant\nincludes an ascender, a normal descender, or a removable descender.\n\nThere are four possible values:\n\n  - `NORMAL_CONSONANT` or `NC`\n  - `ASCENDER_CONSONANT` or `AC`\n  - `DESCENDER_CONSONANT` or `DC`\n  - `REMOVABLE_DESCENDER_CONSONANT` or `RC`\n  \nFurthermore, vowels and marks in these fonts are classified by whether\nthey are positioned at the same baseline as consonants, below\nconsonants, above consonants, or must be positioned at the top of any\nstacks of marks.\n\nThere are four possible values:\n\n  - `CONSONANT_BASELINE_LEVEL` or `CV`\n  - `BELOW_CONSONANT_LEVEL` or `BV`\n  - `ABOVE_CONSONANT_LEVEL` or `AV`\n  - `TOP_LEVEL` or `TV`\n\n\n### Thai and Lao character tables ###\n\nSeparate character tables are provided for the Thai and Lao blocks as\nwell as for other miscellaneous characters that are used in `<thai>`\nand `<lao >` text runs: \n\n  - [Thai character table](character-tables/character-tables-thai.md#thai-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-thai.md#miscellaneous-character-table)\n\n  - [Lao character table](character-tables/character-tables-lao.md#lao-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-lao.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, its mark-placement subclass, and its\n<abbr title=\"Private Use Area\">PUA</abbr>-fallback category. 
The codepoint's Unicode name and an example\nglyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass | PUA    | Glyph                         |\n|:----------|:-----------------|:------------------|:------------------------|:-------|:------------------------------|\n|`U+0E01`   | Letter           | CONSONANT         | _null_                  | NC     | &#x0E01; Ko Kai               |\n| | | | | | |\n|`U+0E48`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | TV     | &#x0E48; Mai Ek               |\n| | | | | | |\n|`U+0E81`   | Letter           | CONSONANT         | _null_                  | _null_ | &#x0E81; Ko                   |\n| | | | | | |\n|`U+0EC8`   | Mark [Mn]        | TONE_MARKER       | TOP_POSITION            | _null_ | &#x0EC8; Tone Mai Ek          |\n:::\n\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nThe _PUA_ column indicates which, if any, fallback-shaping category\nthe codepoint belongs to when found in older fonts using the <abbr title=\"Private Use Area\">PUA</abbr>\nfallback shaping scheme. Note that the <abbr title=\"Private Use Area\">PUA</abbr> method was employed only\nfor Thai fonts, so Lao codepoints do not have a <abbr title=\"Private Use Area\">PUA</abbr> fallback-shaping\ncategory. 
Thai codepoints with a _null_ in the _PUA_ column were not\nused in the <abbr title=\"Private Use Area\">PUA</abbr> fallback-shaping scheme and evoke no special behavior\nfrom the shaping engine.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\nOther important characters that may be encountered when shaping runs\nof Thai and Lao text include the dotted-circle placeholder (`U+25CC`), \nthe no-break space (`U+00A0`), and the zero-width space (`U+200B`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel sign or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n<!--- The zero-width joiner is primarily used to prevent the formation of a\nsubjoining form from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence. The sequence\n<samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the substitution of a\nsubjoined form for the second consonant. --->\n\n<!---\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>. An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\nwhere an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width joiner\notherwise would.\n--->\n\nThe no-break space is primarily used to insert spaces between\nphrases. Thai and Lao texts do not employ inter-word spaces. Consequently,\nwhen spaces are inserted into a text run, it is important that they be\npreserved: line-breaking algorithms must not break lines after a\nThai or Lao space, so the no-break space character is used instead of the\ntraditional space. 
\n\nThe no-break space may also be used to display those codepoints that\nare defined as non-spacing (marks, dependent vowels (matras),\nbelow-base consonant forms, and post-base consonant forms) in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder. \n\n## The `<thai>`/`<lao >` shaping model ##\n\nProcessing a run of `<thai>` or `<lao >` text involves four top-level stages:\n\n\n1. Applying the language substitution features from <abbr>GSUB</abbr>\n2. Decomposing all Am vowel signs\n3. Reordering sequences of marks\n4. Applying all positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Brahmi-derived and Indic scripts, the basic substitution\nfeatures must be applied to the run in a specific order. The\npositioning features in the final stage, however, do not have a\nmandatory order.\n\nUnlike many other Brahmi-derived and Indic scripts, shaping Thai and Lao\ntext does not require a syllable-identification stage.\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. \n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n\n### Stage 1: Applying the language substitution features from <abbr>GSUB</abbr> ###\n\nThe language-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. 
In preparation for this\nstage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed:\n\n\tlocl\n\tccmp\n\n\n#### Stage 1, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n\n#### Stage 1, step 2: ccmp ####\n\nThe `ccmp` feature allows a font to substitute mark-and-base sequences\nwith a pre-composed glyph including the mark and the base, or to\nsubstitute a single glyph into an equivalent decomposed sequence of glyphs. \n \nIn `<thai>` and `<lao >` text, this may include a decomposition for\nthe <samp>\"Am\"</samp> dependent-vowel sign. If such a decomposition is used in the\nactive font, the shaping engine must keep track of the fact that the\nresulting components originated as an <samp>\"Am\"</samp> sign. \n\nIf there is not an <samp>\"Am\"</samp> decomposition in the active font's `ccmp`\nlookup, the shaping engine will decompose the codepoint in the\nfollowing stage.\n  \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. 
\n\n:::{figure-md}\n![Glyph composition](images/thai-lao/thai-ccmp.svg \"Glyph composition\"){.shaping-demo .inline-svg .greyscale-svg #thai-ccmp}\n\nGlyph composition\n:::\n\n```{svg-color-toggle-button} thai-ccmp\n```\n\n### Stage 2: Decomposing all Am vowel signs ###\n\nThe Thai and Lao alphabets each include one character that must be\ndecomposed for shaping purposes, the vowel sign <samp>\"Am\"</samp>. The decomposition is\ncanonically defined, resulting in the sequence <samp>\"_Anusvara_,Sara Aa\"</samp> in\nthe appropriate script. \n\n  - Thai Sara Am (`U+0E33`) decomposes to <samp>\"Nikhahit,Sara Aa\"</samp> (`U+0E4D`,`U+0E32`).\n  - Lao Sign Am (`U+0EB3`) decomposes to <samp>\"Niggahita,Sign Aa\"</samp> (`U+0ECD`,`U+0EB2`).\n\n> Note: if the active font decomposed the <samp>\"Am\"</samp> sign via a `ccmp`\n> feature lookup during stage one, then no further action is needed\n> on the shaping engine's part during this stage.\n\nThe shaping engine must keep track of the fact that the <samp>\"Nikhahit\"</samp> or\n<samp>\"Niggahita\"</samp> marks originated as part of an <samp>\"Am\"</samp> sign, because these\ndecomposed marks are handled differently during the mark-reordering\nstage.\n\n:::{figure-md}\n![Am decomposition](images/thai-lao/lao-am-decomposition.svg \"Am decomposition\"){.shaping-demo .inline-svg .greyscale-svg #lao-am-decomposition}\n\nAm decomposition\n:::\n\n```{svg-color-toggle-button} lao-am-decomposition\n```\n  \n### Stage 3: Reordering sequences of marks ###\n\nIn this stage, sequences of consecutive marks may need to be\nreordered.\n\nIn `<thai>` and `<lao >` text runs, two conditions should be checked\nfor possible reordering.\n\n  - A <samp>\"Nikhahit\"</samp> or <samp>\"Niggahita\"</samp> mark that originated as part of an\n    <samp>\"Am\"</samp> sign (which was decomposed in stage two, above) must be\n    reordered so that it occurs before any tone markers in the\n    sequence of marks.\n  - A <samp>\"Phinthu\"</samp> mark must be 
reordered so that it occurs after any\n    <samp>\"Sara U\"</samp> or <samp>\"Sara Uu\"</samp> marks.\n\t\n> Note: <samp>\"Nikhahit\"</samp> or <samp>\"Niggahita\"</samp> marks that were not originally part\n> of an <samp>\"Am\"</samp> sign should not be reordered.\n\n> Note: Shaping engines may alternatively choose to implement the Phinthu\n> reordering rule by modifying the combining classes assigned to\n> <samp>\"Phinthu\"</samp>, <samp>\"Sara U\"</samp>, and <samp>\"Sara Uu\"</samp> as necessary before processing\n> the text run, or by performing a sorting step at this stage.\n\n\n<!--- \n\nmove the\n   * NIKHAHIT backwards over any tone mark (0E48-0E4B).\n   *\n   * <0E14, 0E4B, 0E33> -> <0E14, 0E4D, 0E4B, 0E32>\n   *\n   * This reordering is legit only when the NIKHAHIT comes from a SARA AM, not\n   * when it's there to start with. The string <0E14, 0E4B, 0E4D> is probably\n   * not what a user wanted, but the rendering is nevertheless nikhahit above\n   * chattawa.\n   *\n   * Same for Lao.\n   *\n   * Note:\n   *\n   * Uniscribe also does some below-marks reordering.  Namely, it positions U+0E3A\n   * after U+0E38 and U+0E39.  We do that by modifying the ccc for U+0E3A.\n   * See unicode->modified_combining_class ().  Lao does NOT have a U+0E3A\n   * equivalent.\n\n--->\n\n\n### Stage 4: Applying all positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied. As with the preceding stage, the order in which these\nfeatures are applied is not canonical; they should be applied in the\norder in which they appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n\tkern\n\tmark\n\tmkmk\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. 
However, `kern` is not mandatory for shaping\n> Thai and Lao text and may be disabled by user preference.\n\nThe `kern` feature adjusts the horizontal positioning of\nglyphs.\n\n:::{figure-md}\n![Application of the kern feature](/images/thai-lao/lao-kern.svg \"Application of the kern feature\"){.shaping-demo .inline-svg .greyscale-svg #lao-kern}\n\nApplication of the kern feature\n:::\n\n```{svg-color-toggle-button} lao-kern\n```\n\nThe `mark` feature positions marks with respect to base glyphs.\n\n:::{figure-md}\n![Application of the mark feature](/images/thai-lao/thai-mark.svg \"Application of the mark feature\"){.shaping-demo .inline-svg .greyscale-svg #thai-mark}\n\nApplication of the mark feature\n:::\n\n```{svg-color-toggle-button} thai-mark\n```\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n\n:::{figure-md}\n![Application of the mkmk feature](/images/thai-lao/thai-mkmk.svg \"Application of the mkmk feature\"){.shaping-demo .inline-svg .greyscale-svg #thai-mkmk}\n\nApplication of the mkmk feature\n:::\n\n```{svg-color-toggle-button} thai-mkmk\n```\n\n\n## The <abbr>PUA</abbr> fallback shaping model ##\n\nA significant number of older Thai fonts that do not use the OpenType\nshaping model are still in use; these fonts employ the Unicode\n\"Private Use Area\" (<abbr>PUA</abbr>) to store contextual forms of\ncharacters.\n\nThe <abbr title=\"Private Use Area\">PUA</abbr> shaping model is described at\n[linux.thai.net/~thep/th-otf/shaping.html](https://linux.thai.net/~thep/th-otf/shaping.html).\n
It relies on a set of pre-determined mappings from the codepoints in the\nUnicode Thai block to codepoints in the <abbr title=\"Private Use Area\">PUA</abbr>.\n\nFor consonants, these alternate-glyph mappings depend on whether or\nnot the consonant includes an ascender, a normal descender, or a\nremovable descender.\n\nThere are four possible values:\n\n  - `NORMAL_CONSONANT` or `NC`\n  - `ASCENDER_CONSONANT` or `AC`\n  - `DESCENDER_CONSONANT` or `DC`\n  - `REMOVABLE_DESCENDER_CONSONANT` or `RC`\n  \nFurthermore, vowels and marks in these fonts are classified by whether\nthey are positioned at the same baseline as consonants, below\nconsonants, above consonants, or must be positioned at the top of any\nstacks of marks.\n\nThere are four possible values:\n\n  - `CONSONANT_BASELINE_LEVEL` or `CV`\n  - `BELOW_CONSONANT_LEVEL` or `BV`\n  - `ABOVE_CONSONANT_LEVEL` or `AV`\n  - `TOP_LEVEL` or `TV`\n\n\nThe classifications of the consonant, vowel, and mark characters in\nthe Thai Block are listed in the _PUA_ column of the [Thai character\ntable](character-tables/character-tables-thai.md#thai-character-table). \n\n\n## Contextual replacement rules ##\n\nCodepoints in the Thai Block can be mapped to one of several alternate\n<abbr title=\"Private Use Area\">PUA</abbr> codepoints depending on context:\n\n  - A tone marker that does not follow an above-base vowel sign may be\n    mapped to an alternate that is positioned lower, closer to the top\n    of the consonant. This is a `SHIFT_DOWN` replacement action.\n  - A tone marker, above-base diacritic, or above-base vowel sign\n    following a consonant with an ascender may be mapped to an\n    alternate that is positioned further to the left (thereby\n    preventing a collision with the ascender). 
This is a `SHIFT_LEFT` replacement action.\n  - A below-base vowel sign that follows a consonant with a\n    non-removable descender may be mapped to an alternate that is\n    positioned lower (thereby preventing a collision with the\n    descender). This is a `SHIFT_DOWN` replacement action.\n  - A consonant with a removable descender may be mapped to a\n    descender-less alternate when the consonant is followed by a\n    below-base vowel sign. This is a `REMOVE_DESCENDER` replacement action.\n\t\nThe above rules may combine. Specifically, a tone marker that does not\nfollow an above-base vowel sign _and_ follows a consonant with an\nascender must be positioned lower and further to the left.  This is a\n`SHIFT_DOWN_AND_LEFT` replacement action.\n\nAdditionally, below-base vowels are handled separately from above-base\nvowels and tone markers; a consonant that is followed by a below-base\nvowel and a tone marker may have to perform two independent\nreplacement actions.\n\t\nThe following table summarizes the actions taken for each of the\npossible consonant (vertical) and vowel/mark (horizontal) sequences:\n\n\n:::{table} Summary of contextual-replacement rules for <samp>\"Consonant,Vowel\"</samp> sequences in <abbr>PUA</abbr> fallback\n\n|        |  AV  |  BV  |  TV   |  AV,TV    |\n|:-------|:-----|:-----|:------|:-----------|\n| **NC** |      |      | `SD`  |            |\n| **AC** | `SL` |      | `SDL` | `SL`       |\n| **RC** |      | `RD` | `SD`  |            |\n| **DC** |      | `SD` | `SD`  |            | \n:::\n\nThese replacements take the place of both <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions and <abbr title=\"Glyph Positioning table\">GPOS</abbr>\npositioning in modern OpenType fonts.\n\nShaping engines can replace the original codepoints with the\nappropriate alternates from the <abbr title=\"Private Use Area\">PUA</abbr> block by testing for the above\nconditions. 
\n\nWith each consonant, vowel, and mark character correctly classified,\nthe shaping engine can process the text run.\n\nThere are three top-level stages:\n\n1. Decomposing all Am vowel signs\n2. Reordering sequences of marks\n3. Remapping codepoints to the appropriate <abbr>PUA</abbr> alternates\n\n\n### Stage 1: Decomposing all Am vowel signs ###\n\nThe Thai alphabet includes one character that must be decomposed for\nshaping purposes, the vowel sign <samp>\"Am\"</samp>. The decomposition is\ncanonically defined, resulting in the sequence <samp>\"Nikhahit,Sara Aa\"</samp>.\n\n  - Sara Am (`U+0E33`) decomposes to <samp>\"Nikhahit,Sara Aa\"</samp> (`U+0E4D`,`U+0E32`).\n\nThe shaping engine must keep track of the fact that the <samp>\"Nikhahit\"</samp>\nmark originated as part of an <samp>\"Am\"</samp> sign, because these decomposed\nmarks are handled differently during the mark-reordering stage.\n\n:::{figure-md}\n![Glyph decomposition](images/thai-lao/thai-am-decomposition.svg \"Glyph decomposition\"){.shaping-demo .inline-svg .greyscale-svg #thai-am-decomposition}\n\nGlyph decomposition\n:::\n\n```{svg-color-toggle-button} thai-am-decomposition\n```\n\n### Stage 2: Reordering sequences of marks ###\n\nIn this stage, certain sequences of consecutive marks may need to be\nreordered.\n\nAs is the case in OpenType-font text runs, two conditions should be checked\nfor possible reordering.\n\n  - A <samp>\"Nikhahit\"</samp> mark that originated as part of an <samp>\"Am\"</samp> sign (which\n    was decomposed in stage one, above) must be reordered so that it\n    occurs before any tone markers in the sequence of marks.\n  - A <samp>\"Phinthu\"</samp> mark must be reordered so that it occurs after any\n    <samp>\"Sara U\"</samp> or <samp>\"Sara Uu\"</samp> marks.\n\t\n> Note: <samp>\"Nikhahit\"</samp> marks that were not originally part of an <samp>\"Am\"</samp> sign\n> should not be reordered.\n\n> Note: Shaping engines may choose to implement the Phinthu\n> 
reordering rule by modifying the combining classes assigned to\n> <samp>\"Phinthu\"</samp>, <samp>\"Sara U\"</samp>, and <samp>\"Sara Uu\"</samp> as necessary before processing\n> the text run, or by performing a sorting step at this stage.\n\n\n### Stage 3: Remapping codepoints to the appropriate <abbr>PUA</abbr> alternates ###\n\nThe contextual replacement rules described above can be implemented in\na pair of state machines, one for above-base replacement moves and one\nfor below-base replacement moves.\n\nEach consonant codepoint and subsequent (possibly empty) sequence of\nmarks should be processed in turn through both machines. The output\nfor each codepoint will be one of the standard replacement actions:\n\n  - `SD`: replace the codepoint with the `SHIFT_DOWN` alternate\n  - `SL`: replace the codepoint with the `SHIFT_LEFT` alternate\n  - `SDL`: replace the codepoint with the `SHIFT_DOWN_AND_LEFT` alternate\n  - `RD`: replace the codepoint with the `REMOVE_DESCENDER` alternate\n  - _null_: no replacement should be made\n\nThe above-base state machine tracks four possible states, designated\n`AS0` through `AS3`. 
\n\nThe initial states of the possible codepoints are as follows:\n\n:::{table} Initial states for above-base <abbr>PUA</abbr> remapping\n\n| PUA class | initial state |\n|:----------|:--------------|\n| NC        | AS0           |\n| AC        | AS1           |\n| RC        | AS0           |\n| DC        | AS0           |\n| _Other_   | AS3           |\n:::\n\n\nThe following state machine table lists the replacement action to take\nand the resulting next state for each possible mark type that may\nfollow a consonant:\n\n\n:::{table} State-machine table for above-base <abbr>PUA</abbr> remapping\n\n| Input state | AV         | BV         | TV         |\n|:------------|:-----------|:-----------|:-----------|\n| AS0         | _null_,AS3 | _null_,AS0 | `SD`,AS3   |\n| AS1         | `SL`,AS2   | _null_,AS1 | `SDL`,AS2  |\n| AS2         | _null_,AS3 | _null_,AS2 | `SL`,AS3   |\n| AS3         | _null_,AS3 | _null_,AS3 | _null_,AS3 |\n:::\n\n\nThe below-base state machine tracks three possible states, designated\n`BS0` through `BS2`. 
\n\nThe initial states of the possible codepoints are as follows:\n\n:::{table} Initial states for below-base <abbr>PUA</abbr> remapping\n\n| PUA class | initial state |\n|:----------|:--------------|\n| NC        | BS0           |\n| AC        | BS0           |\n| RC        | BS1           |\n| DC        | BS2           |\n| _Other_   | BS2           |\n:::\n\n\nThe following state machine table lists the replacement action to take\nand the resulting next state for each possible mark type that may\nfollow a consonant:\n\n:::{table} State-machine table for below-base <abbr>PUA</abbr> remapping\n\n| Input state | AV         | BV         | TV         |\n|:------------|:-----------|:-----------|:-----------|\n| BS0         | _null_,BS0 | _null_,BS2 | _null_,BS0 |\n| BS1         | _null_,BS1 | `RD`,BS2   | _null_,BS1 |\n| BS2         | _null_,BS2 | `SD`,BS2   | _null_,BS2 |\n:::\n\nWhen the necessary replacement action for each codepoint has been\ndetermined, codepoints can be replaced with the <abbr title=\"Private Use Area\">PUA</abbr> codepoints from\nthe following table.\n\nNote that Windows fonts and MacOS fonts used different mappings.\n\n\n#### SD mappings ####\n\n:::{table} `SD` mappings by platform\n\n| Input    | Windows  | MacOS    |\n|:---------|:---------|:---------|\n| `U+0E48` | `U+F70A` | `U+F88B` |\n| `U+0E49` | `U+F70B` | `U+F88E` |\n| `U+0E4A` | `U+F70C` | `U+F891` |\n| `U+0E4B` | `U+F70D` | `U+F894` |\n| `U+0E4C` | `U+F70E` | `U+F897` |\n| `U+0E38` | `U+F718` | `U+F89B` |\n| `U+0E39` | `U+F719` | `U+F89C` |\n| `U+0E3A` | `U+F71A` | `U+F89D` |\n:::\n\n\n#### SL mappings ####\n\n:::{table} `SL` mappings by platform\n\n| Input    | Windows  | MacOS    |\n|:---------|:---------|:---------|\n| `U+0E48` | `U+F713` | `U+F88A` |\n| `U+0E49` | `U+F714` | `U+F88D` |\n| `U+0E4A` | `U+F715` | `U+F890` |\n| `U+0E4B` | `U+F716` | `U+F893` |\n| `U+0E4C` | `U+F717` | `U+F896` |\n| `U+0E31` | `U+F710` | `U+F884` |\n| `U+0E34` | `U+F701` | `U+F885` |\n| `U+0E35` | 
`U+F702` | `U+F886` |\n| `U+0E36` | `U+F703` | `U+F887` |\n| `U+0E37` | `U+F704` | `U+F888` |\n| `U+0E47` | `U+F712` | `U+F889` |\n| `U+0E4D` | `U+F711` | `U+F899` |\n:::\n\n\n#### SDL mappings ####\n\n:::{table} `SDL` mappings by platform\n\n| Input    | Windows  | MacOS    |\n|:---------|:---------|:---------|\n| `U+0E48` | `U+F705` | `U+F88C` |\n| `U+0E49` | `U+F706` | `U+F88F` |\n| `U+0E4A` | `U+F707` | `U+F892` |\n| `U+0E4B` | `U+F708` | `U+F895` |\n| `U+0E4C` | `U+F709` | `U+F898` |\n:::\n\n\n#### RD mappings ####\n\n:::{table} `RD` mappings by platform\n\n| Input    | Windows  | MacOS    |\n|:---------|:---------|:---------|\n| `U+0E0D` | `U+F70F` | `U+F89A` |\n| `U+0E10` | `U+F700` | `U+F89E` |\n:::\n"
  },
  {
    "path": "opentype-shaping-tibetan.md",
    "content": "```{include} /_global.md\n```\n\n# Tibetan shaping in OpenType #\n\nThis document details the shaping procedure needed to display text\nruns in the Tibetan script.\n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Shaping classes and subclasses](#shaping-classes-and-subclasses)\n      - [Tibetan character tables](#tibetan-character-tables)\n  - [The `<tibt>` shaping model](#the-tibt-shaping-model)\n      - [Stage 1: Applying the language substitution features from <abbr>GSUB</abbr>](#stage-1-applying-the-language-substitution-features-from-gsub)\n      - [Stage 2: Applying all basic substitution features from <abbr>GSUB</abbr>](#stage-2-applying-all-basic-substitution-features-from-gsub)\n      - [Stage 3: Applying remaining positioning features from <abbr>GPOS</abbr>](#stage-3-applying-remaining-positioning-features-from-gpos)\n\n\n## General information ##\n\nThe Tibetan script was modeled on seventh-century [Indic\nscripts](opentype-shaping-indic-general.md) and incorporates several\npatterns and conventions found in Indic scripts. However, Tibetan\ndeveloped independently and possesses enough major distinctions that\nit is inadvisable to attempt supporting it in a general-purpose\nIndic shaping engine. \n\nThe Tibetan script is used to write multiple languages, most commonly\nTibetan, Dzongkha, Sikkimese, Ladakhi, and Balti. In addition,\nSanskrit may be written in Tibetan, but the Tibetan script is not used\nfor Vedic texts, therefore Tibetan text runs are not expected to\ninclude any glyphs from the Vedic Extensions block of Unicode. \n\nThe Tibetan script tag defined in OpenType is `<tibt>`. \n\nNotably, Tibetan was originally included in version 1.0 of the Unicode\nstandard, encoded in a block that closely mirrored the structure of\nthe Indic scripts. However, this encoding for Tibetan was removed in\nUnicode 1.1. 
A new encoding for Tibetan was included in version 2.0 of\nthe Unicode standard, more appropriately structured for the writing\nsystem.\n\n## Terminology ##\n\nOpenType shaping uses a standard set of terms for Brahmi-derived and\nIndic scripts.  The terms used colloquially in any particular language\nmay vary, however, potentially causing confusion.\n\n**Matra** is the standard term for a dependent vowel sign. Syllables\nin Tibetan script can include sequences of multiple vowels and,\ntherefore, multiple matras. Each matra is either an **above-base** or\na **below-base** form.\n\nSeveral compound matra codepoints are included in the Tibetan Unicode\nblock. However, these are only used when transcribing Sanskrit\ntext. Otherwise, Tibetan syllables will include at most one matra.\n\n**Tsheg** or **tsek** is the term for the small, dot-like mark that is placed\nbetween syllables in a Tibetan word. Sequences of tsek marks are\noccasionally used to justify lines of text within a block. For\nline-breaking purposes, words may be broken after a tsek mark.\n\n**Srog-med** is the term for the \"virama\" or \"halant\" sign (`U+0F84`). However,\nthe Tibetan script does not natively use the srog-med mark: it is\nused only when transcribing text in a language that requires a \"halant\".\n\n<!--- **Chandrabindu** (or simply **Bindu**) is the standard term for the\ndiacritical mark indicating that the preceding vowel should be\nnasalized. Tibetan script does not use a chandrabindu; however, the\n_BINDU_ category is used for other marks during the\nsyllable-identification stage in order to maintain compatibility with\nother scripts. --->\n\n<!--- Tibetan has a bindu, but it seems to be there just for\n      transcription --->\n\nThe term **base consonant** in Tibetan is analogous to its usage in\nIndic and Brahmi-derived scripts. 
The base consonant of a syllable is\nrendered in its full form; subsequent consonants are generally shown\nin **subjoined** form, stacked below the base consonant.\n\nThe Tibetan Unicode block includes separate codepoints for the base\nand subjoined forms of each consonant. Therefore, shaping engines are\nnot required to determine the base consonant of a syllable\nalgorithmically.\n\nTibetan also employs the term **head consonant**, which refers to the\nconsonant in a stack that is in the visually topmost position. Certain\nconsonants take on an alternate form when used in stack-initial\npositions (such as <samp>\"Ra\"</samp>). When the alternate form is visually the\ntopmost consonant in the stack, it is regarded as the head consonant,\neven though the consonant that follows is regarded as the base\nconsonant.\n\nFor example, the sequence <samp>\"Ra,Subjoined Ka\"</samp> (`U+0F62`,`U+0F90`) is\nrendered with the <samp>\"Ka\"</samp> in its non-subjoined, base-consonant form and the <samp>\"Ra\"</samp>\npositioned above. In this circumstance, the <samp>\"Ra\"</samp> would still be\nregarded as the head consonant.\n\n\n\nWhere possible, using the standard terminology is preferred, as the\nuse of a language-specific term necessitates choosing one language\nover all of the others that share a common script.\n\n## Glyph classification ##\n\nShaping Tibetan text depends on the shaping engine correctly\nclassifying each glyph in the run. As with most other scripts, the\nclassifications must distinguish between consonants, vowels\n(independent and dependent), numerals, punctuation, and various types\nof diacritical mark. \n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as glyph reordering). 
Therefore,\nTibetan glyphs must additionally be classified by how they are treated\nwhen shaping a run of text.\n\n### Shaping classes and subclasses ###\n\nThe shaping classes listed in the tables that follow are defined so\nthat they capture the positioning rules used by Tibetan script. \n\nFor most codepoints, the _Shaping class_ is synonymous with the `Indic\nSyllabic Category` defined in Unicode. However, there are some\ndistinctions, where the defined category does not fully capture the\nbehavior of the character in the shaping process.\n\nSeveral of the diacritic and syllable-modifying marks behave according\nto their own rules and, thus, have a special class. These include\n`BINDU` and `VISARGA`. Some less-common marks behave according to\nrules that are similar to these common marks, and are therefore\nclassified with the corresponding common mark.\n\nLetters generally fall into the classes `CONSONANT`,\n`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the\nshaping engine parse and identify key positions in a syllable. For\nexample, Unicode categorizes dependent vowels as `Mark [Mn]`, but the\nshaping engine must be able to distinguish between dependent vowels\nand diacritical marks (which are categorized as `Mark [Mn]`).\n\nTibetan uses two subclasses of consonant, `CONSONANT_SUBJOINED` and\n`CONSONANT_HEAD`. \n\nThe `CONSONANT_SUBJOINED` subclass is used for consonants immediately\nfollowing the base consonant of a syllable and before the vowel\nsound. Unlike most Indic scripts, Tibetan explicitly encodes the\nsubjoined forms of each consonant in a separate codepoint. Therefore,\nthe shaping engine is not responsible for identifying the base and\nbelow-base consonants (or other special forms) and fonts are not\nresponsible for implementing substitution features to substitute\nsubjoined forms in context.\n\nThe `CONSONANT_HEAD` subclass is used for special transliteration\nletters that are not found in the Tibetan language. 
They should pass\nchecks for consonants, but do not evoke special shaping behavior.\n\nOther characters, such as symbols, need no special\nattention from the shaping engine, so they are not assigned a shaping\nclass.\n\nNumbers are classified as `NUMBER`, even though they evoke no special\nbehavior from the Indic shaping rules, because there are OpenType features that\nmight affect how the respective glyphs are drawn, such as `tnum`,\nwhich specifies the usage of tabular-width numerals, and `sups`, which\nreplaces the default glyphs with superscript variants.\n\nMarks, subjoined consonants, and dependent vowels are further labeled\nwith a mark-placement subclass, which indicates where the glyph will\nbe placed with respect to the base character to which it is\nattached. The actual position of the glyphs is determined by the\nlookups found in the font's <abbr title=\"Glyph Positioning table\">GPOS</abbr> table.\n\nThere are two basic _mark-placement subclasses_ for dependent vowel signs\n(matras). Each corresponds to the visual position of the matra with\nrespect to the base consonant to which it is attached:\n\n  - `TOP_POSITION` matras are positioned above the base consonant.\n  - `BOTTOM_POSITION` matras are positioned below the base consonant.\n  \nSyllable modifiers and other marks may be placed in `TOP` or `BOTTOM`\nposition, or:\n\n  - `LEFT_POSITION` marks are positioned to the left of the base consonant.\n  - `RIGHT_POSITION` marks are positioned to the right of the base consonant.\n\nThese positions may also be referred to elsewhere in shaping documents as:\n\n  - _Above-base_ \n  - _Below-base_ \n  - _Pre-base_ \n  - _Post-base_ \n  \nrespectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations\ncorrespond to Unicode's preferred terminology. 
The _Pre_, _Post_,\n_Above_, and _Below_ terminology is used in the official descriptions\nof OpenType <abbr title=\"Glyph Substitution table\">GSUB</abbr> and <abbr title=\"Glyph Positioning table\">GPOS</abbr> features. Shaping engines may, internally,\nuse whichever terminology is preferred.\n\nFor most mark and dependent-vowel codepoints, the _mark-placement\nsubclass_ is synonymous with the `Indic Positional Category` defined\nin Unicode. However, there are some distinctions, where the defined\ncategory does not fully capture the behavior of the character in the\nshaping process. \n\n\n\n### Tibetan character tables ###\n\nSeparate character tables are provided for the Tibetan block as well\nas for other miscellaneous characters that are used in `<tibt>` text\nruns:\n\n  - [Tibetan character table](character-tables/character-tables-tibetan.md#tibetan-character-table)\n  - [Miscellaneous character table](character-tables/character-tables-tibetan.md#miscellaneous-character-table)\n\nThe tables list each codepoint along with its Unicode general\ncategory, its shaping class, and its mark-placement subclass. The\ncodepoint's Unicode name and an example glyph are also provided.\n\nFor example:\n\n:::{table} Example character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+0F40`   | Letter           | CONSONANT         | _null_                     | &#x0F40; Ka                  |\n| | | | |\n|`U+0F7E`   | Mark [Mn]        | BINDU             | TOP_POSITION               | &#x0F7E; Sign Rjes Su Nga Ro |\n:::\n\n\nCodepoints with no assigned meaning are\ndesignated as _unassigned_ in the _Unicode category_ column. 
\n\nAssigned codepoints with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine.\n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning for codepoints in the _Mark_ category. Assigned, non-mark\ncodepoints have a _null_ in this column and evoke no special\nmark-placement behavior. Marks tagged with [Mn] in the _Unicode\ncategory_ column are categorized as non-spacing; marks tagged with\n[Mc] are categorized as spacing-combining.\n\nSome codepoints in the tables use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific, script-aware behavior.\n\nOther important characters that may be encountered when shaping runs\nof Tibetan text include the dotted-circle placeholder (`U+25CC`), \nthe no-break space (`U+00A0`), and the zero-width space (`U+200B`).\n\nThe dotted-circle placeholder is frequently used when displaying a\ndependent vowel (matra) or a combining mark in isolation. Real-world\ntext syllables may also use other characters, such as hyphens or dashes,\nin a similar placeholder fashion; shaping engines should cope with\nthis situation gracefully.\n\n<!--- The zero-width joiner is primarily used to prevent the formation of a\nsubjoining form from a <samp>\"_Consonant_,Halant,_Consonant_\"</samp> sequence. The sequence\n<samp>\"_Consonant_,Halant,ZWJ,_Consonant_\"</samp> blocks the substitution of a\nsubjoined form for the second consonant. --->\n\n<!---\nA secondary usage of the zero-width joiner is to prevent the formation of\n<samp>\"Reph\"</samp>. An initial <samp>\"Ra,Halant,ZWJ\"</samp> sequence should not produce a <samp>\"Reph\"</samp>,\nwhere an initial <samp>\"Ra,Halant\"</samp> sequence without the zero-width joiner\notherwise would.\n--->\n\nThe no-break space is primarily used to insert spaces between\nphrases. Tibetan text does not employ inter-word spaces. 
Consequently,\nwhen spaces are inserted into a text run, it is important that they be\npreserved: line-breaking algorithms must not break lines after a\nTibetan space, so the no-break space character is used instead of the\ntraditional space. \n\nThe no-break space may also be used to display those codepoints that\nare defined as non-spacing (marks, dependent vowels (matras),\nbelow-base consonant forms, and post-base consonant forms) in an\nisolated context, as an alternative to displaying them superimposed on\nthe dotted-circle placeholder. \n\nThe Wheel of Dharma symbol (`U+2638`) from the Miscellaneous Symbols\nblock also occurs in Tibetan texts.\n\n\n## The `<tibt>` shaping model ##\n\nProcessing a run of `<tibt>` text involves three top-level stages:\n\n1. Applying the language substitution features from <abbr>GSUB</abbr>\n2. Applying all basic substitution features from <abbr>GSUB</abbr>\n3. Applying all remaining positioning features from <abbr>GPOS</abbr>\n\n\nAs with other Brahmi-derived and Indic scripts, the language\nsubstitution features must be applied to the run in a specific order. The\nfeatures in the two later stages, however, do not have a mandatory\norder; they are applied in the order in which they appear in the font.\n\nUnlike many other Brahmi-derived and Indic scripts, shaping Tibetan\ntext does not require a syllable-identification stage or any\nreordering moves.\n\nA syllable in Tibetan is usually separated from subsequent syllables\nor words by a \"tsheg\" mark at the end of the syllable. A word-final\nsyllable may also be separated by a punctuation mark or a non-breaking\nspace.\n\nEach syllable contains exactly one vowel sound. Valid syllables may\nbegin with either a consonant or an independent vowel. 
\n\nThe general form of a consonant-based syllable in Tibetan begins with\nan optional pre-base consonant (also called a \"prefix\"), followed by\nthe syllable's base consonant, zero or more subjoined\nconsonants, zero or more dependent-vowel signs (matras), an optional\npost-base consonant (also called a \"suffix\") and zero or more syllable\nmodifiers or diacritical marks.\n\n:::{figure-md}\n![Tibetan syllable example](/images/tibetan/tibetan-syllable.svg \"Tibetan syllable example\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-syllable}\n\nTibetan syllable example\n:::\n\n```{svg-color-toggle-button} tibetan-syllable\n```\n\nThe prefix, suffix, and base consonants will all be from the\n`CONSONANT` shaping class. All subjoined consonants will be from the\n`CONSONANT_SUBJOINED` class.\n\nThe prefix, suffix, and base consonant are all shown in\ntheir default form and position. Any subjoined consonants are stacked\nbelow the base consonant. Any dependent vowel signs (matras) are\nrendered as marks positioned either above the base consonant or below\nthe consonant stack.\n\n> Note: A base consonant that is not accompanied by a\n> dependent vowel sign (matra) carries the script's inherent vowel\n> sound. This vowel sound is changed by a dependent vowel sign\n> following the consonant.\n\n> Note: Prefix and suffix consonants do not carry a vowel sound. This\n> does not affect shaping, except in that Tibetan differs from many\n> other scripts in not employing a \"halant\" or vowel-killer sign to\n> designate the suppression of these sounds.\n\nCertain consonant sequences may take on alternate shapes to provide a\nbetter visual fit with adjoining characters (such as within a\nconsonant stack). However, these alternates are not considered\northographically distinct forms.\n\nNative words in Tibetan do not incorporate more than a single\ndependent-vowel sign (matra) in a syllable. 
However, multiple\ndependent-vowel signs may be used to represent loanwords from\nSanskrit, Chinese, and many other languages.\n\nIn addition to valid syllables, standalone sequences may occur, such\nas when an isolated codepoint is shown in example text.\n\n> Note: Foreign loanwords, when written in the Tibetan script, may\n> not adhere to the syllable-formation rules described above. \n\n\n\n\n### Stage 1: Applying the language substitution features from <abbr>GSUB</abbr> ###\n\nThe language-substitution stage applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for this\nstage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these substitutions must be performed is fixed:\n\n\tlocl\n\tccmp\n\n\n#### Stage 1, step 1: locl ####\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\n\n#### Stage 1, step 2: ccmp ####\n\nThe `ccmp` feature allows a font to substitute mark-and-base sequences\nwith a pre-composed glyph including the mark and the base, or to\nsubstitute a single glyph into an equivalent decomposed sequence of glyphs. \n \nIn `<tibt>` text, this may include decompositions of multi-part\ndependent vowel signs (matras).\n\nThe Tibetan Unicode block includes several multi-part matras, most\nintended for use transcribing Sanskrit. 
However, usage is discouraged\nfor several of these matras, and two of the codepoints have been\nofficially deprecated. In their place, text authors are encouraged to\nuse the corresponding sequence of single-part matras.\n\n  - `U+0F77` is deprecated and should be replaced by <samp>\"`U+0FB2`,`U+0F81`\"</samp>\n  - `U+0F79` is deprecated and should be replaced by <samp>\"`U+0FB3`,`U+0F81`\"</samp>\n  - `U+0F73` can be replaced by <samp>\"`U+0F71`,`U+0F72`\"</samp>\n  - `U+0F75` can be replaced by <samp>\"`U+0F71`,`U+0F74`\"</samp>\n  - `U+0F81` can be replaced by <samp>\"`U+0F71`,`U+0F80`\"</samp>\n  \nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs. \n\n\n:::{figure-md}\n![Composition-decomposition substitution](/images/tibetan/tibetan-ccmp.svg \"Composition-decomposition substitution\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-ccmp}\n\nComposition-decomposition substitution\n:::\n\n```{svg-color-toggle-button} tibetan-ccmp\n```\n\n\n### Stage 2: Applying all basic substitution features from <abbr>GSUB</abbr> ###\n\nIn this stage, the basic substitution features from the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nare applied. The order in which these features are applied is not\ncanonical; they should be applied in the order in which they appear in\nthe <abbr title=\"Glyph Substitution table\">GSUB</abbr> table in the font. \n\n\tabvs\n\tblws\n\tcalt\n\tliga\n\n\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. 
This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\n:::{figure-md}\n![Application of the abvs feature](/images/tibetan/tibetan-abvs.svg \"Application of the abvs feature\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-abvs}\n\nApplication of the abvs feature\n:::\n\n```{svg-color-toggle-button} tibetan-abvs\n```\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. In Tibetan, this can include contextual ligatures\ninvolving below-base dependent vowel marks (matras) or subjoined\nconsonants.\n\n:::{figure-md}\n![Application of the blws feature](/images/tibetan/tibetan-blws.svg \"Application of the blws feature\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-blws}\n\nApplication of the blws feature\n:::\n\n```{svg-color-toggle-button} tibetan-blws\n```\n\nThe `calt`  feature substitutes glyphs with contextual alternate\nforms. In general, this involves replacing the default form of a\nstacking glyph (such as a subjoined consonant) with an alternate that\nprovides a preferable connection to an adjacent glyph in the stack.\n\nThe `calt` feature performs substitutions that are not mandatory for\northographic correctness. The substitutions made by `calt`\ncan be disabled by application-level user interfaces.\n\n:::{figure-md}\n![Application of the calt feature](/images/tibetan/tibetan-calt.svg \"Application of the calt feature\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-calt}\n\nApplication of the calt feature\n:::\n\n```{svg-color-toggle-button} tibetan-calt\n```\n\n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. 
Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\n:::{figure-md}\n![Application of the liga feature](/images/tibetan/tibetan-liga.svg \"Application of the liga feature\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-liga}\n\nApplication of the liga feature\n:::\n\n```{svg-color-toggle-button} tibetan-liga\n```\n\n\n### Stage 3: Applying remaining positioning features from <abbr>GPOS</abbr> ###\n\nIn this stage, mark positioning, kerning, and other <abbr title=\"Glyph Positioning table\">GPOS</abbr> features are\napplied. As with the preceding stage, the order in which these\nfeatures are applied is not canonical; they should be applied in the\norder in which they appear in the <abbr title=\"Glyph Positioning table\">GPOS</abbr> table in the font.\n\n\tkern\n\tabvm\n\tblwm\n\tmkmk\n\n> Note: The `kern` feature is usually applied at this stage, if it is\n> present in the font. However, `kern` is not mandatory for shaping\n> Tibetan text and may be disabled by user preference.\n\nThe `kern` feature adjusts the horizontal positioning of\nglyphs.\n\n:::{figure-md}\n![Application of the kern feature](/images/tibetan/tibetan-kern.svg \"Application of the kern feature\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-kern}\n\nApplication of the kern feature\n:::\n\n```{svg-color-toggle-button} tibetan-kern\n```\n\nThe `abvm` feature positions above-base glyphs for attachment to base\ncharacters. In Tibetan, this includes tone markers, diacritical marks,\nand above-base dependent vowels (matras).\n\n:::{figure-md}\n![Application of the abvm feature](/images/tibetan/tibetan-abvm.svg \"Application of the abvm feature\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-abvm}\n\nApplication of the abvm feature\n:::\n\n```{svg-color-toggle-button} tibetan-abvm\n```\n\nThe `blwm` feature positions below-base glyphs for attachment to base\ncharacters. 
In Tibetan, this includes subjoined consonants,\nbelow-base dependent vowels (matras), and diacritical marks.\n\n:::{figure-md}\n![Application of the blwm feature](/images/tibetan/tibetan-blwm.svg \"Application of the blwm feature\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-blwm}\n\nApplication of the blwm feature\n:::\n\n```{svg-color-toggle-button} tibetan-blwm\n```\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph. In Tibetan, this also includes attaching marks to\nsubjoined consonants or dependent vowels.\n\n:::{figure-md}\n![Application of the mkmk feature](/images/tibetan/tibetan-mkmk.svg \"Application of the mkmk feature\"){.shaping-demo .inline-svg .greyscale-svg #tibetan-mkmk}\n\nApplication of the mkmk feature\n:::\n\n```{svg-color-toggle-button} tibetan-mkmk\n```\n"
  },
  {
    "path": "opentype-shaping-use.md",
    "content": "# Universal Shaping Engine script shaping in OpenType #\n\nThis document details the default shaping procedure needed to display\ntext runs in scripts supported by the Universal Shaping Engine (<abbr>USE</abbr>)\nmodel. \n\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n  - [The <abbr>USE</abbr> shaping model](#the-use-shaping-model)\n      - [Stage 1: Split vowel decomposition](#stage-1-split-vowel-decomposition)\n      - [Stage 2: Cluster identification](#stage-2-cluster-identification)\n      - [Stage 3: Basic cluster formation](#stage-3-basic-cluster-formation)\n\t      - [Stage 3, step 1: Applying the basic pre-processing features from <abbr>GSUB</abbr>](#stage-3-step-1-applying-the-basic-pre-processing-features-from-gsub)\n          - [Stage 3, step 2: Applying the basic reordering features from <abbr>GSUB</abbr>](#stage-3-step-2-applying-the-basic-reordering-features-from-gsub)\n          - [Stage 3, step 3: Applying the basic orthographic features from <abbr>GSUB</abbr>](#stage-3-step-3-applying-the-basic-orthographic-features-from-gsub)\n\t  - [Stage 4: Glyph reordering](#stage-4-glyph-reordering)\n\t      - [Stage 4, step 1: Applying the reordering features from <abbr>GSUB</abbr>](#stage-4-step-1-applying-the-reordering-features-from-gsub)\n\t      - [Stage 4, step 2: Performing property-based reordering moves](#stage-4-step-2-performing-property-based-reordering-moves)\n\t  - [Stage 5: Final feature application](#stage-5-final-feature-application)\n\t      - [Stage 5, step 1: Applying the final topographic features from <abbr>GSUB</abbr>](#stage-5-step-1-applying-the-final-topographic-features-from-gsub)\n\t      - [Stage 5, step 2: Applying the final typographic-presentation features from <abbr>GSUB</abbr>](#stage-5-step-2-applying-the-final-typographic-presentation-features-from-gsub)\n\t      - [Stage 5, step 3: Applying the final 
positioning features from <abbr>GPOS</abbr>](#stage-5-step-3-applying-the-final-positioning-features-from-gpos)\n  \n  \n  \n## General information ##\n\nThe Universal Shaping Engine (<abbr>USE</abbr>) model is used for complex scripts\nthat are not already supported by a dedicated OpenType shaping\nmodel. \n\n\"Complex\" scripts, in OpenType shaping terminology, are scripts that\nrequire some combination of glyph reordering, contextual joining\nbehavior, or the substitution of context-dependent forms for\nlinguistic or orthographic correctness.\n\nThe scripts covered by this model include Javanese, Balinese,\nBuginese, Batak, Chakma, Lepcha, Modi, Phags-pa, Tagalog, Siddham,\nSundanese, Tai Le, Tai Tham, Tai Viet, and many others.\n\nIn many ways, the <abbr title=\"Universal Shaping Engine\">USE</abbr> model is a generalization of the\n[Indic2](opentype-shaping-indic-general.md) OpenType \nshaping model, with adjustments made to correct shortfalls encountered\nwhen using the Indic2 shaping model, as well as additional changes\ndesigned to broaden the number of scripts that can be supported. For\nexample, the <abbr title=\"Universal Shaping Engine\">USE</abbr> model includes a step applying contextual\njoining-behavior features as is performed in the Arabic-like shaping\nmodel. 
\n\n> Note: The term _Indic3_ is sometimes used in comparison to Indic2\n> (or the corresponding increment of the script tags for existing\n> OpenType shaping models, such as `<dev3>` in comparison to\n> `<dev2>`).\n>\n> This terminology either indicates that a shaping engine has\n> implemented support for one or more of the Indic2 scripts within the\n> <abbr title=\"Universal Shaping Engine\">USE</abbr> model or is merely a conversational convention for discussing\n> support for the Indic2-model scripts in <abbr title=\"Universal Shaping Engine\">USE</abbr>.\n>\n> At the present time, there is no formal definition for an Indic3\n> model, and there are no registered OpenType script tags for\n> `<dev3>` or any other third generation of the scripts handled by the\n> Indic2 model.\n\n<abbr title=\"Universal Shaping Engine\">USE</abbr> was introduced after the release of version 8.0 of the Unicode\nspecification. The intent is for <abbr title=\"Universal Shaping Engine\">USE</abbr> to support complex scripts added\nto future Unicode releases in addition to those already supported.\n\n\n## Terminology ##\n\nThe <abbr title=\"Universal Shaping Engine\">USE</abbr> shaping model uses a standard set of terms for the features of\nsupported scripts. These terms are similar to the standard terms used\nfor Indic scripts, but with several key distinctions.\n\nA **cluster** is the fundamental unit used in shaping; it consists of\na sequence of Unicode codepoints that will be processed as an atomic\nunit. 
An individual syllable typically corresponds to a single\ncluster, but any particular cluster might involve multiple syllables\nor a sequence that does not match the syllable-formation rules of the\nscript.\n\nA **base** character in the <abbr title=\"Universal Shaping Engine\">USE</abbr> model may be a consonant, an\nindependent vowel, a number, or any of several additional character\nclasses.\n\nA cluster's base consonant is generally rendered in its full form\n(although it may form ligatures), while other consonants in the\ncluster frequently take on secondary forms. Different <abbr title=\"Glyph Substitution table\">GSUB</abbr>\nsubstitutions may apply to a script's **pre-base** and **post-base**\nconsonants. Some of these substitutions create **above-base** or\n**below-base** forms. The **Reph** form of the consonant \"Ra\" is an\nexample.\n\nA **vowel** character in the <abbr title=\"Universal Shaping Engine\">USE</abbr> model is a dependent vowel or any of\nseveral additional marks with similar behavior. 
This class is similar\nto the \"matra\" class used in Indic shaping.\n\n**Halant** is the standard term for a \"vowel-killer\" sign.\n\n\n## Glyph classification ##\n\nThe <abbr title=\"Universal Shaping Engine\">USE</abbr> shaping model classifies characters based on a specific set of\nproperties defined for each codepoint in the Unicode Character\nDatabase (<abbr>UCD</abbr>), augmented with a small set of pre-defined property\noverrides.\n\nThe <abbr title=\"Unicode Character Database\">UCD</abbr> properties used for <abbr title=\"Universal Shaping Engine\">USE</abbr> character classification are:\n\n\tUnicode General Category (UGC)\n\tUnicode Indic Syllabic Category (UISC)\n\tUnicode Indic Positional Category (UIPC)\n\nIn addition, the Unicode Character Decomposition Mapping (<abbr>UCDM</abbr>) is used for\nall split vowels.\n\n\n### <abbr>USE</abbr> overrides ###\n\nAlthough, in general, the <abbr title=\"Universal Shaping Engine\">USE</abbr> shaping model relies on the <abbr title=\"Unicode General Category\">UGC</abbr>, <abbr title=\"Unicode Indic Syllabic Category\">UISC</abbr>,\nand <abbr title=\"Unicode Indic Positional Category\">UIPC</abbr> properties, the <abbr title=\"Universal Shaping Engine\">USE</abbr> model makes a small set of standardized\noverrides to the properties of certain specific characters.\n\nThe following table lists the complete set of <abbr title=\"Universal Shaping Engine\">USE</abbr> overrides. Shaping\nengines should implement the override properties in order to guarantee\ncorrect results.\n\n> Note: A _null_ in the following table indicates that the\n> corresponding Unicode property is not overridden for the codepoint\n> featured in that row. 
\n\n\n:::{table} Property overrides for <abbr>USE</abbr> shaping\n\n\n| Codepoint | Unicode UISC               | USE override UISC | Unicode UIPC | USE override UIPC | Glyph                                   |\n|:----------|:---------------------------|:------------------|:-------------|:------------------|:----------------------------------------|\n| `U+AA29`  | Vowel_Dependent            | Bindu             | _null_       | _null_            | &#xAA29; Cham Vowel Sign Aa             |\n| `U+0F71`  | Vowel_Dependent            | Nukta             | _null_       | _null_            | &#x0F71; Tibetan Vowel Sign Aa          |\n| `U+A982`  | Consonant_Succeeding_Repha | Tone_Mark         | _null_       | _null_            | &#xA982; Javanese Sign Layar            |\n| `U+0F7F`  | Visarga                    | Consonant_Dead    | _null_       | _null_            | &#x0F7F; Tibetan Sign Rnam Bcad         |\n| `U+11134` | Pure_Killer                | Gemination_Mark   | _null_       | _null_            | &#x11134; Chakma Maayyaa                |\n| `U+0F74`  | _null_                     | _null_            | Bottom       | Top               | &#x0F74; Tibetan Vowel Sign U           |\n| `U+AA35`  | _null_                     | _null_            | Bottom       | Top               | &#xAA35; Cham Consonant Sign            |\n| `U+1A18`  | _null_                     | _null_            | Bottom       | Top               | &#x1A18; Buginese Vowel Sign U          |\n| `U+0F72`  | _null_                     | _null_            | Top          | Bottom            | &#x0F72; Tibetan Vowel Sign I           |\n| `U+0F7A`  | _null_                     | _null_            | Top          | Bottom            | &#x0F7A; Tibetan Vowel Sign E           |\n| `U+0F7B`  | _null_                     | _null_            | Top          | Bottom            | &#x0F7B; Tibetan Vowel Sign Ee          |\n| `U+0F7C`  | _null_                     | _null_            | Top          | Bottom            
| &#x0F7C; Tibetan Vowel Sign O           |\n| `U+0F7D`  | _null_                     | _null_            | Top          | Bottom            | &#x0F7D; Tibetan Vowel Sign Oo          |\n| `U+0F80`  | _null_                     | _null_            | Top          | Bottom            | &#x0F80; Tibetan Vowel Sign Reversed Ii |\n| `U+11127` | _null_                     | _null_            | Top          | Bottom            | &#x11127; Chakma Vowel Sign A           |\n| `U+11128` | _null_                     | _null_            | Top          | Bottom            | &#x11128; Chakma Vowel Sign I           |\n| `U+11129` | _null_                     | _null_            | Top          | Bottom            | &#x11129; Chakma Vowel Sign Ii          |\n| `U+1112D` | _null_                     | _null_            | Top          | Bottom            | &#x1112d; Chakma Vowel Sign Ai          |\n| `U+11130` | _null_                     | _null_            | Top          | Bottom            | &#x11130; Chakma Vowel Sign Oi          |\n| | | | | | |\n:::\n\n\n### <abbr>USE</abbr> classification table ###\n\nThe following table lists the classes utilized in the <abbr title=\"Universal Shaping Engine\">USE</abbr> shaping\nmodel, along with a definition for each class. 
The class definitions\nrefer to the <abbr title=\"Unicode General Category\">UGC</abbr>, <abbr title=\"Unicode Indic Syllabic Category\">UISC</abbr>, and <abbr title=\"Unicode Indic Positional Category\">UIPC</abbr> categories in the Unicode standard,\nor to specific Unicode codepoints.\n\nThe symbols given in the \"Symbol\" column for each class may be used to\nexpress cluster-matching rules or other algorithms.\n\nVowels and modifiers may be further subclassified as described in the\n[<abbr title=\"Universal Shaping Engine\">USE</abbr> subclasses table](#use-subclasses-table) below.\n\n\n:::{table} Class definitions for <abbr>USE</abbr> shaping\n\n| USE classification        | Symbol | Definition                                                                                                    |\n|:--------------------------|:-------|:--------------------------------------------------------------------------------------------------------------|\n| BASE                      | `B`    | UISC = Number _or_ (UISC = Avagraha & UGC = Lo) _or_ (UISC = Bindu & UGC = Lo) _or_ UISC = Consonant _or_ (UISC = Consonant_Final & UGC = Lo) _or_ UISC = Consonant_Head_Letter _or_ (UISC = Consonant_Medial & UGC = Lo) _or_ (UISC = Consonant_Subjoined & UGC = Lo) _or_ UISC = Tone_Letter _or_ (UISC = Vowel & UGC = Lo) _or_ UISC = Vowel_Independent _or_ (UISC = Vowel_Dependent & UGC = Lo) |\n| Combining grapheme joiner | `CGJ`  | `U+034F`                                                                                                      |\n| CONS_MOD                  | `CM`   | UISC = Nukta _or_ Gemination_Mark _or_ Consonant_Killer                                                       |\n| CONS_WITH_STACKER         | `CS`   | UISC = Consonant_With_Stacker                                                                                 |\n| CONS_FINAL                | `F`    | (UISC = Consonant_Final & UGC != Lo) _or_ UISC = Consonant_Succeeding_Repha                                   
|\n| CONS_FINAL_MOD            | `FM`   | UISC = Syllable_Modifier                                                                                      |\n| BASE_OTHER                | `GB`   | UISC = Consonant_Placeholder _or_ `U+2015`, `U+2022`, `U+25FB`–`U+25FE`                                       |\n| HALANT                    | `H`    | UISC = Virama _or_ Invisible_Stacker                                                                          |\n| HALANT_NUM                | `HN`   | UISC = Number_Joiner                                                                                          |\n| BASE_IND                  | `IND`  | (UISC = Consonant_Dead _or_ Modifying_Letter) _or_ (UGC = Po != `U+104E`, `U+2022`) _or_ `U+002D`             |\n| CONS_MED                  | `M`    | UISC = Consonant_Medial & UGC != Lo                                                                           |\n| BASE_NUM                  | `N`    | UISC = Brahmi_Joining_Number                                                                                  |\n| OTHER                     | `O`    | Any other SCRIPT_COMMON characters; White space characters, UGC=Zs                                            |\n| REPHA                     | `R`    | UISC = Consonant_Preceding_Repha _or_ Consonant_Prefixed                                                      |\n| Reserved character        | `Rsv`  | Any character not currently assigned or otherwise reserved in Unicode                                         |\n| SYM                       | `S`    | UGC = Sc _or_ (UGC = So & != `U+25CC`)                                                                        |\n| SYM_MOD                   | `SM`   | `U+1B6B`, `U+1B6C`, `U+1B6D`, `U+1B6E`, `U+1B6F`, `U+1B70`, `U+1B71`, `U+1B72`, `U+1B73`                      |\n| CONS_SUB                  | `SUB`  | UISC = Consonant_Subjoined & UGC != Lo                                                                        |\n| VOWEL              
       | `V`    | (UISC = Vowel & UGC != Lo) _or_ (UISC = Vowel_Dependent & UGC != Lo) _or_ UISC = Pure_Killer                  |\n| VOWEL_MOD                 | `VM`   | (UISC = Bindu & UGC != Lo) _or_ UISC = Tone_Mark _or_ Cantillation_Mark _or_ Register_Shifter _or_ Visarga    |\n| VARIATION_SELECTOR        | `VS`   | `U+FE00`‒`U+FE0F`                                                                                             |\n| Word joiner               | `WJ`   | `U+2060`                                                                                                      |\n| Zero width joiner         | `ZWJ`  | UISC = Joiner                                                                                                 |\n| Zero width nonjoiner      | `ZWNJ` | UISC = Non_Joiner                                                                                             |\n| | | |\n:::\n\n\n### <abbr>USE</abbr> subclasses table ###\n\nVowels and modifiers may be further subclassified based on their\nposition relative to base characters. 
The subclasses incorporated in\nthe <abbr title=\"Universal Shaping Engine\">USE</abbr> shaping model are defined in the table below.\n\nSplit-vowel subclasses are not assigned a symbol because each split\nvowel must be decomposed into its components.\n\n\n:::{table} Subclasses for <abbr>USE</abbr> shaping\n\n| USE classification     | Symbol  | Definition                                                              |\n|:-----------------------|:--------|:------------------------------------------------------------------------|\n| CONS_MOD_ABOVE         | `CMAbv` | USE=CM & UIPC = Top                                                     |\n| CONS_MOD_BELOW         | `CMBlw` | USE=CM & UIPC = Bottom                                                  |\n| CONS_FINAL_ABOVE       | `FAbv`  | USE=F & UIPC = Top                                                      |\n| CONS_FINAL_BELOW       | `FBlw`  | USE=F & UIPC = Bottom                                                   |\n| CONS_FINAL_POST        | `FPst`  | USE=F & UIPC = Right                                                    |\n| CONS_MED_ABOVE         | `MAbv`  | USE=M & UIPC = Top                                                      |\n| CONS_MED_BELOW         | `MBlw`  | USE=M & UIPC = Bottom                                                   |\n| CONS_MED_PRE           | `MPre`  | USE=M & UIPC = Left                                                     |\n| CONS_MED_POST          | `MPst`  | USE=M & UIPC = Right                                                    |\n| SYM_MOD_ABOVE          | `SMAbv` | `U+1B6B`,`U+1B6D`,`U+1B6E`,`U+1B6F`,`U+1B70`,`U+1B71`,`U+1B72`,`U+1B73` |\n| SYM_MOD_BELOW          | `SMBlw` | `U+1B6C`                                                                |\n| VOWEL_ABOVE            | `VAbv`  | USE=V & UIPC = Top                                                      |\n| VOWEL_ABOVE_BELOW      | _null_  | USE=V & UIPC = Top_And_Bottom                                           |\n| 
VOWEL_ABOVE_BELOW_POST | _null_  | USE=V & UIPC = Top_And_Bottom_And_Right                                 |\n| VOWEL_ABOVE_POST       | _null_  | USE=V & UIPC = Top_And_Right                                            |\n| VOWEL_BELOW            | `VBlw`  | USE=V & UIPC = Bottom _or_ Overstruck                                   |\n| VOWEL_BELOW_POST       | _null_  | USE=V & UIPC = Bottom_And_Right                                         |\n| VOWEL_PRE              | `VPre`  | USE=V & UIPC = Left                                                     |\n| VOWEL_PRE_ABOVE        | _null_  | USE=V & UIPC = Top_And_Left                                             |\n| VOWEL_PRE_ABOVE_POST   | _null_  | USE=V & UIPC = Top_And_Left_And_Right                                   |\n| VOWEL_PRE_POST         | _null_  | USE=V & UIPC = Left_And_Right                                           |\n| VOWEL_POST             | `VPst`  | USE=V & UIPC = Right                                                    |\n| VOWEL_MOD_ABOVE        | `VMAbv` | USE=VM & UIPC = Top                                                     |\n| VOWEL_MOD_BELOW        | `VMBlw` | USE=VM & UIPC = Bottom _or_ Overstruck                                  |\n| VOWEL_MOD_PRE          | `VMPre` | USE=VM & UIPC = Left                                                    |\n| VOWEL_MOD_POST         | `VMPst` | USE=VM & UIPC = Right                                                   |\n| | | |\n:::\n\n\n## The <abbr>USE</abbr> shaping model ##\n\nThe <abbr title=\"Universal Shaping Engine\">USE</abbr> shaping model consists of five top-level stages.\n\n1. Decomposition of split vowels\n2. Identifying clusters\n3. Applying basic cluster formation features\n4. Glyph reordering\n5. Applying final features\n\nAll scripts supported by the <abbr title=\"Universal Shaping Engine\">USE</abbr> model will be processed in this same\npattern. 
However, not every script requires that actions be taken in\nevery stage.\n\nThe first two stages take place for the entire text run being\nshaped. Subsequently, stages 3, 4, and 5 are each conducted in order on a\nper-cluster basis, until every cluster in the run has been processed.\n\nThe substitution features from <abbr title=\"Glyph Substitution table\">GSUB</abbr> and the positioning features from\n<abbr title=\"Glyph Positioning table\">GPOS</abbr> are applied to the text run in predefined feature groups. The\nfeatures applied at each step in the process are described below.\n\n\n### Stage 1: Split vowel decomposition ###\n\nMost split vowels have a canonical decomposition defined in the\nUnicode specification. The <abbr title=\"Universal Shaping Engine\">USE</abbr> shaping model requires that all such\nsplit vowels be decomposed into their components before any further\nprocessing is performed. \n\nFor these vowels, the canonical decomposition must be performed prior\nto cluster identification. Because this decomposition is a\ncharacter-level operation, the shaping engine may choose to perform it\nearlier, such as during an initial Unicode-normalization stage. \n\nFor any split vowels that do not have a canonical decomposition, the\nactive font should provide a decomposition via the `ccmp` substitution\nfeature in <abbr title=\"Glyph Substitution table\">GSUB</abbr>. \n\nThe cluster-identification rules detailed in stage two are based on\nthe canonical decompositions, and do not take non-canonical <abbr title=\"Glyph Substitution table\">GSUB</abbr>\ndecomposition into account.\n\n\n### Stage 2. Cluster identification ###\n\nA cluster in the <abbr title=\"Universal Shaping Engine\">USE</abbr> model is defined according to a generalized,\nvisual pattern that is common to all supported scripts. 
Consequently,\nthe cluster-identification expressions used do not enforce linguistic\nor orthographic correctness.\n\nAn independent cluster will consist of a standalone codepoint that\ndoes not require further shaping, optionally followed by a variation\nselector. Independent clusters will match the expression:\n```markdown\n(IND | O | Rsv | WJ) VS?\n```\n\nA standard cluster features a required base character and may include\nmany optional elements. Standard clusters will match the expression:\n```markdown\n( R | CS )? ( B | GB ) VS? CMAbv* CMBlw* ( ((H B) | SUB) VS? CMAbv* CMBlw* )* MPre? MAbv? MBlw? MPst? VPre* VAbv* VBlw* VPst* VMPre* VMAbv* VMBlw* VMPst* FAbv* FBlw* FPst* FM?\n```\n\nA halant-terminated cluster occurs when any character other than a `B`\nfollows a `H`. Halant-terminated clusters will match the expression:\n```markdown\n( R | CS )? (B | GB) VS? CMAbv* CMBlw* ( ((H B) | SUB) VS? CMAbv* CMBlw*)* H\n```\n\nA number-joiner–terminated cluster will match the expression:\n```markdown\nN VS? (HN N VS?)* HN\n```\n\nA numeral cluster will match the expression:\n```markdown\nN VS? (HN N VS?)*\n```\n\nA symbol cluster will match the expression:\n```markdown\n(S | GB) VS? SMAbv* SMBlw*\n```\n\n> Note: Practically speaking, shaping engines are highly unlikely to\n> encounter more than a small number of sequential vowel or modifiers\n> in any real-world clusters. Thus, implementations may choose to\n> limit occurrences by limiting some of the above expressions to a\n> finite length, such as `VPre{0,4}` rather than `VPre*`.\n\nThe expressions above use state-machine syntax from the Ragel\nstate-machine compiler. The operators represent:\n\n```markdown\na* = zero or more copies of a\nb+ = one or more copies of b\nc? 
= optional instance of c\nd{n} = exactly n copies of d\nd{,n} = zero to n copies of d\nd{n,} = n or more copies of d\nd{n,m} = n to m copies of d\n!e = not e\n^f = character-level not f\ng.h = concatenation of g and h\ni|j = i or j\n( ) = grouping of expression elements\n```\n\nSequences not matching any of the above expressions should be regarded\nas broken. The shaping engine may make a best-effort attempt\nto shape the broken sequence, but making guarantees about the\ncorrectness or appearance of the final result is out of scope for this\ndocument.\n\nAfter the clusters have been identified, each of the subsequent \nshaping stages occurs on a per-cluster basis.\n\n\n### Stage 3: Basic cluster formation ###\n\nThe basic cluster formation stage is used to apply fundamental\nsubstitutions necessary for script and language correctness.\n\n#### Stage 3, step 1: Applying the basic pre-processing features from <abbr title=\"Glyph Substitution table\">GSUB</abbr> ####\n\nThe basic pre-processing step applies mandatory substitution features\nusing the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for this \nstage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features. \n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tlocl\n\tccmp\n\tnukt\n\takhn\n\nThe `locl` feature replaces default glyphs with any language-specific\nvariants, based on examining the language setting of the text run.\n\n> Note: Strictly speaking, the use of localized-form substitutions is\n> not part of the shaping process, but of the localization process,\n> and could take place at an earlier point while handling the text\n> run. 
However, shaping engines are expected to complete the\n> application of the `locl` feature before applying the subsequent\n> <abbr title=\"Glyph Substitution table\">GSUB</abbr> substitutions in the following steps.\n\nThe `ccmp` feature allows a font to substitute mark-and-base sequences\nwith a pre-composed glyph including the mark and the base, or to\nsubstitute a single glyph into an equivalent decomposed sequence of\nglyphs. \n\nIf present, these composition and decomposition substitutions must be\nperformed before applying any other <abbr title=\"Glyph Substitution table\">GSUB</abbr> lookups, because\nthose lookups may be written to match only the `ccmp`-substituted\nglyphs.\n\n> Note: The `ccmp` feature may perform decompositions of split vowels\n> that do not have a canonical decomposition defined in Unicode. Split\n> vowels that do have a canonical decomposition were decomposed in\n> stage one.\n\nThe `nukt` feature replaces <samp>\"_Consonant_,Nukta\"</samp> sequences with a\nprecomposed nukta-variant of the consonant glyph. \n\nThe `akhn` feature replaces specific sequences with required\nligatures. These sequences can occur anywhere in a cluster. \nAkhand characters have orthographic status equivalent to full\nconsonants in some languages, and fonts may have later substitution\nrules designed to match them in subsequences. Therefore, this\nfeature must be applied before all other many-to-one substitutions.\n\n\n#### Stage 3, step 2: Applying the basic reordering features from <abbr title=\"Glyph Substitution table\">GSUB</abbr> ####\n\nThe basic reordering step applies mandatory substitution features from\n<abbr title=\"Glyph Substitution table\">GSUB</abbr> that affect reordering elements.\n\nFor these features, the glyph substitutions themselves are applied at this\nstep. 
However, the actual reordering of the glyphs does not take place\nuntil stage 4, step 1.\n\nThe order in which these substitutions must be performed is fixed for\nall <abbr title=\"Universal Shaping Engine\">USE</abbr> scripts:\n\n\trphf\n\tpref\n\n##### Stage 3, step 2.1: rphf #####\n\nThe `rphf` feature replaces cluster-initial <samp>\"Ra,Halant\"</samp> sequences with\nthe <samp>\"Reph\"</samp> glyph.\n\n> Note: although the glyph substitution is performed in this step, the\n> corresponding glyph reordering move is not performed until a later\n> stage. \n\n##### Stage 3, step 2.2: pref #####\n\nThe `pref` feature replaces pre-base-consonant glyphs with any special\nforms. \n\n> Note: although the glyph substitution is performed in this step, the\n> corresponding glyph reordering move is not performed until a later\n> stage. \n\n\n#### Stage 3, step 3: Applying the basic orthographic features from <abbr title=\"Glyph Substitution table\">GSUB</abbr> ####\n\nThe basic orthographic step applies substitution features using the\nrules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for this stage, glyph\nsequences should be tagged for possible application of <abbr title=\"Glyph Substitution table\">GSUB</abbr> features. \n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\trkrf\n\tabvf\n\tblwf\n\thalf\n\tpstf\n\tvatu\n\tcjct\n\nThe `rkrf` feature replaces <samp>\"_Consonant_,Halant,Ra\"</samp> sequences with the\n\"Rakaar\"-ligature form of the consonant glyph.\n\nThe `abvf` feature replaces above-base-consonant glyphs with any\nspecial forms. 
\n\nThe `blwf` feature replaces below-base-consonant glyphs with any\nspecial forms.\n\nThe `half` feature replaces <samp>\"_Consonant_,Halant\"</samp> sequences before the\nbase consonant with \"half forms\" of the consonant glyphs.\n\nThe `pstf` feature replaces post-base-consonant glyphs with any\nspecial forms.\n\nThe `vatu` feature replaces certain sequences with \"Vattu variant\"\nforms. \n\nThe `cjct` feature replaces sequences of adjacent consonants with\nconjunct ligatures. These sequences must match <samp>\"_Consonant_,Halant,_Consonant_\"</samp>.\n\n\n### Stage 4: Glyph reordering ###\n\nThe glyph-reordering stage moves dependent vowels, diacritics, and\nother mark glyphs in relation to the base consonant. All reordering is\nperformed in this stage, which is broken into two distinct steps:\n\n1. Applying the reordering features from <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n2. Performing property-based reordering moves\n\n\n#### Stage 4, step 1: Applying the reordering features from <abbr title=\"Glyph Substitution table\">GSUB</abbr> ####\n\nIn this step, the reordering moves corresponding to the\nglyph-reordering features in <abbr title=\"Glyph Substitution table\">GSUB</abbr> are performed.\n\nAny glyph substitutions that apply to characters involved in these\nreordering moves were performed in stage 3, step 2. Therefore, this\nstep only requires moving glyphs to their final positions.\n\nThe order in which these substitutions must be performed is fixed for\nall <abbr title=\"Universal Shaping Engine\">USE</abbr> scripts:\n\n\trphf\n\tpref\n\n##### Stage 4, step 1.1: rphf #####\n\nIn stage 3, step 2, the `rphf` feature replaced cluster-initial\n<samp>\"Ra,Halant\"</samp> sequences with the <samp>\"Reph\"</samp> glyph. The <samp>\"Reph\"</samp> glyph is now\nreordered to its final position. 
The algorithm to determine the final\nposition of the <samp>\"Reph\"</samp> glyph is:\n\n  - Move the <samp>\"Reph\"</samp> right one position at a time.\n    - If the character immediately following the new position is an\n      explicit <samp>\"Halant\"</samp>, stop.\n    - If the character immediately before the new position is a full\n      base (`B`) character, stop.\n    - If the end of the cluster is reached, stop.\n\n##### Stage 4, step 1.2: pref #####\n\nIn stage 3, step 2, the `pref` feature replaced pre-base-consonant\nglyphs with special forms. The pre-base-consonant glyph is now\nreordered to its final position. The algorithm to determine the final\nposition of the pre-base-reordering consonant is:\n\n  - Move the pre-base-reordering consonant left one position at a\n    time.\n    - If the pre-base reordering consonant is to the left of the\n\t  first spacing glyph after an explicit <samp>\"Halant\"</samp>, stop.\n    - When the pre-base reordering consonant is to the left of the\n\t  first spacing glyph in the cluster, stop. \n\t- If the beginning of the cluster is reached, stop.\n\t\n> Note: Each cluster may have only one pre-base-reordering consonant\n> glyph. \n>\n> Note: scripts that use pre-base medial consonants may also make use\n> of the `pref` feature reordering.\n\n\n#### Stage 4, step 2: Performing property-based reordering moves ####\n\nIn this step, any characters that match one of the <abbr title=\"Universal Shaping Engine\">USE</abbr> reordering\nclassifications should be reordered into their final position. 
\n\n> Note: this classification-based reordering step ensures that\n> reordering characters not addressed by the active font's <abbr title=\"Glyph Substitution table\">GSUB</abbr>\n> features are ordered correctly.\n\nThe character classes reordered in this step are:\n\n```markdown\n`R`\t\t= `REPHA`\n`VPre`\t\t= `VOWEL_PRE`\n`VMPre`\t\t= `VOWEL_MOD_PRE`\n```\n\nPre-base `REPHA` glyphs that occur before a full base are reordered\nusing the <samp>\"Reph\"</samp> reordering algorithm described in [Stage 4, step 1.1](#stage-4-step-11-rphf),\njust as if the `rphf` feature had been applied to the glyph.\n\nPre-base `VOWEL_PRE` vowel glyphs, including both stand-alone `VOWEL_PRE` vowels\nand `VOWEL_PRE` components of split vowels, are reordered to\n   - before the base glyph\n   - before any other pre-base glyphs that were reordered in earlier steps\n   \nPre-base `VOWEL_MOD_PRE` vowel-modifier glyphs are reordered to\n   - before the base glyph\n   - before any pre-base `VOWEL_PRE` vowel glyphs\n   - before any other pre-base glyphs that were reordered in earlier steps\n\n\n### Stage 5: Final feature application ###\n\nThe final stage involves applying topographic joining features for\nconnected scripts, applying typographic-presentation features from\n<abbr title=\"Glyph Substitution table\">GSUB</abbr>, and applying positioning features from <abbr title=\"Glyph Positioning table\">GPOS</abbr>.\n\n\n#### Stage 5, step 1: Applying the final topographic features from <abbr title=\"Glyph Substitution table\">GSUB</abbr> ####\n\nFor connected scripts, this step applies the substitutions to select\nthe correct topographic form for each glyph, based on its position in\nthe syllable.\n\nWhether or not each codepoint joins on the left or the right side is\ndetermined by the `Unicode Joining Type` (<abbr>UJT</abbr>) property defined in <abbr title=\"Unicode Character Database\">UCD</abbr>\nfor each codepoint.\n\n> Note: <abbr title=\"Universal Shaping Engine\">USE</abbr> does 
not support positional typographic features for any\n> non-connected scripts.\n\t\n\tisol\n\tinit\n\tmedi\n\tfina\n\n#### Stage 5, step 2: Applying the final typographic-presentation features from <abbr title=\"Glyph Substitution table\">GSUB</abbr> ####\n\nThe final typographic-presentation step applies mandatory substitution\nfeatures using the rules in the font's <abbr title=\"Glyph Substitution table\">GSUB</abbr> table. In preparation for this\nstage, glyph sequences should be tagged for possible application \nof <abbr title=\"Glyph Substitution table\">GSUB</abbr> features.\n\nThe order in which these features are applied is not canonical; they\nshould be applied in the order in which they appear in the <abbr title=\"Glyph Substitution table\">GSUB</abbr> table\nin the font.\n\n\tabvs\n\tblws\n\tcalt\n\tclig\n\thaln\n\tliga\n\tpres\n\tpsts\n\trclt\n\trlig\n\tvert\n\tvrt2\n\t\nThe `abvs` feature replaces above-base-consonant glyphs with special\npresentation forms. This usually includes contextual variants of\nabove-base marks or contextually appropriate mark-and-base ligatures.\n\nThe `blws` feature replaces below-base-consonant glyphs with special\npresentation forms. This usually includes replacing consonants that\nare adjacent to special consonant forms with contextual\nligatures.\n\nThe `calt` feature substitutes glyphs with contextual alternate\nforms. In contrast to `rclt`, the `calt` feature performs\nsubstitutions that are not mandatory for orthographic\ncorrectness. Consequently, unlike `rclt`, the substitutions made by `calt`\ncan be disabled by application-level user interfaces.\n\nThe `clig` feature substitutes optional ligatures that are on by\ndefault, but which are activated only in certain\ncontexts. Substitutions made by `clig` may be disabled by\napplication-level user interfaces. \n\nThe `haln` feature replaces syllable-final <samp>\"_Consonant_,Halant\"</samp> pairs with\nspecial presentation forms. 
This can include stylistic variants of the\nconsonant where placing the <samp>\"Halant\"</samp> mark on its own is\ntypographically problematic. \n\nThe `liga` feature substitutes standard, optional ligatures that are on\nby default. Substitutions made by `liga` may be disabled by\napplication-level user interfaces.\n\nThe `pres` feature replaces pre-base-consonant glyphs with special\npresentation forms. This can include consonant conjuncts, half-form\nconsonants, and stylistic variants of left-side dependent vowels\n(matras). \n\nThe `psts` feature replaces post-base-consonant glyphs with special\npresentation forms. This usually includes replacing right-side\ndependent vowels (matras) with stylistic variants or replacing\npost-base-consonant/matra pairs with contextual ligatures. \n\nThe `rclt` feature substitutes glyphs with contextual alternate\nforms. The `rclt` feature should be used to perform such substitutions\nthat are required by the orthography of the active script and\nlanguage. Substitutions made by `rclt` cannot be disabled by \napplication-level user interfaces.\n\nThe `rlig` feature substitutes glyph sequences with mandatory\nligatures. Substitutions made by `rlig` cannot be disabled by\napplication-level user interfaces.\n\n\n\n#### Stage 5, step 3: Applying the final positioning features from <abbr>GPOS</abbr> ####\n\n\tcurs\n\tdist\n\tkern\n\tmark\n\tabvm\n\tblwm\n\tmkmk\n\t\nThe `curs` feature performs cursive positioning in connected scripts or\ncursive styles. Each cursive glyph has an entry point and exit point;\nthe `curs` feature positions glyphs so that the entry point of the\ncurrent glyph meets the exit point of the preceding glyph.\n\nThe `dist` feature adjusts the horizontal positioning of\nglyphs. Unlike `kern`, adjustments made with `dist` do not require the\napplication or the user to enable any software _kerning_ features, if\nsuch features are optional. 
\n\nThe `kern` feature adjusts glyph spacing between pairs of adjacent glyphs.\n\nThe `mark` feature positions marks with respect to base glyphs.\n\nThe `abvm` feature positions above-base marks for attachment to base\ncharacters. This includes above-base dependent vowels (matras),\ndiacritical marks, syllable modifiers, and above-base consonant forms. \n\nThe `blwm` feature positions below-base marks for attachment to base\ncharacters. This includes below-base dependent vowels (matras),\ndiacritical marks, syllable modifiers, and below-base consonant forms.\n\nThe `mkmk` feature positions marks with respect to preceding marks,\nproviding proper positioning for sequences of marks that attach to the\nsame base glyph.\n"
  },
  {
    "path": "opentype-shaping-vedic-extensions.md",
    "content": "# Vedic Extensions in OpenType #\n\nThis document outlines the shaping information needed to display\ncharacters from the Unicode Vedic Extensions block, which may be used\nwithin text runs in many Indic scripts.\n\n**Contents**\n\n  - [General information](#general-information)\n  - [Terminology](#terminology)\n  - [Glyph classification](#glyph-classification)\n      - [Vedic Extensions character table](#vedic-extensions-character-table)\n  - [Shaping information](#shaping-information)\n\n\n\n## General information ##\n\nThe Vedic Extensions block encodes letters and marks that are used in\na large body of ancient literature written in the Vedic Sanskrit\nlanguage.\n\nPrimarily an oral language in the time period when the key literature\noriginated, Vedic Sanskrit has no native script. Therefore, texts may\nbe typeset in any one of the Indic scripts, using the Vedic Extensions\nto supplement the main script's character set.\n\n## Terminology ##\n\nIndividual Vedic Extension characters may be named by a combination of\nthe Vedic text in which the mark is used, the regional or manuscript\ntradition involved, or a simple visual or phonetic description of the\ncharacter. Some commonly used general categories are worth noting.\n\n**Udatta** is the term for a high tone on a vowel.\n\n**Anudatta** is the term for a low tone on a vowel.\n\n**Svarita** is the term for a falling or mixed tone on a vowel.\n\n**Anusvara** is the term for a nasalization sound that precedes a consonant.\n\n**Visarga** is the term for a soft breathing sound that precedes a vowel.\n\n> Note: In modern Indic languages, the terms _anusvara_ and _visarga_\n> often refer to diacritical marks that have the above effects on\n> pronunciation. 
In the Vedic Sanskrit language, however, they are\n> generally considered independent letters.\n\n## Glyph classification ##\n\nFor most codepoints, the `General Category` property defined in the Unicode\nstandard is correct, but it is not sufficient to fully capture the\nexpected shaping behavior (such as how the character is treated during\nglyph reordering). Therefore, they must additionally be classified by\nhow they are treated when shaping a run of text.\n\n\n### Vedic Extensions character table ###\n\n\nVedic Extension glyphs should be classified as in the following\ntable. Codepoints with no assigned meaning are\nmarked as _unassigned_ in the _Unicode category_ column. \n\nAssigned codepoints marked with a _null_ in the _Shaping class_\ncolumn evoke no special behavior from the shaping engine. \n\nThe _Mark-placement subclass_ column indicates mark-placement\npositioning. Assigned codepoints marked with a\n_null_ in this column evoke no special mark-placement behavior. Marks\ntagged with [Mn] in the _Unicode category_ column are categorized as\nnon-spacing; marks tagged with [Mc] are categorized as\nspacing-combining.\n\nSome codepoints in the following table use a _Shaping class_ that\ndiffers from the codepoint's Unicode _General Category_. 
The _Shaping\nclass_ takes precedence during OpenType shaping, as it captures more\nspecific behavior.\n\n\n:::{table} Vedic Extensions character table\n\n| Codepoint | Unicode category | Shaping class     | Mark-placement subclass    | Glyph                        |\n|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|\n|`U+1CD0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD0; Tone Karshana       |\n|`U+1CD1`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD1; Tone Shara          |\n|`U+1CD2`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CD2; Tone Prenkha        |\n|`U+1CD3`   | Punctuation      | _null_            | _null_                     | &#x1CD3; Sign Nihshvasa      |\n|`U+1CD4`   | Mark [Mn]        | CANTILLATION      | OVERSTRUCK                 | &#x1CD4; Tone Midline Svarita |\n|`U+1CD5`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD5; Tone Aggravated Independent Svarita |\n|`U+1CD6`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD6; Tone Independent Svarita |\n|`U+1CD7`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD7; Tone Kathaka Independent Svarita |\n|`U+1CD8`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD8; Tone Candra Below   |\n|`U+1CD9`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CD9; Tone Kathaka Independent Svarita Schroeder |\n|`U+1CDA`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDA; Tone Double Svarita |\n|`U+1CDB`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CDB; Tone Triple Svarita |\n|`U+1CDC`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDC; Tone Kathaka Anudatta |\n|`U+1CDD`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION       
     | &#x1CDD; Tone Dot Below      |\n|`U+1CDE`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDE; Tone Two Dots Below |\n|`U+1CDF`   | Mark [Mn]        | CANTILLATION      | BOTTOM_POSITION            | &#x1CDF; Tone Three Dots Below |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CE0`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CE0; Tone Rigvedic Kashmiri Independent Svarita |\n|`U+1CE1`   | Mark [Mc]        | CANTILLATION      | RIGHT_POSITION             | &#x1CE1; Tone Atharvavedic Independent Svarita |\n|`U+1CE2`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE2; Sign Visarga Svarita |\n|`U+1CE3`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE3; Sign Visarga Udatta |\n|`U+1CE4`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE4; Sign Reversed Visarga Udatta |\n|`U+1CE5`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE5; Sign Visarga Anudatta |\n|`U+1CE6`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE6; Sign Reversed Visarga Anudatta |\n|`U+1CE7`   | Mark [Mn]        | _null_            | OVERSTRUCK                 | &#x1CE7; Sign Visarga Udatta With Tail |\n|`U+1CE8`   | Mark [Mn]        | AVAGRAHA          | OVERSTRUCK                 | &#x1CE8; Sign Visarga Anudatta With Tail |\n|`U+1CE9`   | Letter           | AVAGRAHA          | _null_                     | &#x1CE9; Sign Anusvara Antargomukha |\n|`U+1CEA`   | Letter           | _null_            | _null_                     | &#x1CEA; Sign Anusvara Bahirgomukha |\n|`U+1CEB`   | Letter           | _null_            | _null_                     | &#x1CEB; Sign Anusvara Vamagomukha |\n|`U+1CEC`   | Letter           | AVAGRAHA          | _null_                     | &#x1CEC; Sign Anusvara Vamagomukha With Tail |\n|`U+1CED`   | Mark [Mn]        | AVAGRAHA          | BOTTOM_POSITION            
| &#x1CED; Sign Tiryak         |\n|`U+1CEE`   | Letter           | AVAGRAHA          | _null_                     | &#x1CEE; Sign Hexiform Long Anusvara |\n|`U+1CEF`   | Letter           | _null_            | _null_                     | &#x1CEF; Sign Long Anusvara  |\n| | | | |\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n|`U+1CF0`   | Letter           | _null_            | _null_                     | &#x1CF0; Sign Rthang Long Anusvara |\n|`U+1CF1`   | Letter           | AVAGRAHA          | _null_                     | &#x1CF1; Sign Anusvara Ubhayato Mukha |\n|`U+1CF2`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF2; Sign Ardhavisarga   |\n|`U+1CF3`   | Letter           | CONSONANT_DEAD    | _null_                     | &#x1CF3; Sign Rotated Ardhavisarga |\n|`U+1CF4`   | Mark [Mn]        | CANTILLATION      | TOP_POSITION               | &#x1CF4; Tone Candra Above   |\n|`U+1CF5`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF5; Sign Jihvamuliya    |\n|`U+1CF6`   | Letter           | CONSONANT_WITH_STACKER | _null_                | &#x1CF6; Sign Upadhmaniya    |\n|`U+1CF7`   | Mark [Mc]        | _null_            | _null_                     | &#x1CF7; Sign Atikrama       |\n|`U+1CF8`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF8; Tone Ring Above     |\n|`U+1CF9`   | Mark [Mn]        | CANTILLATION      | _null_                     | &#x1CF9; Tone Double Ring Above |\n|`U+1CFA`   | Letter           | PLACEHOLDER       | _null_                     | &#x1CFA; Sign Double Anusvara Antargomukha |\n|`U+1CFB`   | _unassigned_     |                   |                            |                              |\n|`U+1CFC`   | _unassigned_     |                   |                            |                              |\n|`U+1CFD`   | _unassigned_     |                   |                            |                              |\n|`U+1CFE`   | _unassigned_     |                 
  |                            |                              |\n|`U+1CFF`   | _unassigned_     |                   |                            |                              |\n:::\n\n\n## Shaping information ##\n\n31 of the characters in the block are categorized as marks. 27 of\nthese marks are subcategorized as non-spacing; the remaining four are\nspacing-combining. \n\nOf the non-spacing marks, 20 are classified as `CANTILLATION` (or tone-marker)\nindicators, which modify the pitch of vowels. Most of these marks are\ngenerally positioned above or below the main character, using <abbr title=\"Glyph Positioning table\">GPOS</abbr>\nmark attachment, in a position that does not interact or interfere\nwith the main character. In Unicode, the `CANTILLATION` classification\nis separate from the `TONE_MARKER` classification used in some scripts\nfor semantic reasons; the two classifications are identical for\nshaping purposes.\n\nSome of the marks (cantillation and non-cantillation) are classified\nas `OVERSTRUCK` in the _Mark-placement subclass_ column.\nThis indicates that the mark is intended to be rendered on top of the\npreceding character. During reordering, `OVERSTRUCK` marks are tagged\nfor the ordering position `POS_AFTER_MAIN`.\n\nSome marks are classified, for shaping purposes, as `AVAGRAHA` or\n`VISARGA`. This indicates that the mark behaves more like the Avagraha\nor Visarga character than like a diacritic.\n\nCharacters that are categorized in Unicode as letters vary with\nrespect to whether or not they trigger special behavior in the shaping\nprocess. 
The letters that do trigger special behavior include those classified as\n`CONSONANT` and those classified as `AVAGRAHA`.\n\n\n\n<!--- 1cf5 and 1cf6 get reclassified as CONSONANT\n\n1ce2 and 1ce8 get treated like tone marks, but SHOULD be allowed only after Visarga.\n\n1ced gets treated like tone mark, but SHOULD be allowed only after U+1CE9..U+1CF1\n\n1ce9 1cec 1cee 1cf1 all take marks in standalone clusters, similar to Avagraha.\n--->\n"
  },
  {
    "path": "overview.md",
    "content": "```{include} /index.md\n```\n"
  },
  {
    "path": "test/spellcheck.yml",
    "content": "matrix:\n- name: Main\n  aspell:\n    lang: en\n  dictionary:\n    encoding: utf-8\n    wordlists:\n    - 'test/wordlist.txt'\n  pipeline:\n  - pyspelling.filters.markdown:\n      markdown_extensions:\n      - markdown.extensions.tables\n  - pyspelling.filters.html:\n      comments: false\n      attributes:\n      - title\n      - alt\n      ignores:\n      - code\n      - pre\n      - samp\n      - abbr\n      - :matches(.skip_spellcheck)\n  sources:\n  - '!character-tables/**/*.md|!images/**/*.md|**/*.md'\n  default_encoding: utf-8\n- name: Chartables\n  aspell:\n    lang: en\n  dictionary:\n    encoding: utf-8\n    wordlists:\n    - 'test/wordlist.txt'\n  pipeline:\n  - pyspelling.filters.markdown:\n      markdown_extensions:\n      - markdown.extensions.tables\n  - pyspelling.filters.html:\n      comments: false\n      attributes:\n      - title\n      - alt\n      ignores:\n      - code\n      - pre\n      - samp\n      - abbr\n      - table\n      - :matches(.skip_spellcheck)\n  sources:\n  - 'character-tables/*.md'\n  default_encoding: utf-8\n"
  },
  {
    "path": "test/spellcheck_html.yml",
    "content": "matrix:\n- name: HTML\n  aspell:\n    lang: en\n  dictionary:\n    encoding: utf-8\n    wordlists:\n    - 'test/wordlist.txt'\n  pipeline:\n  - pyspelling.filters.html:\n      comments: false\n      attributes:\n      - title\n      - alt\n      ignores:\n      - code\n      - pre\n      - samp\n      - abbr\n      - :matches(.skip_spellcheck)\n  sources:\n  - '_build/html/*.html'\n  default_encoding: utf-8\n"
  },
  {
    "path": "test/wordlist.txt",
    "content": "AllSorts\nAFA\nAFD\nAFE\nAFF\nAMTRA\nBBE\nBCA\nBCB\nBCC\nBD\nBaseC\nBiDi\nBLWF\nBSO\nBV\nBézier\nBlobmoji\nCBD\nCBDT\nCBE\nCBF\nCCA\nCCB\nCDA\nCDB\nCDD\nCDE\nCDF\nCEA\nCEB\nCEC\nCED\nCEE\nCEF\nCEK\nCFA\nCFB\nCFD\nCFE\nCFF\nCGJ\nCHA\nCJK\nCLDR\nCMAbv\nCMBlw\nCN\nCOLR\nCOLRv\nCPAL\nCoreText\nDCA\nDCF\nDDA\nDDC\nDDD\nDDE\nDDF\nDDHA\nDF\nDHA\nEB\nEBC\nECB\nECD\nEmojiTwo\nFAF\nFAbv\nFB\nFBlw\nFC\nFPst\nFVS\nFinalC\nFirefoxEmoji\nFontForge\nFontTools\nGDEF\nGPOS\nGSUB\nHB\nHN\nHarfBuzz\nHfG\nIndependentVowel\nJNya\nJoyPixels\nKA\nKD\nKSsa\nLBase\nLCount\nLIndex\nLLA\nLRM\nLTR\nLV\nLVIndex\nLVT\nlookupListOffset\nMAbv\nMBlw\nMCM\nMPre\nMPst\nMacOS\nMultipleSub\nMultipleSubst\nN'Ko\nNFD\nNFKC\nNFKD\nNCount\nNKo\nNNA\nNnTta\nNUKTA\nNUM\nOpenmoji\nOpenType\nPOS\nPRE\nPUA\nREADME\nRGI\nRLM\nRTL\nRagel\nReddit\nSBase\nSCount\nSDL\nSHA\nSIndex\nSL\nSMAbv\nSMBlw\nSMVD\nSVG\nSegoe\nSkinTone\nTBase\nTCount\nTIndex\nTOCtree\nTOCtrees\nTRYo\nTTX\nUA\nUCD\nUCDM\nUGC\nUI\nUIPC\nUISC\nUJT\nUniscribe\nUniscribe's\nVAbv\nVBase\nVBlw\nVCount\nVIndex\nVM\nVMAbv\nVMBlw\nVMPre\nVMPst\nVPre\nVPst\nWJ\nYAML\nYesLogic\nbaseC\nbelowbaseC\nfeatureListOffset\nfeatureVariations\nfeatureVariationsOffset\nscriptListOffset\nxA\nxAA\nxFB\nABOVEBASE\nAFAICT\nAKHAND\nALAPH\nANUSVARA\nAVAGRAHA\nAa\nAaa\nAbaric\nAdak\nAddak\nAhsda\nAi\nAira\nAiton\nAkhand\nAlaph\nAlef\nAntargomukha\nAnudatta\nAnusvara\nAq\nArdhavisarga\nAsat\nAtharavedic\nAtikrama\nAtthacan\nAvagraha\nAvestan\nAyin\nBEH\nBELOWBASE\nBHA\nBINDU\nBahirgomukha\nBalti\nBaluda\nBambara\nBangla\nBantoc\nBaphala\nBatak\nBathamasat\nBcad\nBeh\nBeyyal\nBhasha\nBheth\nBidirectionality\nBindi\nBindu\nBrahmi\nBrahmic\nBuginese\nCandra\nCandrabindu\nCantillation\nCatawa\nCcc\nChakma\nCham\nChandrabindu\nChillu\nChoseong\nChoseongul\nCia\nCn\nCoeng\nDALATH\nDagalga\nDagesh\nDalath\nDamma\nDammatan\nDotless\nDyula\nDzongkha\nEe\nEk\nEsṭrangēlā\nEthiopic\nEtnahta\nFatha\nFathatan\nGEMINATION\nGali\nGarshuni\nGemination\nGmünd\nGondi\nGrantha
\nGurmukhi\nGurmukhi's\nHalant\nHalants\nHamza\nHanja\nHataf\nHathi\nHexiform\nHiriq\nHolam\nIe\nIjam\nIndependentVowel\nIri\nIrula\nIsan\nJa\nJamo\nJeongum\nJihvamuliya\nJudezmo\nJungseong\nKa\nKai\nKakabat\nKarshana\nKashida\nKashidas\nKashmiri\nKasra\nKasratan\nKathaka\nKayah\nKe\nKelantan\nKeycap\nKha\nKhamti\nKhanda\nKhmu\nKinzi\nKiyeok\nKo\nKrung\nKufi\nKutchi\nKuy\nLadakhi\nLanna\nLayar\nLepcha\nLetterlike\nLf\nLm\nLookahead\nMaayyaa\nMaddah\nMaitaikhu\nMaithili\nMajlīyānā\nManding\nManinka\nManipuri\nMaḏnḥāyā\nMc\nMidline\nModi\nMukha\nMx\nMynanmar\nNIKHAHIT\nNaskh\nNataliq\nNga\nNiggahita\nNihshvasa\nNikahit\nNikhahit\nNiqqud\nNoto\nNukta\nNya\nNç\nOVERSTRUCK\nOdia\nOe\nOo\nOverstruck\nPOSTBASE\nPREBASE\nPak\nPalaung\nPali\nPaniya\nPao\nPashto\nPatah\nPattani\nPeh\nPhags\nPhinthu\nPho\nPictographic\nPre\nPrecomposed\nPrenkha\nPwo\nQa\nQaa\nQaq\nRAKAR\nREPHA\nRISH\nRafe\nRakaar\nRakaaraansaya\nRaphala\nRbasa\nReahmuk\nRecomposition\nRepha\nRha\nRieul\nRigvedic\nRish\nRjes\nRnam\nRo\nRobat\nRr\nRra\nRsv\nRthang\nRumai\nRumi\nSHIFTER\nSTACKER\nSYM\nSamyok\nSannya\nSant\nSaurashtra\nSegol\nSerṭā\nSgaw\nShadda\nShan\nShara\nSheva\nShifter\nSibe\nSiddham\nSinhala\nSios\nSlv\nSlvt\nSoyombo\nSrog\nSsa\nSsangkiyeok\nStacker\nSu\nSukun\nSvarita\nSyāmē\nTHA\nTITLECASE\nTai\nTampuan\nTcomplex\nTham\nTifinagh\nTippi\nTiryak\nToandakhiat\nTodo\nTri\nTsheng\nTta\nTwemoji\nUbhayato\nUdaat\nUdatta\nUpadhmaniya\nUra\nUu\nVISARGA\nVamagomukha\nVattu\nVf\nViet\nViriam\nVirama\nVisarga\nVmain\nVpost\nWa\nXibe\nYYA\nYakash\nYansaya\nYaphala\nYod\nYya\nZanabazar\nZapf\nZsye\nZsym\naa\naab\nabvf\nabvm\nabvs\nadak\nadvertized\nakhn\nal\nalgorithmically\nanusvara\napf\nappled\narab\narabic\nartifically\nasat\nascender\nascenders\nbangjeom\nbaphala\nbarree\nbeng\nbengali\nbidirectionality\nbindi\nbindu\nbitmask\nblackflag\nblwf\nblwm\nblws\nbng\nbugfixes\ncalt\ncandrabindu\ncantillation\nccc\nccmp\nce\ncec\nced\ncee\ncek\ncfar\ncff\nchandrabindu\nchandrakkala\nchattawa\nchillu\nchose
ong\ncjct\nclig\ncmap\ncodepoint\ncodepoints\ncodepoint's\ncoeng\ncoengs\ncompat\ncompatibilty\ncompatiblity\nconsonantmedial\nconsonantwithstacker\nconstitues\ncounterintuitive\ncswh\ndagesh\ndalath\ndamma\ndammatan\ndanda\ndecompositions\ndesignator\ndev\ndeva\ndevanagari\ndirectionality\ndlig\ndotless\ndottedcircle\neading\neg\nemojimodified\nencodings\nendtag\nfallbacks\nfamilymember\nfatha\nfathatan\nfe\nfeatureListOffset\nfeatureVariations\nfeatureVariationsOffset\nfi\nfina\nfitzpatrick\nfrac\nfuncs\nfvs\ngb\ngenderperson\ngendersign\ngermany\nghunna\ngjr\nglyf\nglyphs\ngrantha\ngrapheme\ngreyscale\ngujarati\ngujr\ngur\ngurmukhi\nhal\nhalant\nhalanta\nhalantamu\nhalants\nhaln\nhaming\nhamza\nhangul\nhardcoded\nhasanta\nhb\nhebr\nhebrew\nhgher\nhrasva\nhç\nignorable\nignorables\nijam\nimplementer\nimplementers\nindic\ninit\ninteresteed\ninterword\nintrepreted\nisol\njamo\njnya\njongseong\njungseong\njúç\njúð\nka\nkannada\nkar\nkashida\nkasra\nkern\nkerning\nkeycap\nkha\nkhmer\nkhmr\nkinzi\nkirīma\nkiyeok\nknd\nknda\nkssa\nlajanyalan\nlakuna\nlao\nliga\nljmo\nlocl\nlookahead\nlookups\nlv\nlvt\nmH\nmalayalam\nmatra\nmatras\nmatra's\nmatraabove\nmatrabelow\nmatrapost\nmatrapre\nmd\nmedi\nmh\nmis\nmkmk\nmlm\nmlym\nmodifer\nmong\nmongolian\nmonnga\nmonospaced\nmorx\nmr\nmset\nmultipersongroup\nmultiplesub\nmw\nmyanmar\nmym\nmymr\nnatively\nnbsp\nnga\nniggahita\nnikahit\nnikhahit\nniqqud\nnirugu\nnko\nnntta\nnonjoiner\nnotdef\nnukt\nnukta\noccurence\noccuring\nopentype\noriya\nory\norya\not\notc\notf\notl\noverline\nowels\npictographic\npng\npostprocess\npre\nprecomposed\npreprocess\npreprocessing\npstf\npsts\npua\npulli\npunc\npy\nra\nragel\nraphala\nrclt\nrecomposition\nrecompositions\nregionalindicator\nregistershifter\nreodering\nrepaya\nreph\nrepha\nrepositioned\nrepositions\nrish\nrkrf\nrlig\nrobat\nrphf\nrumi\nsala\nsamp\nsanitization\nsara\nsbix\nscriptListOffset\nshadda\nshaper\nshapers\nshaper's\nshifter\nshifters\nsinf\nsinh\nsinhala\nsios\nskintone\nsm\ns
omefilename\nsomefontfilename\nspl\nsrong\nstch\nstr\nsubcategorized\nsubclasses\nsubclass's\nsubclassified\nsubsequence\nsubsequences\nsubst\nsubtag\nsubtags\nsvg\nswara\nsyllablemodifier\nsyrc\nsyriac\ntagchar\ntamil\ntaml\ntargetting\ntashdid\ntatweel\ntelu\ntelugu\nth\nthai\nthailao\nthr\ntibetan\ntibt\ntjmo\ntml\ntnum\ntra\ntsek\ntsheng\nttc\nttf\nttx\nun\nuncategorized\nunicode\nunicodes\nunioned\nuniscribe\nunneccessary\nva\nvaration\nvattu\nvatu\nvedic\nvedicsign\nvirama\nvisarga\nvisiblity\nvjmo\nvrt\nwaw\nyaphala\nyeh\nyya\nzah\nzwj\nzwnj\nḥarakah\n"
  }
]