Repository: n8willis/opentype-shaping-documents
Branch: master
Commit: 8b6e9924c4ad
Files: 132
Total size: 2.5 MB
Directory structure:
gitextract_xxlse0rw/
├── .github/
│ └── workflows/
│ ├── build_html_output.yml
│ └── test_document_sources.yml
├── .gitignore
├── BUILD.md
├── Makefile
├── README.md
├── _ext/
│ ├── LICENSE.md
│ ├── README.md
│ ├── abbreviations.py
│ ├── colors.py
│ └── shapingdocs_svg_color_toggles.py
├── _global.md
├── _static/
│ ├── LICENSES_FOR_INCORPORATED_SOFTWARE.txt
│ ├── custom.css
│ ├── fonts/
│ │ ├── Source_Code_Pro/
│ │ │ ├── OFL.txt
│ │ │ └── README.txt
│ │ ├── Source_Sans_3/
│ │ │ ├── OFL.txt
│ │ │ └── README.txt
│ │ └── Source_Serif_4/
│ │ ├── OFL.txt
│ │ └── README.txt
│ ├── fontsizes.html
│ └── toggleSvgColors.js
├── _templates/
│ ├── layout.html
│ └── static_nav.html
├── _toc.yml
├── build-requirements.txt
├── character-tables/
│ ├── README.md
│ ├── character-tables-arabic.md
│ ├── character-tables-bengali.md
│ ├── character-tables-devanagari.md
│ ├── character-tables-gujarati.md
│ ├── character-tables-gurmukhi.md
│ ├── character-tables-hangul.md
│ ├── character-tables-hebrew.md
│ ├── character-tables-kannada.md
│ ├── character-tables-khmer.md
│ ├── character-tables-lao.md
│ ├── character-tables-malayalam.md
│ ├── character-tables-mongolian.md
│ ├── character-tables-myanmar.md
│ ├── character-tables-nko.md
│ ├── character-tables-oriya.md
│ ├── character-tables-sinhala.md
│ ├── character-tables-syriac.md
│ ├── character-tables-tamil.md
│ ├── character-tables-telugu.md
│ ├── character-tables-thai.md
│ ├── character-tables-tibetan.md
│ └── index.md
├── conf.py
├── errata.md
├── images/
│ ├── arabic/
│ │ ├── arabic-png-image-generation-log.md
│ │ └── arabic-svg-image-generation-log.md
│ ├── bengali/
│ │ ├── bengali-png-image-generation-log.md
│ │ └── bengali-svg-image-generation-log.md
│ ├── devanagari/
│ │ ├── devanagari-png-image-generation-log.md
│ │ └── devanagari-svg-image-generation-log.md
│ ├── emoji/
│ │ └── emoji-png-image-generation-log.md
│ ├── example-fonts.txt
│ ├── gujarati/
│ │ ├── gujarati-png-image-generation-log.md
│ │ └── gujarati-svg-image-generation-log.md
│ ├── gurmukhi/
│ │ ├── gurmukhi-png-image-generation-log.md
│ │ └── gurmukhi-svg-image-generation-log.md
│ ├── hangul/
│ │ ├── hangul-png-image-generation-log.md
│ │ └── hangul-svg-image-generation-log.md
│ ├── hebrew/
│ │ ├── hebrew-png-image-generation-log.md
│ │ └── hebrew-svg-image-generation-log.md
│ ├── images-index.md
│ ├── kannada/
│ │ ├── kannada-png-image-generation-log.md
│ │ └── kannada-svg-image-generation-log.md
│ ├── khmer/
│ │ ├── khmer-png-image-generation-log.md
│ │ └── khmer-svg-image-generation-log.md
│ ├── malayalam/
│ │ ├── malayalam-png-image-generation-log.md
│ │ └── malayalam-svg-image-generation-log.md
│ ├── mongolian/
│ │ ├── mongolian-png-image-generation-log.md
│ │ └── mongolian-svg-image-generation-log.md
│ ├── myanmar/
│ │ ├── myanmar-png-image-generation-log.md
│ │ └── myanmar-svg-image-generation-log.md
│ ├── nko/
│ │ ├── nko-png-image-generation-log.md
│ │ └── nko-svg-image-generation-log.md
│ ├── oriya/
│ │ ├── oriya-png-image-generation-log.md
│ │ └── oriya-svg-image-generation-log.md
│ ├── sinhala/
│ │ ├── sinhala-png-image-generation-log.md
│ │ └── sinhala-svg-image-generation-log.md
│ ├── syriac/
│ │ ├── syriac-png-image-generation-log.md
│ │ └── syriac-svg-image-generation-log.md
│ ├── tamil/
│ │ ├── tamil-png-image-generation-log.md
│ │ └── tamil-svg-image-generation-log.md
│ ├── telugu/
│ │ ├── telugu-png-image-generation-log.md
│ │ └── telugu-svg-image-generation-log.md
│ ├── thai-lao/
│ │ ├── thai-lao-png-image-generation-log.md
│ │ └── thai-lao-svg-image-generation-log.md
│ └── tibetan/
│ ├── tibetan-png-image-generation-log.md
│ └── tibetan-svg-image-generation-log.md
├── index.md
├── make.bat
├── notes/
│ ├── README.md
│ ├── emoji-implementation.md
│ ├── index.md
│ ├── ragel-machine-notation.md
│ └── uniscribe-bug-compatibility.md
├── opentype-shaping-arabic-general.md
├── opentype-shaping-arabic.md
├── opentype-shaping-bengali.md
├── opentype-shaping-default.md
├── opentype-shaping-devanagari.md
├── opentype-shaping-emoji.md
├── opentype-shaping-gujarati.md
├── opentype-shaping-gurmukhi.md
├── opentype-shaping-hangul.md
├── opentype-shaping-hebrew.md
├── opentype-shaping-indic-general.md
├── opentype-shaping-kannada.md
├── opentype-shaping-khmer.md
├── opentype-shaping-malayalam.md
├── opentype-shaping-mongolian.md
├── opentype-shaping-myanmar.md
├── opentype-shaping-nko.md
├── opentype-shaping-normalization.md
├── opentype-shaping-oriya.md
├── opentype-shaping-sinhala.md
├── opentype-shaping-syriac.md
├── opentype-shaping-tamil.md
├── opentype-shaping-telugu.md
├── opentype-shaping-thai-lao.md
├── opentype-shaping-tibetan.md
├── opentype-shaping-use.md
├── opentype-shaping-vedic-extensions.md
├── overview.md
└── test/
├── spellcheck.yml
├── spellcheck_html.yml
└── wordlist.txt
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/workflows/build_html_output.yml
================================================
name: "Build HTML output"
on:
- push
jobs:
html:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Python environment
uses: actions/setup-python@v6
with:
python-version: "3.10"
- name: Update pip
run: |
python -m pip install --upgrade pip
- name: Install dependencies
run: |
if [ -f build-requirements.txt ]; then pip install -r build-requirements.txt; fi
- name: Build HTML
uses: rickstaa/sphinx-action@master
with:
docs-folder: "."
- uses: actions/upload-artifact@v6
with:
name: ShapingDocumentsHTML
path: _build/html/
================================================
FILE: .github/workflows/test_document_sources.yml
================================================
name: "Test document sources"
on:
- push
jobs:
linkcheck:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Python environment
uses: actions/setup-python@v6
with:
python-version: "3.10"
- name: Update pip
run: |
python -m pip install --upgrade pip
- name: Install dependencies
run: |
if [ -f build-requirements.txt ]; then pip install -r build-requirements.txt; fi
- name: Check links
uses: rickstaa/sphinx-action@master
with:
docs-folder: "."
build-command: "sphinx-build -M linkcheck . _build"
================================================
FILE: .gitignore
================================================
# Ignore general
*~
# Ignore Unicode PDFs & misc resources
reference/
_build/
# Ignore auto-generated binary spelling dictionary
dictionary.dic
================================================
FILE: BUILD.md
================================================
# Building a local copy of these documents #
A local, static-HTML version of these documents can be built with the
[Sphinx](https://www.sphinx-doc.org/) generator.
The Sphinx-related files in the repository are:
```
config.py
index.rst
make.bat
Makefile
```
plus the directories
```
_build/
_static/
_templates/
_ext/
```
## Sphinx ##
Sphinx is a Python-based utility that you will need to install on your
local machine. The official [installation
guide](https://www.sphinx-doc.org/en/master/usage/quickstart.html)
covers what is necessary for a variety of OSes and
environments. Perhaps the easiest approach is to install Sphinx in a
Python [virtual
environment](https://www.sphinx-doc.org/en/master/usage/installation.html#using-virtual-environments).
After installing Sphinx itself, you will also need to install the
[MyST-Parser](https://myst-parser.readthedocs.io/en/latest/) package,
which enables Sphinx to process Markdown files. Using the
virtual-environment installation method, you can keep both of these
packages contained for this project.
At the moment there are three other dependencies involved, all of which
are Sphinx-extension packages:
1. [Sphinx-multitoc-numbering](https://sphinx-multitoc-numbering.readthedocs.io/),
which is required to make Sphinx use a continuous numbering scheme
across the files
2. [sphinx External TOC](https://sphinx-external-toc.readthedocs.io/),
which is required to define the navigation sidebar declaratively
(see the [TOCtrees](#toctrees) subsection below for more info)
3. [sphinx inline svg](https://pypi.org/project/sphinx-inline-svg/),
which is used to implement the user-togglable cluster colors on the
illustrations of feature application.
but a full `build-requirements.txt` file is included in the repository
that lists the packages in the author's virtual environment. You
shouldn't _need_ to utilize it, since just installing Sphinx,
MyST-Parser, and the extensions ought to suffice, but it is there if
required.
The build also uses a custom Sphinx extension, named
`shapingdocs_svg_color_toggles`, to generate the HTML elements used to
do the color toggling. This extension is included in the `_ext/`
directory of this repository and is called from that location,
however, so you do not need to install it separately.
## Building HTML documents ##
With the Sphinx and MyST-Parser packages installed, go to the
top-level directory of this repository in a shell or
terminal. Building the HTML documents should only take two steps:
1. Run `make clean` to clear out all temporary files from previous
run. Do this every time.
2. Run `make html` to regenerate the HTML files. The output files will
be written to the `_build/html/` subdirectory.
## Test suite ##
The `test/` directory contains test elements. At the moment, all of
the tests are run manually, but as the kinks are worked out they may
be rolled into either Git hooks or GitHub Actions, so it is advisable
to start using them. Currently there are two tests available as
Makefile targets:
1. Run `make linktest` to run checks on all of the URLs found in the
documents.
2. Run `make spellcheck` to run spellchecking on the documents.
You can also run `make test` to run both tests in sequence.
The spell-checking is configured in `test/spellcheck.yaml`. It uses
the [PySpelling](https://facelessuser.github.io/pyspelling/) package
with the custom wordlist at `test/wordlist.txt`.
There are a few lingering peculiarities to PySpelling (most notably,
it supports excluding specified HTML elements, but cannot exclude
`
` elements because of a discrepancy between its built-in
Markdown converter and Sphinx, so at present the `character-tables/`
directory returns a great many false hits from the Unicode codepoint
names in the tables). The plan is to iron those out, then run both the
spell-checking and link-checking tests automatically on all pull
requests. When that is implemented, it will be documented here.
## Editing and bugfixing ##
The static-HTML version of the docs are a work-in-progress at the
moment, so please do poke around for problems and report any bugs.
Basic Sphinx configuration is done in the `config.py` file.
The HTML output documents are currently using the "Alabaster" theme,
which comes preinstalled. The Alabaster theme accepts several
configuration options which are also kept in `config.py`. Output
customization for the theme is also tweaked in the `custom.css` file
in the `_static/` subdirectory (just be sure you edit the one in
`_static/` itself; whenever the docs are rebuilt with `make`, that
file also gets copied into `_build/html/_static/`, so don't edit that
copy of the file since it gets overwritten).
To report a suspected typo or to suggest a general wording change,
please first synchronize your local repository with `git pull`. Then
do a `make clean`/`make html` as described in the section above.
### TOCtrees ###
Sphinx, by default, is hardcoded in a way that requires all documents
to be referenced in a separate, Sphinx-specific `toctree` structure,
after which the navigation sidebars (and other elements) are generated
on-the-fly at build-time by Sphinx itself.
The current documents are using a third-party extension that defines
the "TOCtree" in a declarative YAML file instead, to work around some
undesirable outputs -- mainly in the GitHub repository views -- that
Sphinx triggers with its on-the-fly `toctree` process.
But this approach isn't (yet?) perfect. Some files (namely this one,
`BUILD.md`, and the image-generation-log files) are manually excluded
from the build process so that they do not generate a flurry of
warning messages. That's deliberate, because the build instructions
and log files are metadata and aren't part of the final documentation
set itself.
================================================
FILE: Makefile
================================================
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
PYSPELLING = pyspelling
PYSPELLINGMARKDOWNCONF = test/spellcheck.yml
PYSPELLINGHTMLCONF = test/spellcheck_html.yml
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Run tests on links and spelling
test: linktest spellcheck
# Use Sphinx's built-in link checker
linktest:
@$(SPHINXBUILD) -M linkcheck "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
# Use PySpelling
spellcheck:
@$(PYSPELLING) -c "$(PYSPELLINGMARKDOWNCONF)"
# Use PySpelling
htmlspellcheck:
@$(PYSPELLING) -c "$(PYSPELLINGHTMLCONF)"
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
================================================
FILE: README.md
================================================
# OpenType shaping documents #
Sponsored by [YesLogic](https://yeslogic.com/)
__
## 🆆 🅰 🆁 🅽 🅸 🅽 🅶 ##
>
> This repository is an active WORK IN PROGRESS.
>
> NONE of the documents you currently see here are complete
> nor are they suitable for reference. PLEASE do not use
> them as a guide or as a general information source.
>
> As long as this warning text remains visible, the above
> holds true.
These documents are meant to provide a functional specification for
text shaping. The expectation is that an implementer of this
specification will be using fonts in the OpenType font format applied
to input text that complies with Unicode.
At present, we are seeking comments and bugfixes. Interested readers
and contributors can begin at the
- Indic Model ([general information](opentype-shaping-indic-general.md))
- Scripts covered: [Devanagari](opentype-shaping-devanagari.md), [Bengali](opentype-shaping-bengali.md), [Gujarati](opentype-shaping-gujarati.md), [Gurmukhi](opentype-shaping-gurmukhi.md), [Kannada](opentype-shaping-kannada.md), [Malayalam](opentype-shaping-malayalam.md),
[Oriya](opentype-shaping-oriya.md), [Tamil](opentype-shaping-tamil.md), [Telugu](opentype-shaping-telugu.md), [Sinhala](opentype-shaping-sinhala.md)
- Arabic Model ([general information](opentype-shaping-arabic-general.md)
- Scripts covered: [Arabic](opentype-shaping-arabic.md), [N'Ko](opentype-shaping-nko.md), [Syriac](opentype-shaping-syriac.md), [Mongolian](opentype-shaping-mongolian.md)
- [Hangul](opentype-shaping-hangul.md)
- [Hebrew](opentype-shaping-hebrew.md)
- [Khmer](opentype-shaping-khmer.md)
- [Myanmar](opentype-shaping-myanmar.md)
- [Thai and Lao](opentype-shaping-thai-lao.md)
- [Tibetan](opentype-shaping-tibetan.md)
- [Universal Shaping Engine (USE)](opentype-shaping-use.md)
- All complex scripts that are not handled by a dedicated
script-specific shaping model
- [Default](opentype-shaping-default.md)
- All non-complex scripts
- [Emoji](opentype-shaping-emoji.md)
- Emoji sequences do not constitute a separate shaping model,
but handling emoji sequences can incorporate many of the same
Opentype mechanisms and should not be overlooked
shaping documents and are encouraged to submit their feedback
on the text or images of any of the linked scripts. The documents are
organized by script; where there are multiple shaping models for a
particular script (including deprecated models), the various models are
all addressed in the same script-specific document.
The documents also include a description of
[normalization](opentype-shaping-normalization.md) in the OpenType
shaping context, which differs from Unicode normalization in several
respects.
Various [notes](notes/README.md) about the document set and the details
of its scope, limitations, and quirks are also provided.
Some [errata](errata.md) about the "upstream" specifications and
reference documents are noted separately.
In its final form, this repository will hold documentation describing
the shaping behavior used for layout of OpenType text. In particular,
it will focus on complex scripts.
In addition to the primary, per-script documents, implementers and
other interesteed readers are encouraged to check the
[character tables](character-tables/README.md) for correctness and to
examine the [image-generation logs](/images/README.md) to identify
issues seen in the inline images.
### References
These documents cite the following informative references:
1. The Microsoft [Script development
specifications](https://docs.microsoft.com/en-us/typography/script-development/standard),
which document the behaviors expected for OpenType Layout fonts and
provide guidance & examples for type designers. OpenType is a
registered trademark of Microsoft Corporation.
2. Related portions of the Microsoft OpenType specification, such as the
[OpenType Layout tag
registry](https://docs.microsoft.com/en-us/typography/opentype/spec/ttoreg)
and [OpenType Layout common table
formats](https://docs.microsoft.com/en-us/typography/opentype/spec/chapter2),
which list and define feature tags, script & language tags, and
other internals of compliant OpenType font binaries. OpenType is a
registered trademark of Microsoft Corporation.
3. The [HarfBuzz](https://github.com/harfbuzz/harfbuzz) project, which
includes a free-software/open-source implementation of OpenType
Layout shaping with full source code and documentation.
4. The [AllSorts](https://github.com/yeslogic/allsorts) project, which
includes a free-software/open-source implementation of OpenType
Layout shaping with full source code and documentation.
5. The [Unicode
Standard](http://www.unicode.org/standard/standard.html) and
related Unicode Consortium projects such as the [Unicode Character
Database](http://www.unicode.org/reports/tr44/), which defines
Unicode code points and formal character properties used in
shaping. Unicode and the Unicode Logo are registered trademarks of
Unicode, Inc. in the United States and other countries.
6. The YesLogic [text corpus](https://github.com/yeslogic/corpus),
which includes real-world text data for several Indic scripts,
scraped from Wikipedia, Reddit, and multiple online news
sources. This data is used to test shaping in AllSorts and Prince.
7. Known but unofficial information about other shaping-engine
projects. Primarily this includes tests and reproducible issues
found via [HarfBuzz](https://github.com/harfbuzz/harfbuzz), because
HarfBuzz intentionally aims to produce results that will 100% match
the output of Microsoft Uniscribe (not counting cases where
Uniscribe's output is known to be incorrect, of course).
> Note: occasionally, tests or issues documenting the behavior of
> Apple CoreText are also included, but CoreText compatibility is
> not an explicit goal for HarfBuzz.
================================================
FILE: _ext/LICENSE.md
================================================
# License for shapingdocs Sphinx extension software
Unless otherwise indicated, all code in this directory is licensed
under the two-clause BSD license below. This license does _not_ apply
to code or other files found in parent, sibling, and other directories
within this repository.
Copyright 2025 Nathan Willis.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: _ext/README.md
================================================
# Sphinx extensions and related build tools
This directory holds custom extensions used by the Sphinx builder for
this set of documents, and a few somewhat-related Python utilities.
The contents of this directory are licensed under the 2-Clause BSD
license. This is the license used by the Sphinx project, and was
chosen here in order to maximize compatibility.
See the accompanying [LICENSE](LICENSE.md) file.
Note that the license does **not** apply to this documentation project
as a whole, but only to the contents of this directory.
================================================
FILE: _ext/abbreviations.py
================================================
# SPDX-FileCopyrightText: Copyright 2025 Nathan Willis
#
# SPDX-License-Identifier: BSD-2-Clause
"""Dictionary of the acronyms or abbreviations and corresponding full-text expansions used in the documents.
"""
# Dictionary mapping the set of acronyms and abbreviations
# that get wrapped in tags in the generated output
# to the corresponding full-expansion text for each.
#
# Perhaps obviously, the keys were used to identify and
# tag acronyms with in the document source. In some,
# but not all, cases the full expansions are also used.
ABBR_STRING_MAP = {
"AAT": "Apple Advanced Typography",
"AJT": "Arabic Joining Type",
"AMTRA": "Arabic Mark Transient Reordering Algorithm",
"Ccc": "Canonical Combining Class",
"CEK": "Combining Enclosing Keycap",
"CGJ": "Combining Grapheme Joiner",
"CLDR": "Common Locale Data Repository",
"CSS": "Cascading Style Sheets",
"GDEF": "Glyph Definition table",
"GPOS": "Glyph Positioning table",
"GSUB": "Glyph Substitution table",
"LRM": "Left-to-Right Mark",
"LTR": "Left-To-Right",
"MCM": "Modifier Combining Mark",
"NBSP": "No-Break Space",
"NFC": "Normalization Form C",
"NFD": "Normalization Form D",
"NFKC": "Normalization Form KC",
"NFKD": "Normalization Form KD",
"PNG": "Portable Network Graphics",
"PUA": "Private Use Area",
"RGI": "Recommended for General Interchange",
"RLM": "Right-to-Left Mark",
"RTL": "Right-To-Left",
"SHA": "Secure Hash Algorithm",
"SVG": "Scalable Vector Graphics",
"UCD": "Unicode Character Database",
"UCDM": "Unicode Character Decomposition Mapping",
"UGC": "Unicode General Category",
"UIPC": "Unicode Indic Positional Category",
"UISC": "Unicode Indic Syllabic Category",
"UJT": "Unicode Joining Type",
"URL": "Uniform Resource Locator",
"USE": "Universal Shaping Engine",
"ZWJ": "Zero-Width Joiner",
"ZWNJ": "Zero-Width Non Joiner",
}
================================================
FILE: _ext/colors.py
================================================
# SPDX-FileCopyrightText: Copyright 2025 Nathan Willis
#
# SPDX-License-Identifier: BSD-2-Clause
"""The sequence of #RRGGBB colors used in the colorized SVG illustration images.
"""
# Defines the color sequence used to colorize clusters in the SVG illustration
# images.
#
# It is based on the G10 sequence employed by Plotly, as visible at:
# https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express
#
# This sequence is chosen because it is generally consistent in value and it
# does not include any greys.
COLOR_LIST = [ "#3366cc", "#dc3912", "#ff9900", "#109618", "#990099", "#0099c6", "#dd4477", "#66aa00", "#b82e2e", "#316395",]
================================================
FILE: _ext/shapingdocs_svg_color_toggles.py
================================================
# SPDX-FileCopyrightText: Copyright 2025 Nathan Willis
#
# SPDX-License-Identifier: BSD-2-Clause
"""Sphinx extension to attach a color-toggle button to specified SVG elements in documents.
This extension only affects the `html` builder.
"""
from __future__ import annotations
from docutils import nodes
from docutils.parsers.rst import directives
from sphinx.util import logging
from sphinx.application import Sphinx
from sphinx.util.docutils import SphinxDirective, SphinxTranslator, SphinxRole
#from sphinx.util.typing import ExtensionMetadata
from sphinx.writers.html import HTMLTranslator
# Directive to insert color-toggle button
#
# Example output:
#
#
#
#
class svg_color_toggle_node(nodes.General, nodes.Element):
"""SVG color-toggle node."""
pass
class SVGColorToggleButton(SphinxDirective):
"""A directive to insert a color-toggle switch for
an SVG element with the specified id."""
has_content = True
required_arguments = 1
optional_arguments = 0
def run(self) -> list[svg_color_toggle_node]:
# The sole argument is the CSS element id to build
# the button for
svg_element = self.arguments[0]
return [
svg_color_toggle_node(
target_id = svg_element,
button_id = "button-" + svg_element,
button_klass = "svg-color-toggle-button",
button_label = "Toggle cluster colors",
)
]
def visit_svg_color_toggle_node_html(translator: HTMLTranslator, node: svg_color_toggle_node) -> None:
"""Entry point of the SVG color-toggle node."""
html: str = ""
if node["target_id"]:
html += ''
translator.body.append(html)
def visit_svg_color_toggle_node_unsupported(translator: SphinxTranslator, node: svg_color_toggle_node) -> None:
"""Entry point of the ignored SVG color-toggle node."""
logger.warning(
f"SVG color-toggle {node['target_id']}: unsupported output format (node skipped)"
)
raise nodes.SkipNode
def setup(app: Sphinx): # The ExtensionMetadata stuff is not available in this version of Sphinx. Sphinx's own docs are pretty terrible about clarifying these matters....
#def setup(app: Sphinx) -> ExtensionMetadata:
app.add_node(
svg_color_toggle_node,
html=(visit_svg_color_toggle_node_html, depart_svg_color_toggle_node_html),
epub=(visit_svg_color_toggle_node_unsupported, None),
latex=(visit_svg_color_toggle_node_unsupported, None),
man=(visit_svg_color_toggle_node_unsupported, None),
texinfo=(visit_svg_color_toggle_node_unsupported, None),
text=(visit_svg_color_toggle_node_unsupported, None),
)
app.add_directive("svg-color-toggle-button", SVGColorToggleButton)
return {
'version': '0.1',
'parallel_read_safe': True,
'parallel_write_safe': True,
}
================================================
FILE: _global.md
================================================
```{role} togglebutton(raw)
:format: html
```
```{raw} html
```
================================================
FILE: _static/LICENSES_FOR_INCORPORATED_SOFTWARE.txt
================================================
This documentation set includes files originating from the following
upstream projects, which are each subject to their own individual
licenses and are copyrighted by their own respective authors.
These respective copyright statements and licenses are reproduced
below or, for those cases where the license is bundled as a separate
file, referenced by file name.
Sphinx
// https://www.sphinx-doc.org/
//
// :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
// :license: BSD, see LICENSE for details.
//
// https://github.com/sphinx-doc/sphinx/blob/master/LICENSE
//
// - License for Sphinx
// ==================
//
// Unless otherwise indicated, all code in the Sphinx project is licenced under the
// two clause BSD licence below.
//
// Copyright (c) 2007-2023 by the Sphinx team (see AUTHORS file).
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
//
// Licenses for incorporated software
// ==================================
//
// The included implementation of NumpyDocstring._parse_numpydoc_see_also_section
// was derived from code under the following license:
//
// -------------------------------------------------------------------------------
//
// Copyright (C) 2008 Stefan van der Walt , Pauli Virtanen
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in
// the documentation and/or other materials provided with the
// distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
// IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
// INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
// IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
//
// -------------------------------------------------------------------------------
./_build/html/_static/searchtools.js
./_build/html/_static/language_data.js
./_build/html/_static/sphinx_highlight.js
./_build/html/_static/basic.css:
./_build/html/_static/doctools.js
./_build/html/_static/documentation_options.js
./_build/html/_static/_sphinx_javascript_frameworks_compat.js
./_build/html/objects.inv
./_build/html/searchindex.js
Sphinx 'Alabaster" theme
// https://github.com/sphinx-doc/alabaster
// Copyright (c) 2020 Jeff Forcier.
//
// https://github.com/sphinx-doc/alabaster/blob/0.x/LICENSE
//
// - Based on original work copyright (c) 2011 Kenneth Reitz and
// copyright (c) 2010 Armin Ronacher.
//
// Some rights reserved.
//
// Redistribution and use in source and binary forms of the theme, with or
// without modification, are permitted provided that the following conditions
// are met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following
// disclaimer in the documentation and/or other materials provided
// with the distribution.
//
// * The names of the contributors may not be used to endorse or
// promote products derived from this software without specific
// prior written permission.
//
// THIS THEME IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
// ARISING IN ANY WAY OUT OF THE USE OF THIS THEME, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
./_build/html/_static/alabaster.css
jQuery JavaScript Library v3.6.0
// https://jquery.com/
//
// Copyright OpenJS Foundation and other contributors
// Released under the MIT license
// https://jquery.org/license
//
// Date: 2021-03-02T17:08Z
//
// https://github.com/jquery/jquery/blob/3.6-stable/LICENSE.txt
//
// - Copyright OpenJS Foundation and other contributors, https://openjsf.org/
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
// permit persons to whom the Software is furnished to do so, subject to
// the following conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
// LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
// OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
// WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
//
//
// Includes Sizzle.js
// https://sizzlejs.com/
// Sizzle CSS Selector Engine v2.3.6
// https://sizzlejs.com/
//
// Copyright JS Foundation and other contributors
// Released under the MIT license
// https://js.foundation/
//
// Date: 2021-02-16
//
// https://github.com/jquery/sizzle/blob/main/LICENSE.txt
//
// - Copyright JS Foundation and other contributors, https://js.foundation/
//
// This software consists of voluntary contributions made by many
// individuals. For exact contribution history, see the revision history
// available at https://github.com/jquery/sizzle
//
// The following license applies to all parts of this software except as
// documented below:
//
// ====
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
// permit persons to whom the Software is furnished to do so, subject to
// the following conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
// LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
// OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
// WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
//
// ====
//
// All files located in the node_modules and external directories are
// externally maintained libraries used by this software which have their
// own licenses; we recommend you read them, as their terms may differ from
// the terms above.
./_build/html/_static/jquery-3.6.0.js: MIT License
./_build/html/_static/jquery.js
Pygments
// https://pygments.org/
//
// Copyright (c) 2006-2022 by the respective authors (see AUTHORS file).
// - https://github.com/pygments/pygments/blob/master/AUTHORS
//
// https://github.com/pygments/pygments/blob/master/LICENSE
//
// - Copyright (c) 2006-2022 by the respective authors (see AUTHORS file).
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
./_build/html/_static/pygments.css
Underscore.js 1.13.1
// https://underscorejs.org
//
// (c) 2009-2021 Jeremy Ashkenas, Julian Gonggrijp, and
// DocumentCloud and Investigative Reporters & Editors
// Underscore may be freely distributed under the MIT license.
//
// https://github.com/jashkenas/underscore/blob/master/LICENSE
//
// - Copyright (c) 2009-2022 Jeremy Ashkenas, Julian Gonggrijp,
// and DocumentCloud and Investigative Reporters & Editors
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
./_build/html/_static/underscore-1.13.1.js MIT License
./_build/html/_static/underscore.js MIT License
Source Code Pro
// https://github.com/adobe-fonts/source-code-pro
//
// Copyright 2010, 2012 Adobe Systems Incorporated
// (http://www.adobe.com/), with Reserved Font Name 'Source'. All Rights
// Reserved. Source is a trademark of Adobe Systems Incorporated in the
// United States and/or other countries.
//
// This Font Software is licensed under the SIL Open Font License, Version 1.1.
./_build/html/_static/fonts/Source_Code_Pro/OFL.txt: SIL Open Font License 1.1
./_build/html/_static/fonts/Source_Code_Pro/README.txt
./_build/html/_static/fonts/Source_Code_Pro/SourceCodePro-Italic-VariableFont_wght.ttf
./_build/html/_static/fonts/Source_Code_Pro/SourceCodePro-VariableFont_wght.ttf
Source Sans 3
// https://github.com/adobe-fonts/source-sans
//
// Copyright 2010-2020 Adobe (http://www.adobe.com/), with
// Reserved Font Name 'Source'. All Rights Reserved. Source is a
// trademark of Adobe in the United States and/or other countries.
//
// This Font Software is licensed under the SIL Open Font License, Version 1.1.
./_build/html/_static/fonts/Source_Sans_3/OFL.txt: SIL Open Font License 1.1
./_build/html/_static/fonts/Source_Sans_3/README.txt
./_build/html/_static/fonts/Source_Sans_3/SourceSans3-Italic-VariableFont_wght.ttf
./_build/html/_static/fonts/Source_Sans_3/SourceSans3-VariableFont_wght.ttf
Source Serif 4
// https://github.com/adobe-fonts/source-serif
//
// https://github.com/adobe-fonts/source-serif/blob/release/LICENSE.md
//
// - Copyright 2014 - 2023 Adobe (http://www.adobe.com/), with
// Reserved Font Name ‘Source’. All Rights Reserved. Source is a
// trademark of Adobe in the United States and/or other countries.
//
// This Font Software is licensed under the SIL Open Font License, Version 1.1.
./_build/html/_static/fonts/Source_Serif_4/OFL.txt: SIL Open Font License 1.1
./_build/html/_static/fonts/Source_Serif_4/README.txt
./_build/html/_static/fonts/Source_Serif_4/SourceSerif4-Italic-VariableFont_opsz,wght.ttf
./_build/html/_static/fonts/Source_Serif_4/SourceSerif4-VariableFont_opsz,wght.ttf
================================================
FILE: _static/custom.css
================================================
/* basic.css | file:///home/nate/code/opentype-shaping-documents/_build/html/_static/basic.css */
div.body {
/* min-width: 360px; */
/* max-width: 800px; */
min-width: 460px;
max-width: 800px;
}
/* alabaster.css | file:///home/nate/code/opentype-shaping-documents/_build/html/_static/alabaster.css */
/*div.document {
/* width: 940px; */
/* width: 1040px;
} */
/* div.sphinxsidebar {
/* width: 220px; */
/* width: 240px;
} */
div.bodywrapper {
/* margin: 0 0 0 220px; */
margin: 0 0 0 300px;
}
/* Hanging section numbers, for slightly easier in-page navigation. */
/* Indent the body text by a fixed amount on the left,
then move section-numbers leftward by the same amount. */
div.body>section {
margin-left: 4rem;
}
span.section-number {
display: inline-block;
width: 3.5rem;
text-align: right;
margin-left: -4.5rem;
margin-right: .5rem;
}
/*
span.section-number::after {
content: " ";
white-space: pre;
}*/
/* Locally served fonts, to support more smartfont features */
@font-face {
font-family: 'Source Serif 4';
src: url('./fonts/Source_Serif_4/SourceSerif4-Italic-VariableFont_opsz,wght.ttf') format('truetype-variations');
font-style: italic;
font-weight: 1 999;
}
@font-face {
font-family: 'Source Serif 4';
src: url('./fonts/Source_Serif_4/SourceSerif4-VariableFont_opsz,wght.ttf') format('truetype-variations');
font-style: normal;
font-weight: 1 999;
}
@font-face {
font-family: 'Source Sans 3';
src: url('./fonts/Source_Sans_3/SourceSans3-Italic-VariableFont_wght.ttf') format('truetype-variations');
size-adjust: 94%; /* 47% of x-height originally.... */
font-style: italic;
font-weight: 1 999;
}
@font-face {
font-family: 'Source Sans 3';
src: url('./fonts/Source_Sans_3/SourceSans3-VariableFont_wght.ttf') format('truetype-variations');
size-adjust: 94%; /* 47% of x-height originally.... */
font-style: normal;
font-weight: 1 999;
}
@font-face {
font-family: 'Source Code Pro';
src: url('./fonts/Source_Code_Pro/SourceCodePro-Italic-VariableFont_wght.ttf') format('truetype-variations');
size-adjust: 92%; /* 46% of x-height originally.... */
font-style: italic;
font-weight: 1 999;
}
@font-face {
font-family: 'Source Code Pro';
src: url('./fonts/Source_Code_Pro/SourceCodePro-VariableFont_wght.ttf') format('truetype-variations');
size-adjust: 92%; /* 46% of x-height originally.... */
font-style: normal;
font-weight: 1 999;
}
/* Use oldstyle numerals in body text */
body {
font-feature-settings: "onum";
}
/* Tables */
/* alternating background colors like GitHub inline styling uses */
tr:nth-child(even) {background: #F0F0F4}
tr:nth-child(odd) {background: #FFF}
/* Less obtrusive headers */
th {
font-family: 'Source Sans 3';
font-size-adjust: 0.48;
font-weight: 600;
background: #F0F0F4;
font-feature-settings: "tnum", "lnum";
}
tr {
font-feature-settings: "tnum", "lnum";
}
/* Try to target the captions in the toctree sidebar */
p.caption[role="heading"] span.caption-text {
font-size: 120%
}
/* Try to target the list beneath the captions in the toctree sidebar */
p.caption[role="heading"] + ul {
text-indent: 0.6em;
}
/* Make acronyms small-caps, but not interactive tooltips */
/* The font-size adjustment here may be temproary; it depends on */
/* the smallcap height of the font eventually specified. */
abbr {
font-variant: all-small-caps;
font-size: larger;
font-weight: 375;
cursor: default;
border-bottom: none;
}
/* Style 'samp' elements used to indicate explicit sequences and */
/* input/output character references that must be exact. */
samp {
font-family: 'Source Sans 3';
font-size-adjust: 0.48;
font-weight: 500; /* slightly heavier only */
color: #558;
}
/* De-emphasize Sphinx's section numbering, so as to be less */
/* distracting. */
/* */
/* TODO: shift numbers into margin. */
.section-number {
font-size: 70%;
color: #888;
}
/* Style rules for using Source family fonts */
div.body h1,
div.body h2,
div.body h3,
div.body h4,
div.body h5,
div.body h6 {
font-weight: 500;
}
/* Make the site-title in the sidebar bolder and different since */
/* there is no project logo. */
h1.logo {
font-family: 'Source Sans 3';
font-weight: 800;
font-size: 32px;
border-bottom: none;
text-decoration: none;
}
/* Fix the Alabaster theme's default font-scaling since we are */
/* using a complete superfamily with consistent sizing. */
pre, tt, code {
font-size: 1.0em; /* Undo Alabaster setting, best balance for the 3 element types considering Alabaster's other style rules */
font-weight: 450;
font-size-adjust: 0.46;
font-feature-settings: "tnum", "lnum";
}
/* Define table captions */
caption {
caption-side: bottom;
padding-top: 6px;
padding-bottom: 6px;
color: #656565;
}
/* Set figcaptions to look like table captions */
figcaption {
padding-top: 6px;
/* padding-bottom: 6px; altering this to account for toggle-button placement*/
padding-bottom: 0px;
margin-bottom: -12px;
color: #656565;
}
figcaption p {
margin-top: 8.5px;
margin-bottom: 8.5px;
}
/* Slightly lighten bgcolor on pre/tt/code in the captions, since the fg text color is lighter */
caption pre,
caption span.pre,
caption span.tt,
caption span.code,
figcaption pre,
figcaption span.pre,
figcaption span.tt,
figcaption span.code {
background-color: #eff4f7; /* testing; needs to be lighter because text is grey */
}
/* Set maximum size of SVG illustrations */
/* */
/* Width is currently limited to 100% of */
/* the parent element; height is currently */
/* specified relative to the text size, */
/* for presumptive convenience. 18em is a */
/* trial-and-error value that seems to be */
/* reasonable, but is by no means science. */
svg.shaping-demo {
max-width: 100%;
max-height: 18em;
}
/* Toggleable SVG clusters */
/*
/* Greyscale */
/* dc: dotted-circle */
.shaping-demo.greyscale-svg .dc {
fill: #999999;
stroke-width: 0%;
}
/* arrow: right-arrow */
.shaping-demo.greyscale-svg .arrow {
fill: #666666;
stroke-width: 0%;
}
/* z: ZWJ and ZWNJ */
.shaping-demo.greyscale-svg .z {
fill: #999999;
stroke: #999999;
stroke-width: 1px;
}
/* c0: cluster 0 */
.shaping-demo.greyscale-svg .c0 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 1 */
.shaping-demo.greyscale-svg .c1 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 2 */
.shaping-demo.greyscale-svg .c2 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 3 */
.shaping-demo.greyscale-svg .c3 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 4 */
.shaping-demo.greyscale-svg .c4 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 5 */
.shaping-demo.greyscale-svg .c5 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 6 */
.shaping-demo.greyscale-svg .c6 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 7 */
.shaping-demo.greyscale-svg .c7 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 8 */
.shaping-demo.greyscale-svg .c8 {
fill: #000000;
stroke-width: 0%;
}
/* c0: cluster 9 */
.shaping-demo.greyscale-svg .c9 {
fill: #000000;
stroke-width: 0%;
}
/* Colorized */
/* dc: dotted-circle */
.shaping-demo.color-svg .dc {
fill: #999999;
stroke-width: 0%;
}
/* arrow: right-arrow */
.shaping-demo.color-svg .arrow {
fill: #666666;
stroke-width: 0%;
}
/* z: ZWJ and ZWNJ */
.shaping-demo.color-svg .z {
fill: #999999;
stroke: #999999;
stroke-width: 1px;
}
/* c0: cluster 0 */
.shaping-demo.color-svg .c0 {
fill: #3366cc;
stroke-width: 0%;
}
/* c0: cluster 1 */
.shaping-demo.color-svg .c1 {
fill: #dc3912;
stroke-width: 0%;
}
/* c0: cluster 2 */
.shaping-demo.color-svg .c2 {
fill: #ff9900;
stroke-width: 0%;
}
/* c0: cluster 3 */
.shaping-demo.color-svg .c3 {
fill: #109618;
stroke-width: 0%;
}
/* c0: cluster 4 */
.shaping-demo.color-svg .c4 {
fill: #990099;
stroke-width: 0%;
}
/* c0: cluster 5 */
.shaping-demo.color-svg .c5 {
fill: #0099c6;
stroke-width: 0%;
}
/* c0: cluster 6 */
.shaping-demo.color-svg .c6 {
fill: #dd4477;
stroke-width: 0%;
}
/* c0: cluster 7 */
.shaping-demo.color-svg .c7 {
fill: #66aa00;
stroke-width: 0%;
}
/* c0: cluster 8 */
.shaping-demo.color-svg .c8 {
fill: #b82e2e;
stroke-width: 0%;
}
/* c0: cluster 9 */
.shaping-demo.color-svg .c9 {
fill: #316395;
stroke-width: 0%;
}
button.svg-color-toggle-button {
display: block;
margin-left: auto;
margin-right: auto;
/* margin-top: -8.5px; This makes alignment overly complicated.... */
padding: 4px 16px 5px 16px;
font-size: small;
color: #999;
background-color: #fff0;
border: 1px solid;
border-color: #bbb;
border-radius: 3px;
}
blockquote {
margin-top: 17px;
}
/* Static navigation sidebar */
/* Turn off bullet point on heading items */
li.static-nav-heading {
list-style: "";
}
/* L1 */
li.toctree-l1.static-nav {
font-size: 120%;
}
/* L2 headings */
li.toctree-l2.static-nav {
font-size: 100%;
font-style: italic;
}
/* L2 page links */
li.toctree-l2.static-nav a {
font-size: 100%;
font-style: normal;
}
================================================
FILE: _static/fonts/Source_Code_Pro/OFL.txt
================================================
Copyright 2010, 2012 Adobe Systems Incorporated (http://www.adobe.com/), with Reserved Font Name 'Source'. All Rights Reserved. Source is a trademark of Adobe Systems Incorporated in the United States and/or other countries.
This Font Software is licensed under the SIL Open Font License, Version 1.1.
This license is copied below, and is also available with a FAQ at:
http://scripts.sil.org/OFL
-----------------------------------------------------------
SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
-----------------------------------------------------------
PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide
development of collaborative font projects, to support the font creation
efforts of academic and linguistic communities, and to provide a free and
open framework in which fonts may be shared and improved in partnership
with others.
The OFL allows the licensed fonts to be used, studied, modified and
redistributed freely as long as they are not sold by themselves. The
fonts, including any derivative works, can be bundled, embedded,
redistributed and/or sold with any software provided that any reserved
names are not used by derivative works. The fonts and derivatives,
however, cannot be released under any other type of license. The
requirement for fonts to remain under this license does not apply
to any document created using the fonts or their derivatives.
DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this license and clearly marked as such. This may
include source files, build scripts and documentation.
"Reserved Font Name" refers to any names specified as such after the
copyright statement(s).
"Original Version" refers to the collection of Font Software components as
distributed by the Copyright Holder(s).
"Modified Version" refers to any derivative made by adding to, deleting,
or substituting -- in part or in whole -- any of the components of the
Original Version, by changing formats or by porting the Font Software to a
new environment.
"Author" refers to any designer, engineer, programmer, technical
writer or other person who contributed to the Font Software.
PERMISSION & CONDITIONS
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Font Software, to use, study, copy, merge, embed, modify,
redistribute, and sell modified and unmodified copies of the Font
Software, subject to the following conditions:
1) Neither the Font Software nor any of its individual components,
in Original or Modified Versions, may be sold by itself.
2) Original or Modified Versions of the Font Software may be bundled,
redistributed and/or sold with any software, provided that each copy
contains the above copyright notice and this license. These can be
included either as stand-alone text files, human-readable headers or
in the appropriate machine-readable metadata fields within text or
binary files as long as those fields can be easily viewed by the user.
3) No Modified Version of the Font Software may use the Reserved Font
Name(s) unless explicit written permission is granted by the corresponding
Copyright Holder. This restriction only applies to the primary font name as
presented to the users.
4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
Software shall not be used to promote, endorse or advertise any
Modified Version, except to acknowledge the contribution(s) of the
Copyright Holder(s) and the Author(s) or with their explicit written
permission.
5) The Font Software, modified or unmodified, in part or in whole,
must be distributed entirely under this license, and must not be
distributed under any other license. The requirement for fonts to
remain under this license does not apply to any document created
using the Font Software.
TERMINATION
This license becomes null and void if any of the above conditions are
not met.
DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.
================================================
FILE: _static/fonts/Source_Code_Pro/README.txt
================================================
Source Code Pro Variable Font
=============================
This download contains Source Code Pro as both variable fonts and static fonts.
Source Code Pro is a variable font with this axis:
wght
This means all the styles are contained in these files:
SourceCodePro-VariableFont_wght.ttf
SourceCodePro-Italic-VariableFont_wght.ttf
If your app fully supports variable fonts, you can now pick intermediate styles
that aren’t available as static fonts. Not all apps support variable fonts, and
in those cases you can use the static font files for Source Code Pro:
static/SourceCodePro-ExtraLight.ttf
static/SourceCodePro-Light.ttf
static/SourceCodePro-Regular.ttf
static/SourceCodePro-Medium.ttf
static/SourceCodePro-SemiBold.ttf
static/SourceCodePro-Bold.ttf
static/SourceCodePro-ExtraBold.ttf
static/SourceCodePro-Black.ttf
static/SourceCodePro-ExtraLightItalic.ttf
static/SourceCodePro-LightItalic.ttf
static/SourceCodePro-Italic.ttf
static/SourceCodePro-MediumItalic.ttf
static/SourceCodePro-SemiBoldItalic.ttf
static/SourceCodePro-BoldItalic.ttf
static/SourceCodePro-ExtraBoldItalic.ttf
static/SourceCodePro-BlackItalic.ttf
Get started
-----------
1. Install the font files you want to use
2. Use your app's font picker to view the font family and all the
available styles
Learn more about variable fonts
-------------------------------
https://developers.google.com/web/fundamentals/design-and-ux/typography/variable-fonts
https://variablefonts.typenetwork.com
https://medium.com/variable-fonts
In desktop apps
https://theblog.adobe.com/can-variable-fonts-illustrator-cc
https://helpx.adobe.com/nz/photoshop/using/fonts.html#variable_fonts
Online
https://developers.google.com/fonts/docs/getting_started
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Fonts/Variable_Fonts_Guide
https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/variable-fonts
Installing fonts
MacOS: https://support.apple.com/en-us/HT201749
Linux: https://www.google.com/search?q=how+to+install+a+font+on+gnu%2Blinux
Windows: https://support.microsoft.com/en-us/help/314960/how-to-install-or-remove-a-font-in-windows
Android Apps
https://developers.google.com/fonts/docs/android
https://developer.android.com/guide/topics/ui/look-and-feel/downloadable-fonts
License
-------
Please read the full license text (OFL.txt) to understand the permissions,
restrictions and requirements for usage, redistribution, and modification.
You can use them in your products & projects – print or digital,
commercial or otherwise.
This isn't legal advice, please consider consulting a lawyer and see the full
license for all details.
================================================
FILE: _static/fonts/Source_Sans_3/OFL.txt
================================================
Copyright 2010-2020 Adobe (http://www.adobe.com/), with Reserved Font Name 'Source'. All Rights Reserved. Source is a trademark of Adobe in the United States and/or other countries.
This Font Software is licensed under the SIL Open Font License, Version 1.1.
This license is copied below, and is also available with a FAQ at:
http://scripts.sil.org/OFL
-----------------------------------------------------------
SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
-----------------------------------------------------------
PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide
development of collaborative font projects, to support the font creation
efforts of academic and linguistic communities, and to provide a free and
open framework in which fonts may be shared and improved in partnership
with others.
The OFL allows the licensed fonts to be used, studied, modified and
redistributed freely as long as they are not sold by themselves. The
fonts, including any derivative works, can be bundled, embedded,
redistributed and/or sold with any software provided that any reserved
names are not used by derivative works. The fonts and derivatives,
however, cannot be released under any other type of license. The
requirement for fonts to remain under this license does not apply
to any document created using the fonts or their derivatives.
DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this license and clearly marked as such. This may
include source files, build scripts and documentation.
"Reserved Font Name" refers to any names specified as such after the
copyright statement(s).
"Original Version" refers to the collection of Font Software components as
distributed by the Copyright Holder(s).
"Modified Version" refers to any derivative made by adding to, deleting,
or substituting -- in part or in whole -- any of the components of the
Original Version, by changing formats or by porting the Font Software to a
new environment.
"Author" refers to any designer, engineer, programmer, technical
writer or other person who contributed to the Font Software.
PERMISSION & CONDITIONS
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Font Software, to use, study, copy, merge, embed, modify,
redistribute, and sell modified and unmodified copies of the Font
Software, subject to the following conditions:
1) Neither the Font Software nor any of its individual components,
in Original or Modified Versions, may be sold by itself.
2) Original or Modified Versions of the Font Software may be bundled,
redistributed and/or sold with any software, provided that each copy
contains the above copyright notice and this license. These can be
included either as stand-alone text files, human-readable headers or
in the appropriate machine-readable metadata fields within text or
binary files as long as those fields can be easily viewed by the user.
3) No Modified Version of the Font Software may use the Reserved Font
Name(s) unless explicit written permission is granted by the corresponding
Copyright Holder. This restriction only applies to the primary font name as
presented to the users.
4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
Software shall not be used to promote, endorse or advertise any
Modified Version, except to acknowledge the contribution(s) of the
Copyright Holder(s) and the Author(s) or with their explicit written
permission.
5) The Font Software, modified or unmodified, in part or in whole,
must be distributed entirely under this license, and must not be
distributed under any other license. The requirement for fonts to
remain under this license does not apply to any document created
using the Font Software.
TERMINATION
This license becomes null and void if any of the above conditions are
not met.
DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.
================================================
FILE: _static/fonts/Source_Sans_3/README.txt
================================================
Source Sans 3 Variable Font
===========================
This download contains Source Sans 3 as both variable fonts and static fonts.
Source Sans 3 is a variable font with this axis:
wght
This means all the styles are contained in these files:
SourceSans3-VariableFont_wght.ttf
SourceSans3-Italic-VariableFont_wght.ttf
If your app fully supports variable fonts, you can now pick intermediate styles
that aren’t available as static fonts. Not all apps support variable fonts, and
in those cases you can use the static font files for Source Sans 3:
static/SourceSans3-ExtraLight.ttf
static/SourceSans3-Light.ttf
static/SourceSans3-Regular.ttf
static/SourceSans3-Medium.ttf
static/SourceSans3-SemiBold.ttf
static/SourceSans3-Bold.ttf
static/SourceSans3-ExtraBold.ttf
static/SourceSans3-Black.ttf
static/SourceSans3-ExtraLightItalic.ttf
static/SourceSans3-LightItalic.ttf
static/SourceSans3-Italic.ttf
static/SourceSans3-MediumItalic.ttf
static/SourceSans3-SemiBoldItalic.ttf
static/SourceSans3-BoldItalic.ttf
static/SourceSans3-ExtraBoldItalic.ttf
static/SourceSans3-BlackItalic.ttf
Get started
-----------
1. Install the font files you want to use
2. Use your app's font picker to view the font family and all the
available styles
Learn more about variable fonts
-------------------------------
https://developers.google.com/web/fundamentals/design-and-ux/typography/variable-fonts
https://variablefonts.typenetwork.com
https://medium.com/variable-fonts
In desktop apps
https://theblog.adobe.com/can-variable-fonts-illustrator-cc
https://helpx.adobe.com/nz/photoshop/using/fonts.html#variable_fonts
Online
https://developers.google.com/fonts/docs/getting_started
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Fonts/Variable_Fonts_Guide
https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/variable-fonts
Installing fonts
MacOS: https://support.apple.com/en-us/HT201749
Linux: https://www.google.com/search?q=how+to+install+a+font+on+gnu%2Blinux
Windows: https://support.microsoft.com/en-us/help/314960/how-to-install-or-remove-a-font-in-windows
Android Apps
https://developers.google.com/fonts/docs/android
https://developer.android.com/guide/topics/ui/look-and-feel/downloadable-fonts
License
-------
Please read the full license text (OFL.txt) to understand the permissions,
restrictions and requirements for usage, redistribution, and modification.
You can use them in your products & projects – print or digital,
commercial or otherwise.
This isn't legal advice, please consider consulting a lawyer and see the full
license for all details.
================================================
FILE: _static/fonts/Source_Serif_4/OFL.txt
================================================
This Font Software is licensed under the SIL Open Font License, Version 1.1.
This license is copied below, and is also available with a FAQ at:
http://scripts.sil.org/OFL
-----------------------------------------------------------
SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
-----------------------------------------------------------
PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide
development of collaborative font projects, to support the font creation
efforts of academic and linguistic communities, and to provide a free and
open framework in which fonts may be shared and improved in partnership
with others.
The OFL allows the licensed fonts to be used, studied, modified and
redistributed freely as long as they are not sold by themselves. The
fonts, including any derivative works, can be bundled, embedded,
redistributed and/or sold with any software provided that any reserved
names are not used by derivative works. The fonts and derivatives,
however, cannot be released under any other type of license. The
requirement for fonts to remain under this license does not apply
to any document created using the fonts or their derivatives.
DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this license and clearly marked as such. This may
include source files, build scripts and documentation.
"Reserved Font Name" refers to any names specified as such after the
copyright statement(s).
"Original Version" refers to the collection of Font Software components as
distributed by the Copyright Holder(s).
"Modified Version" refers to any derivative made by adding to, deleting,
or substituting -- in part or in whole -- any of the components of the
Original Version, by changing formats or by porting the Font Software to a
new environment.
"Author" refers to any designer, engineer, programmer, technical
writer or other person who contributed to the Font Software.
PERMISSION & CONDITIONS
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Font Software, to use, study, copy, merge, embed, modify,
redistribute, and sell modified and unmodified copies of the Font
Software, subject to the following conditions:
1) Neither the Font Software nor any of its individual components,
in Original or Modified Versions, may be sold by itself.
2) Original or Modified Versions of the Font Software may be bundled,
redistributed and/or sold with any software, provided that each copy
contains the above copyright notice and this license. These can be
included either as stand-alone text files, human-readable headers or
in the appropriate machine-readable metadata fields within text or
binary files as long as those fields can be easily viewed by the user.
3) No Modified Version of the Font Software may use the Reserved Font
Name(s) unless explicit written permission is granted by the corresponding
Copyright Holder. This restriction only applies to the primary font name as
presented to the users.
4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
Software shall not be used to promote, endorse or advertise any
Modified Version, except to acknowledge the contribution(s) of the
Copyright Holder(s) and the Author(s) or with their explicit written
permission.
5) The Font Software, modified or unmodified, in part or in whole,
must be distributed entirely under this license, and must not be
distributed under any other license. The requirement for fonts to
remain under this license does not apply to any document created
using the Font Software.
TERMINATION
This license becomes null and void if any of the above conditions are
not met.
DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.
================================================
FILE: _static/fonts/Source_Serif_4/README.txt
================================================
Source Serif 4 Variable Font
============================
This download contains Source Serif 4 as both variable fonts and static fonts.
Source Serif 4 is a variable font with these axes:
opsz
wght
This means all the styles are contained in these files:
SourceSerif4-VariableFont_opsz,wght.ttf
SourceSerif4-Italic-VariableFont_opsz,wght.ttf
If your app fully supports variable fonts, you can now pick intermediate styles
that aren’t available as static fonts. Not all apps support variable fonts, and
in those cases you can use the static font files for Source Serif 4:
static/SourceSerif4/SourceSerif4-ExtraLight.ttf
static/SourceSerif4/SourceSerif4-Light.ttf
static/SourceSerif4/SourceSerif4-Regular.ttf
static/SourceSerif4/SourceSerif4-Medium.ttf
static/SourceSerif4/SourceSerif4-SemiBold.ttf
static/SourceSerif4/SourceSerif4-Bold.ttf
static/SourceSerif4/SourceSerif4-ExtraBold.ttf
static/SourceSerif4/SourceSerif4-Black.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-ExtraLight.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-Light.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-Regular.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-Medium.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-SemiBold.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-Bold.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-ExtraBold.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-Black.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-ExtraLight.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-Light.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-Regular.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-Medium.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-SemiBold.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-Bold.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-ExtraBold.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-Black.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-ExtraLight.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-Light.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-Regular.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-Medium.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-SemiBold.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-Bold.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-ExtraBold.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-Black.ttf
static/SourceSerif4/SourceSerif4-ExtraLightItalic.ttf
static/SourceSerif4/SourceSerif4-LightItalic.ttf
static/SourceSerif4/SourceSerif4-Italic.ttf
static/SourceSerif4/SourceSerif4-MediumItalic.ttf
static/SourceSerif4/SourceSerif4-SemiBoldItalic.ttf
static/SourceSerif4/SourceSerif4-BoldItalic.ttf
static/SourceSerif4/SourceSerif4-ExtraBoldItalic.ttf
static/SourceSerif4/SourceSerif4-BlackItalic.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-ExtraLightItalic.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-LightItalic.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-Italic.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-MediumItalic.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-SemiBoldItalic.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-BoldItalic.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-ExtraBoldItalic.ttf
static/SourceSerif4_18pt/SourceSerif4_18pt-BlackItalic.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-ExtraLightItalic.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-LightItalic.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-Italic.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-MediumItalic.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-SemiBoldItalic.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-BoldItalic.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-ExtraBoldItalic.ttf
static/SourceSerif4_36pt/SourceSerif4_36pt-BlackItalic.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-ExtraLightItalic.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-LightItalic.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-Italic.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-MediumItalic.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-SemiBoldItalic.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-BoldItalic.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-ExtraBoldItalic.ttf
static/SourceSerif4_48pt/SourceSerif4_48pt-BlackItalic.ttf
Get started
-----------
1. Install the font files you want to use
2. Use your app's font picker to view the font family and all the
available styles
Learn more about variable fonts
-------------------------------
https://developers.google.com/web/fundamentals/design-and-ux/typography/variable-fonts
https://variablefonts.typenetwork.com
https://medium.com/variable-fonts
In desktop apps
https://theblog.adobe.com/can-variable-fonts-illustrator-cc
https://helpx.adobe.com/nz/photoshop/using/fonts.html#variable_fonts
Online
https://developers.google.com/fonts/docs/getting_started
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Fonts/Variable_Fonts_Guide
https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/variable-fonts
Installing fonts
MacOS: https://support.apple.com/en-us/HT201749
Linux: https://www.google.com/search?q=how+to+install+a+font+on+gnu%2Blinux
Windows: https://support.microsoft.com/en-us/help/314960/how-to-install-or-remove-a-font-in-windows
Android Apps
https://developers.google.com/fonts/docs/android
https://developer.android.com/guide/topics/ui/look-and-feel/downloadable-fonts
License
-------
Please read the full license text (OFL.txt) to understand the permissions,
restrictions and requirements for usage, redistribution, and modification.
You can use them in your products & projects – print or digital,
commercial or otherwise.
This isn't legal advice, please consider consulting a lawyer and see the full
license for all details.
================================================
FILE: _static/fontsizes.html
================================================
Testing relative font sizes
Samples for comparing font-size-adjust and variable-axis
tweaks on the Source Superfamily
================================================
FILE: _toc.yml
================================================
root: index
options:
maxdepth: 0
numbered: False
hidden: True
titlesonly: True
entries:
- file: overview
title: Overview
subtrees:
- maxdepth: 0
numbered: 3
entries:
- file: opentype-shaping-indic-general
title: Indic general
- file: opentype-shaping-devanagari
title: Devanagari
- file: opentype-shaping-bengali
title: Bengali
- file: opentype-shaping-gujarati
title: Gujarati
- file: opentype-shaping-gurmukhi
title: Gurmukhi
- file: opentype-shaping-kannada
title: Kannada
- file: opentype-shaping-malayalam
title: Malayalam
- file: opentype-shaping-oriya
title: Oriya
- file: opentype-shaping-sinhala
title: Sinhala
- file: opentype-shaping-tamil
title: Tamil
- file: opentype-shaping-telugu
title: Telugu
- file: opentype-shaping-vedic-extensions
title: Vedic extensions
- file: opentype-shaping-arabic-general
title: Arabic general
- file: opentype-shaping-arabic
title: Arabic
- file: opentype-shaping-nko
title: N'Ko
- file: opentype-shaping-syriac
title: Syriac
- file: opentype-shaping-mongolian
title: Mongolian
- file: opentype-shaping-hangul
title: Hangul
- file: opentype-shaping-hebrew
title: Hebrew
- file: opentype-shaping-khmer
title: Khmer
- file: opentype-shaping-myanmar
title: Myanmar
- file: opentype-shaping-thai-lao
title: Thai and Lao
- file: opentype-shaping-tibetan
title: Tibetan
- file: opentype-shaping-use
title: Universal Shaping Engine
- file: opentype-shaping-default
title: Default scripts
- file: opentype-shaping-emoji
title: Emoji
- file: opentype-shaping-normalization
title: Normalization
- file: notes/index
title: Notes
subtrees:
- maxdepth: 0
numbered: False
entries:
- file: notes/uniscribe-bug-compatibility
title: Uniscribe compatibility
- file: notes/ragel-machine-notation
title: Ragel state-machine operators
- file: notes/emoji-implementation
title: Emoji implementation
- file: character-tables/index
title: Character tables
- file: character-tables/character-tables-arabic
title: Arabic
- file: character-tables/character-tables-bengali
title: Bengali
- file: character-tables/character-tables-devanagari
title: Devanagari
- file: character-tables/character-tables-gujarati
title: Gujarati
- file: character-tables/character-tables-gurmukhi
title: Gurmukhi
- file: character-tables/character-tables-hangul
title: Hangul
- file: character-tables/character-tables-hebrew
title: Hebrew
- file: character-tables/character-tables-kannada
title: Kannada
- file: character-tables/character-tables-khmer
title: Khmer
- file: character-tables/character-tables-lao
title: Lao
- file: character-tables/character-tables-malayalam
title: Malayalam
- file: character-tables/character-tables-mongolian
title: Mongolian
- file: character-tables/character-tables-myanmar
title: Myanmar
- file: character-tables/character-tables-nko
title: N'Ko
- file: character-tables/character-tables-oriya
title: Oriya
- file: character-tables/character-tables-sinhala
title: Sinhala
- file: character-tables/character-tables-syriac
title: Syriac
- file: character-tables/character-tables-tamil
title: Tamil
- file: character-tables/character-tables-telugu
title: Telugu
- file: character-tables/character-tables-thai
title: Thai
- file: character-tables/character-tables-tibetan
title: Tibetan
- file: errata
title: Errata
================================================
FILE: build-requirements.txt
================================================
alabaster==1.0.0
importlib-metadata>=5.0.0
myst-parser>=0.19.1
docutils==0.21.2
markdown-it-py==3.0.0
pip>=22.1.2
pyparsing>=3.0.9
pyspelling>=2.12.1
pytz>=2022.4
setuptools>=62.6.0
Sphinx==8.1.3
sphinx_external_toc>=1.1.0
sphinx-inline-svg>=0.2.0
sphinx-multitoc-numbering==0.1.3
svg-stack>=0.1.0
cloud-sptheme>=1.10.0
================================================
FILE: character-tables/README.md
================================================
# Character tables #
The files in this directory include per-srcipt reference tables
showing the shaping-related properties of the codepoints used for each
script.
- Indic
- [Devanagari](character-tables-devanagari.md)
- [Bengali](character-tables-bengali.md)
- [Gujarati](character-tables-gujarati.md)
- [Gurmukhi](character-tables-gurmukhi.md)
- [Kannada](character-tables-kannada.md)
- [Malayalam](character-tables-malayalam.md)
- [Oriya](character-tables-oriya.md)
- [Tamil](character-tables-tamil.md)
- [Telugu](character-tables-telugu.md)
- [Sinhala](character-tables-sinhala.md)
- _Vedic Extensions tables are included in each Indic script_
- Arabic
- [Arabic](character-tables-arabic.md)
- [Syriac](character-tables-syriac.md)
- [N'Ko](character-tables-nko.md)
- [Mongolian](character-tables-mongolian.md)
- Hangul
- [Hangul Jamo](character-tables-hangul.md)
- Hebrew
- [Hebrew](character-tables-hebrew.md)
- Khmer
- [Khmer](character-tables-khmer.md)
- Lao
- [Lao](character-tables-lao.md)
- Myanmar
- [Myanmar](character-tables-myanmar.md)
- Thai
- [Thai](character-tables-thai.md)
- Tibetan
- [Tibetan](character-tables-tibetan.md)
Tables are not provided for the default or Universal Shaping Engine
(USE) shaping documents, each of which covers a
multitude of individual scripts, nor for the emoji shaping document,
because emoji usage is not specific to any individual script.
================================================
FILE: character-tables/character-tables-arabic.md
================================================
# Arabic character tables #
This document lists the per-character shaping information needed to
[shape Arabic text](../opentype-shaping-arabic.md).
**Contents**
- [Arabic character table](#arabic-character-table)
- [Arabic Supplement character table](#arabic-supplement-character-table)
- [Arabic Extended-A character table](#arabic-extended-a-character-table)
- [Arabic Extended-B character table](#arabic-extended-b-character-table)
- [Arabic Extended-C character table](#arabic-extended-c-character-table)
- [Rumi Numeral Symbols character table](#rumi-numeral-symbols-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Arabic character table ##
Arabic glyphs should be classified as in the following
table. Codepoints in the Arabic block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
The _Joining type_ column indicates whether each codepoint is defined
as joining with adjacent characters on the left side, right side, left
and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints
designated TRANSPARENT in the _Joining type_ column do not join with
adjacent characters and, in addition, do not affect the joining
behavior of surrounding characters. Non-spacing marks are of type
TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent
characters to join.
The _Joining group_ column lists the fundamental letter that the
listed codepoint behaves like for joining purposes.
Assigned codepoints with a _null_ in the _Joining group_
column evoke no special behavior from the shaping engine during the
join-computation stage.
The _Mark class_ column indicates the Canonical Combining Class
for the codepoint. Marks are assigned non-zero combining classes so
that sequences of adjacent marks can be reordered as required by the
orthography.
For Arabic, a subset of marks in the 220 and 230 classes are also
designated _Modifier Combining Marks_ (MCM). These are denoted with
_220_MCM_ and _230_MCM_ in the _Mark class_ column. The MCM marks are
treated differently during the mark-reordering stage.
:::{table} Arabic block table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|
|`U+0600` | Other | NON_JOINING | _null_ | _0_ | Number Sign |
|`U+0601` | Other | NON_JOINING | _null_ | _0_ | Sign Sanah |
|`U+0602` | Other | NON_JOINING | _null_ | _0_ | Footnote Marker |
|`U+0603` | Other | NON_JOINING | _null_ | _0_ | Sign Safha |
|`U+0604` | Other | NON_JOINING | _null_ | _0_ | Sign Samvat |
|`U+0605` | Other | NON_JOINING | _null_ | _0_ | Number Mark Above |
|`U+0606` | Symbol | NON_JOINING | _null_ | _0_ | ؆ Cube Root |
|`U+0607` | Symbol | NON_JOINING | _null_ | _0_ | ؇ Fourth Root |
|`U+0608` | Symbol | NON_JOINING | _null_ | _0_ | ؈ Ray |
|`U+0609` | Punctuation | NON_JOINING | _null_ | _0_ | ؉ Per Mille |
|`U+060A` | Punctuation | NON_JOINING | _null_ | _0_ | ؊ Per Ten Thousand |
|`U+060B` | Symbol | NON_JOINING | _null_ | _0_ | ؋ Afghani Sign |
|`U+060C` | Punctuation | NON_JOINING | _null_ | _0_ | ، Comma |
|`U+060D` | Punctuation | NON_JOINING | _null_ | _0_ | ؍ Date Separator |
|`U+060E` | Symbol | NON_JOINING | _null_ | _0_ | ؎ Poetic Verse Sign |
|`U+060F` | Symbol | NON_JOINING | _null_ | _0_ | ؏ Sign Misra |
| | | | | |
|`U+0610` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ؐ Sign Sallallahou Alayhe Wassallam |
|`U+0611` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ؑ Sign Alayhe Assallam |
|`U+0612` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ؒ Sign Rahmatullah Alayhe |
|`U+0613` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ؓ Sign Radi Allahou Anhu |
|`U+0614` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ؔ Sign Takhallus |
|`U+0615` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ؕ Small High Tah |
|`U+0616` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ؖ Small High Alef Lam Yeh |
|`U+0617` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ؗ Small High Zain |
|`U+0618` | Mark [Mn] | TRANSPARENT | _null_ | 30 | ؘ Small Fatha |
|`U+0619` | Mark [Mn] | TRANSPARENT | _null_ | 31 | ؙ Small Damma |
|`U+061A` | Mark [Mn] | TRANSPARENT | _null_ | 32 | ؚ Small Kasra |
|`U+061B` | Punctuation | NON_JOINING | _null_ | _0_ | ؛ Semicolon |
|`U+061C` | Other | TRANSPARENT | _null_ | _0_ | Arabic Letter Mark |
|`U+061D` | Punctuation | NON_JOINING | _null_ | _0_ | ؝ End Of Text Mark |
|`U+061E` | Punctuation | NON_JOINING | _null_ | _0_ | ؞ Triple Dot Punctuation Mark |
|`U+061F` | Punctuation | NON_JOINING | _null_ | _0_ | ؟ Question Mark |
| | | | | |
|`U+0620` | Letter | DUAL | YEH | _0_ | ؠ Kashmiri Yeh |
|`U+0621` | Letter | NON_JOINING | _null_ | _0_ | ء Hamza |
|`U+0622` | Letter | RIGHT | ALEF | _0_ | آ Alef With Madda Above |
|`U+0623` | Letter | RIGHT | ALEF | _0_ | أ Alef With Hamza Above |
|`U+0624` | Letter | RIGHT | WAW | _0_ | ؤ Waw With Hamza Above |
|`U+0625` | Letter | RIGHT | ALEF | _0_ | إ Alef With Hamza Below |
|`U+0626` | Letter | DUAL | YEH | _0_ | ئ Dotless Yeh With Hamza Above |
|`U+0627` | Letter | RIGHT | ALEF | _0_ | ا Alef |
|`U+0628` | Letter | DUAL | BEH | _0_ | ب Beh |
|`U+0629` | Letter | RIGHT | TEH_MARBUTA | _0_ | ة Teh Marbuta |
|`U+062A` | Letter | DUAL | BEH | _0_ | ت Dotless Beh With 2 Dots Above |
|`U+062B` | Letter | DUAL | BEH | _0_ | ث Dotless Beh With 3 Dots Above |
|`U+062C` | Letter | DUAL | HAH | _0_ | ج Hah With Dot Below |
|`U+062D` | Letter | DUAL | HAH | _0_ | ح Hah |
|`U+062E` | Letter | DUAL | HAH | _0_ | خ Hah With Dot Above |
|`U+062F` | Letter | RIGHT | DAL | _0_ | د Dal |
| | | | | |
|`U+0630` | Letter | RIGHT | DAL | _0_ | ذ Dal With Dot Above |
|`U+0631` | Letter | RIGHT | REH | _0_ | ر Reh |
|`U+0632` | Letter | RIGHT | REH | _0_ | ز Reh With Dot Above |
|`U+0633` | Letter | DUAL | SEEN | _0_ | س Seen |
|`U+0634` | Letter | DUAL | SEEN | _0_ | ش Seen With 3 Dots Above |
|`U+0635` | Letter | DUAL | SAD | _0_ | ص Sad |
|`U+0636` | Letter | DUAL | SAD | _0_ | ض Sad With Dot Above |
|`U+0637` | Letter | DUAL | TAH | _0_ | ط Tah |
|`U+0638` | Letter | DUAL | TAH | _0_ | ظ Tah With Dot Above |
|`U+0639` | Letter | DUAL | AIN | _0_ | ع Ain |
|`U+063A` | Letter | DUAL | AIN | _0_ | غ Ain With Dot Above |
|`U+063B` | Letter | DUAL | GAF | _0_ | ػ Keheh With 2 Dots Above |
|`U+063C` | Letter | DUAL | GAF | _0_ | ؼ Keheh With 3 Dots Below |
|`U+063D` | Letter | DUAL | FARSI_YEH | _0_ | ؽ Farsi Yeh With Inverted V Above |
|`U+063E` | Letter | DUAL | FARSI_YEH | _0_ | ؾ Farsi Yeh With 2 Dots Above |
|`U+063F` | Letter | DUAL | FARSI_YEH | _0_ | ؿ Farsi Yeh With 3 Dots Above |
| | | | | |
|`U+0640` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ـ Tatweel |
|`U+0641` | Letter | DUAL | FEH | _0_ | ف Feh |
|`U+0642` | Letter | DUAL | QAF | _0_ | ق Qaf |
|`U+0643` | Letter | DUAL | KAF | _0_ | ك Kaf |
|`U+0644` | Letter | DUAL | LAM | _0_ | ل Lam |
|`U+0645` | Letter | DUAL | MEEM | _0_ | م Meem |
|`U+0646` | Letter | DUAL | NOON | _0_ | ن Noon |
|`U+0647` | Letter | DUAL | HEH | _0_ | ه Heh |
|`U+0648` | Letter | RIGHT | WAW | _0_ | و Waw |
|`U+0649` | Letter | DUAL | YEH | _0_ | ى Dotless Yeh |
|`U+064A` | Letter | DUAL | YEH | _0_ | ي Yeh |
|`U+064B` | Mark [Mn] | TRANSPARENT | _null_ | 27 | ً Fathatan |
|`U+064C` | Mark [Mn] | TRANSPARENT | _null_ | 28 | ٌ Dammatan |
|`U+064D` | Mark [Mn] | TRANSPARENT | _null_ | 29 | ٍ Kasratan |
|`U+064E` | Mark [Mn] | TRANSPARENT | _null_ | 30 | َ Fatha |
|`U+064F` | Mark [Mn] | TRANSPARENT | _null_ | 31 | ُ Damma |
| | | | | |
|`U+0650` | Mark [Mn] | TRANSPARENT | _null_ | 32 | ِ Kasra |
|`U+0651` | Mark [Mn] | TRANSPARENT | _null_ | 33 | ّ Shadda |
|`U+0652` | Mark [Mn] | TRANSPARENT | _null_ | 34 | ْ Sukun |
|`U+0653` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ٓ Maddah Above |
|`U+0654` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ٔ Hamza Above |
|`U+0655` | Mark [Mn] | TRANSPARENT | _null_ | 220_MCM | ٕ Hamza Below |
|`U+0656` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ٖ Subscript Alef |
|`U+0657` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ٗ Inverted Damma |
|`U+0658` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ٘ Noon Ghunna |
|`U+0659` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ٙ Zwarakay |
|`U+065A` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ٚ Vowel Sign Small V Above |
|`U+065B` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ٛ Vowel Sign Inverted Small V Above |
|`U+065C` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ٜ Vowel Sign Dot Below |
|`U+065D` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ٝ Reversed Damma |
|`U+065E` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ٞ Fatha with Two Dots |
|`U+065F` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ٟ Wavy Hamza Below |
| | | | | |
|`U+0660` | Number | NON_JOINING | _null_ | _0_ | ٠ Digit Zero |
|`U+0661` | Number | NON_JOINING | _null_ | _0_ | ١ Digit One |
|`U+0662` | Number | NON_JOINING | _null_ | _0_ | ٢ Digit Two |
|`U+0663` | Number | NON_JOINING | _null_ | _0_ | ٣ Digit Three |
|`U+0664` | Number | NON_JOINING | _null_ | _0_ | ٤ Digit Four |
|`U+0665` | Number | NON_JOINING | _null_ | _0_ | ٥ Digit Five |
|`U+0666` | Number | NON_JOINING | _null_ | _0_ | ٦ Digit Six |
|`U+0667` | Number | NON_JOINING | _null_ | _0_ | ٧ Digit Seven |
|`U+0668` | Number | NON_JOINING | _null_ | _0_ | ٨ Digit Eight |
|`U+0669` | Number | NON_JOINING | _null_ | _0_ | ٩ Digit Nine |
|`U+066A` | Punctuation | NON_JOINING | _null_ | _0_ | ٪ Percent Sign |
|`U+066B` | Punctuation | NON_JOINING | _null_ | _0_ | ٫ Decimal Separator |
|`U+066C` | Punctuation | NON_JOINING | _null_ | _0_ | ٬ Thousands Separator |
|`U+066D` | Punctuation | NON_JOINING | _null_ | _0_ | ٭ Five Pointed Star |
|`U+066E` | Letter | DUAL | BEH | _0_ | ٮ Dotless Beh |
|`U+066F` | Letter | DUAL | QAF | _0_ | ٯ Dotless Qaf |
| | | | | |
|`U+0670` | Mark [Mn] | TRANSPARENT | _null_ | 35 | ٰ Superscript Alef |
|`U+0671` | Letter | RIGHT | ALEF | _0_ | ٱ Alef With Wasla Above |
|`U+0672` | Letter | RIGHT | ALEF | _0_ | ٲ Alef With Wavy Hamza Above |
|`U+0673` | Letter | RIGHT | ALEF | _0_ | ٳ Alef With Wavy Hamza Below |
|`U+0674` | Letter | NON_JOINING | _null_ | _0_ | ٴ High Hamza |
|`U+0675` | Letter | RIGHT | ALEF | _0_ | ٵ High Hamza Alef |
|`U+0676` | Letter | RIGHT | WAW | _0_ | ٶ High Hamza Waw |
|`U+0677` | Letter | RIGHT | WAW | _0_ | ٷ High Hamza Waw With Damma Above |
|`U+0678` | Letter | DUAL | YEH | _0_ | ٸ High Hamza Dotless Yeh |
|`U+0679` | Letter | DUAL | BEH | _0_ | ٹ Dotless Beh With Tah Above |
|`U+067A` | Letter | DUAL | BEH | _0_ | ٺ Dotless Beh With Vertical 2 Dots Above|
|`U+067B` | Letter | DUAL | BEH | _0_ | ٻ Dotless Beh With Vertical 2 Dots Below|
|`U+067C` | Letter | DUAL | BEH | _0_ | ټ Dotless Beh With Attached Ring Below And 2 Dots Above|
|`U+067D` | Letter | DUAL | BEH | _0_ | ٽ Dotless Beh With Inverted 3 Dots Above|
|`U+067E` | Letter | DUAL | BEH | _0_ | پ Dotless Beh With 3 Dots Below |
|`U+067F` | Letter | DUAL | BEH | _0_ | ٿ Dotless Beh With 4 Dots Above |
| | | | | |
|`U+0680` | Letter | DUAL | BEH | _0_ | ڀ Dotless Beh With 4 Dots Below |
|`U+0681` | Letter | DUAL | HAH | _0_ | ځ Hah With Hamza Above |
|`U+0682` | Letter | DUAL | HAH | _0_ | ڂ Hah With Vertical 2 Dots Above |
|`U+0683` | Letter | DUAL | HAH | _0_ | ڃ Hah With 2 Dots Below |
|`U+0684` | Letter | DUAL | HAH | _0_ | ڄ Hah With Vertical 2 Dots Below |
|`U+0685` | Letter | DUAL | HAH | _0_ | څ Hah With 3 Dots Above |
|`U+0686` | Letter | DUAL | HAH | _0_ | چ Hah With 3 Dots Below |
|`U+0687` | Letter | DUAL | HAH | _0_ | ڇ Hah With 4 Dots Below |
|`U+0688` | Letter | RIGHT | DAL | _0_ | ڈ Dal With Tah Above |
|`U+0689` | Letter | RIGHT | DAL | _0_ | ډ Dal With Attached Ring Below |
|`U+068A` | Letter | RIGHT | DAL | _0_ | ڊ Dal With Dot Below |
|`U+068B` | Letter | RIGHT | DAL | _0_ | ڋ Dal With Dot Below And Tah Above |
|`U+068C` | Letter | RIGHT | DAL | _0_ | ڌ Dal With 2 Dots Above |
|`U+068D` | Letter | RIGHT | DAL | _0_ | ڍ Dal With 2 Dots Below |
|`U+068E` | Letter | RIGHT | DAL | _0_ | ڎ Dal With 3 Dots Above |
|`U+068F` | Letter | RIGHT | DAL | _0_ | ڏ Dal With Inverted 3 Dots Above |
| | | | | |
|`U+0690` | Letter | RIGHT | DAL | _0_ | ڐ Dal With 4 Dots Above |
|`U+0691` | Letter | RIGHT | REH | _0_ | ڑ Reh With Tah Above |
|`U+0692` | Letter | RIGHT | REH | _0_ | ڒ Reh With V Above |
|`U+0693` | Letter | RIGHT | REH | _0_ | ړ Reh With Attached Ring Below |
|`U+0694` | Letter | RIGHT | REH | _0_ | ڔ Reh With Dot Below |
|`U+0695` | Letter | RIGHT | REH | _0_ | ڕ Reh With V Below |
|`U+0696` | Letter | RIGHT | REH | _0_ | ږ Reh With Dot Below And Dot Within |
|`U+0697` | Letter | RIGHT | REH | _0_ | ڗ Reh With 2 Dots Above |
|`U+0698` | Letter | RIGHT | REH | _0_ | ژ Reh With 3 Dots Above |
|`U+0699` | Letter | RIGHT | REH | _0_ | ڙ Reh With 4 Dots Above |
|`U+069A` | Letter | DUAL | SEEN | _0_ | ښ Seen With Dot Below And Dot Above |
|`U+069B` | Letter | DUAL | SEEN | _0_ | ڛ Seen With 3 Dots Below |
|`U+069C` | Letter | DUAL | SEEN | _0_ | ڜ Seen With 3 Dots Below And 3 Dots Above|
|`U+069D` | Letter | DUAL | SAD | _0_ | ڝ Sad With 2 Dots Below |
|`U+069E` | Letter | DUAL | SAD | _0_ | ڞ Sad With 3 Dots Above |
|`U+069F` | Letter | DUAL | TAH | _0_ | ڟ Tah With 3 Dots Above |
| | | | | |
|`U+06A0` | Letter | DUAL | AIN | _0_ | ڠ Ain With 3 Dots Above |
|`U+06A1` | Letter | DUAL | FEH | _0_ | ڡ Dotless Feh |
|`U+06A2` | Letter | DUAL | FEH | _0_ | ڢ Dotless Feh With Dot Below |
|`U+06A3` | Letter | DUAL | FEH | _0_ | ڣ Feh With Dot Below |
|`U+06A4` | Letter | DUAL | FEH | _0_ | ڤ Dotless Feh With 3 Dots Above |
|`U+06A5` | Letter | DUAL | FEH | _0_ | ڥ Dotless Feh With 3 Dots Below |
|`U+06A6` | Letter | DUAL | FEH | _0_ | ڦ Dotless Feh With 4 Dots Above |
|`U+06A7` | Letter | DUAL | QAF | _0_ | ڧ Dotless Qaf With Dot Above |
|`U+06A8` | Letter | DUAL | QAF | _0_ | ڨ Dotless Qaf With 3 Dots Above |
|`U+06A9` | Letter | DUAL | GAF | _0_ | ک Keheh |
|`U+06AA` | Letter | DUAL | SWASH_KAF | _0_ | ڪ Swash Kaf |
|`U+06AB` | Letter | DUAL | GAF | _0_ | ګ Keheh With Attached Ring Below |
|`U+06AC` | Letter | DUAL | KAF | _0_ | ڬ Kaf With Dot Above |
|`U+06AD` | Letter | DUAL | KAF | _0_ | ڭ Kaf With 3 Dots Above |
|`U+06AE` | Letter | DUAL | KAF | _0_ | ڮ Kaf With 3 Dots Below |
|`U+06AF` | Letter | DUAL | GAF | _0_ | گ Gaf |
| | | | | |
|`U+06B0` | Letter | DUAL | GAF | _0_ | ڰ Gaf With Attached Ring Below |
|`U+06B1` | Letter | DUAL | GAF | _0_ | ڱ Gaf With 2 Dots Above |
|`U+06B2` | Letter | DUAL | GAF | _0_ | ڲ Gaf With 2 Dots Below |
|`U+06B3` | Letter | DUAL | GAF | _0_ | ڳ Gaf With Vertical 2 Dots Below |
|`U+06B4` | Letter | DUAL | GAF | _0_ | ڴ Gaf With 3 Dots Above |
|`U+06B5` | Letter | DUAL | LAM | _0_ | ڵ Lam With V Above |
|`U+06B6` | Letter | DUAL | LAM | _0_ | ڶ Lam With Dot Above |
|`U+06B7` | Letter | DUAL | LAM | _0_ | ڷ Lam With 3 Dots Above |
|`U+06B8` | Letter | DUAL | LAM | _0_ | ڸ Lam With 3 Dots Below |
|`U+06B9` | Letter | DUAL | NOON | _0_ | ڹ Noon With Dot Below |
|`U+06BA` | Letter | DUAL | NOON | _0_ | ں Dotless Noon |
|`U+06BB` | Letter | DUAL | NOON | _0_ | ڻ Dotless Noon With Tah Above |
|`U+06BC` | Letter | DUAL | NOON | _0_ | ڼ Noon With Attached Ring Below |
|`U+06BD` | Letter | DUAL | NYA | _0_ | ڽ Nya |
|`U+06BE` | Letter | DUAL | KNOTTED_HEH | _0_ | ھ Knotted Heh |
|`U+06BF` | Letter | DUAL | HAH | _0_ | ڿ Hah With 3 Dots Below And Dot Above |
| | | | | |
|`U+06C0` | Letter | RIGHT | TEH_MARBUTA | _0_ | ۀ Dotless Teh Marbuta With Hamza Above |
|`U+06C1` | Letter | DUAL | HEH_GOAL | _0_ | ہ Heh Goal |
|`U+06C2` | Letter | DUAL | HEH_GOAL | _0_ | ۂ Heh Goal With Hamza Above |
|`U+06C3` | Letter | RIGHT | TEH_MARBUTA_GOAL | _0_ | ۃ Teh Marbuta Goal |
|`U+06C4` | Letter | RIGHT | WAW | _0_ | ۄ Waw With Attached Ring Within |
|`U+06C5` | Letter | RIGHT | WAW | _0_ | ۅ Waw With Bar |
|`U+06C6` | Letter | RIGHT | WAW | _0_ | ۆ Waw With V Above |
|`U+06C7` | Letter | RIGHT | WAW | _0_ | ۇ Waw With Damma Above |
|`U+06C8` | Letter | RIGHT | WAW | _0_ | ۈ Waw With Alef Above |
|`U+06C9` | Letter | RIGHT | WAW | _0_ | ۉ Waw With Inverted V Above |
|`U+06CA` | Letter | RIGHT | WAW | _0_ | ۊ Waw With 2 Dots Above |
|`U+06CB` | Letter | RIGHT | WAW | _0_ | ۋ Waw With 3 Dots Above |
|`U+06CC` | Letter | DUAL | FARSI_YEH | _0_ | ی Farsi Yeh |
|`U+06CD` | Letter | RIGHT | YEH_WITH_TAIL | _0_ | ۍ Yeh With Tail |
|`U+06CE` | Letter | DUAL | FARSI_YEH | _0_ | ێ Farsi Yeh With V Above |
|`U+06CF` | Letter | RIGHT | WAW | _0_ | ۏ Waw With Dot Above |
| | | | | |
|`U+06D0` | Letter | DUAL | YEH | _0_ | ې Dotless Yeh With Vertical 2 Dots Below|
|`U+06D1` | Letter | DUAL | YEH | _0_ | ۑ Dotless Yeh With 3 Dots Below |
|`U+06D2` | Letter | RIGHT | YEH_BARREE | _0_ | ے Yeh Barree |
|`U+06D3` | Letter | RIGHT | YEH_BARREE | _0_ | ۓ Yeh Barree With Hamza Above |
|`U+06D4` | Punctuation | NON_JOINING | _null_ | _0_ | ۔ Full Stop |
|`U+06D5` | Letter | NON_JOINING | TEH_MARBUTA | _0_ | ە Dotless Teh Marbuta |
|`U+06D6` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۖ Small High Sad Lam Alef Maksura |
|`U+06D7` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۗ Small High Qaf Lam Alef Maksura |
|`U+06D8` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۘ Small High Meem Initial Form |
|`U+06D9` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۙ Small High Lam Alef |
|`U+06DA` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۚ Small High Jeem |
|`U+06DB` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۛ Small High Three Dots |
|`U+06DC` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ۜ Small High Seen |
|`U+06DD` | Other | NON_JOINING | _null_ | _0_ | End Of Ayah |
|`U+06DE` | Other | NON_JOINING | _null_ | _0_ | ۞ Start Of Rub El Hizb |
|`U+06DF` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۟ Small High Rounded Zero |
| | | | | |
|`U+06E0` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۠ Small High Upright Rectangular Zero |
|`U+06E1` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۡ Small High Dotless Head Of Khah |
|`U+06E2` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۢ Small High Meem Isolated Form |
|`U+06E3` | Mark [Mn] | TRANSPARENT | _null_ | 220_MCM | ۣ Small Low Seen |
|`U+06E4` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۤ Small High Madda |
|`U+06E5` | Letter modifier | NON_JOINING | _null_ | _0_ | ۥ Small Waw |
|`U+06E6` | Letter modifier | NON_JOINING | _null_ | _0_ | ۦ Small Yeh |
|`U+06E7` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ۧ Small High Yeh |
|`U+06E8` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ۨ Small High Noon |
|`U+06E9` | Symbol | NON_JOINING | _null_ | _0_ | ۩ Place Of Sajdah |
|`U+06EA` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ۪ Empty Centre Low Stop |
|`U+06EB` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۫ Empty Centre High Stop |
|`U+06EC` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ۬ Rounded High Stop With Filled Centre |
|`U+06ED` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ۭ Small Low Meem |
|`U+06EE` | Letter | RIGHT | DAL | _0_ | ۮ Dal With Inverted V Above |
|`U+06EF` | Letter | RIGHT | REH | _0_ | ۯ Reh With Inverted V Above |
| | | | | |
|`U+06F0` | Number | NON_JOINING | _null_ | _0_ | ۰ Extended Digit Zero |
|`U+06F1` | Number | NON_JOINING | _null_ | _0_ | ۱ Extended Digit One |
|`U+06F2` | Number | NON_JOINING | _null_ | _0_ | ۲ Extended Digit Two |
|`U+06F3` | Number | NON_JOINING | _null_ | _0_ | ۳ Extended Digit Three |
|`U+06F4` | Number | NON_JOINING | _null_ | _0_ | ۴ Extended Digit Four |
|`U+06F5` | Number | NON_JOINING | _null_ | _0_ | ۵ Extended Digit Five |
|`U+06F6` | Number | NON_JOINING | _null_ | _0_ | ۶ Extended Digit Six |
|`U+06F7` | Number | NON_JOINING | _null_ | _0_ | ۷ Extended Digit Seven |
|`U+06F8` | Number | NON_JOINING | _null_ | _0_ | ۸ Extended Digit Eight |
|`U+06F9` | Number | NON_JOINING | _null_ | _0_ | ۹ Extended Digit Nine |
|`U+06FA` | Letter | DUAL | SEEN | _0_ | ۺ Sheen With Dot Below |
|`U+06FB` | Letter | DUAL | SAD | _0_ | ۻ Dad With Dot Below |
|`U+06FC` | Letter | DUAL | AIN | _0_ | ۼ Ghain With Dot Below |
|`U+06FD` | Symbol | NON_JOINING | _null_ | _0_ | ۽ Sign Sindhi Ampersand |
|`U+06FE` | Symbol | NON_JOINING | _null_ | _0_ | ۾ Sign Sindhi Postposition Men |
|`U+06FF` | Letter | DUAL | KNOTTED_HEH | _0_ | ۿ Knotted Heh With Inverted V Above |
:::
## Arabic Supplement character table ##
:::{table} Arabic Supplement block table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------------------------|
|`U+0750` | Letter | DUAL | BEH | _0_ | ݐ Dotless Beh With Horizontal 3 Dots Below |
|`U+0751` | Letter | DUAL | BEH | _0_ | ݑ Beh With 3 Dots Above |
|`U+0752` | Letter | DUAL | BEH | _0_ | ݒ Dotless Beh With Inverted 3 Dots Below |
|`U+0753` | Letter | DUAL | BEH | _0_ | ݓ Dotless Beh With Inverted 3 Dots Below And 2 Dots Above|
|`U+0754` | Letter | DUAL | BEH | _0_ | ݔ Dotless Beh With 2 Dots Below And Dot Above |
|`U+0755` | Letter | DUAL | BEH | _0_ | ݕ Dotless Beh With Inverted V Below |
|`U+0756` | Letter | DUAL | BEH | _0_ | ݖ Dotless Beh With V Above |
|`U+0757` | Letter | DUAL | HAH | _0_ | ݗ Hah With 2 Dots Above |
|`U+0758` | Letter | DUAL | HAH | _0_ | ݘ Hah With Inverted 3 Dots Below |
|`U+0759` | Letter | RIGHT | DAL | _0_ | ݙ Dal With Vertical 2 Dots Below And Tah Above |
|`U+075A` | Letter | RIGHT | DAL | _0_ | ݚ Dal With Inverted V Below |
|`U+075B` | Letter | RIGHT | REH | _0_ | ݛ Reh With Bar |
|`U+075C` | Letter | DUAL | SEEN | _0_ | ݜ Seen With 4 Dots Above |
|`U+075D` | Letter | DUAL | AIN | _0_ | ݝ Ain With 2 Dots Above |
|`U+075E` | Letter | DUAL | AIN | _0_ | ݞ Ain With Inverted 3 Dots Above |
|`U+075F` | Letter | DUAL | AIN | _0_ | ݟ Ain With Vertical 2 Dots Above |
| | | | | |
|`U+0760` | Letter | DUAL | FEH | _0_ | ݠ Dotless Feh With 2 Dots Below |
|`U+0761` | Letter | DUAL | FEH | _0_ | ݡ Dotless Feh With Inverted 3 Dots Below |
|`U+0762` | Letter | DUAL | GAF | _0_ | ݢ Keheh With Dot Above |
|`U+0763` | Letter | DUAL | GAF | _0_ | ݣ Keheh With 3 Dots Above |
|`U+0764` | Letter | DUAL | GAF | _0_ | ݤ Keheh With Inverted 3 Dots Below |
|`U+0765` | Letter | DUAL | MEEM | _0_ | ݥ Meem With Dot Above |
|`U+0766` | Letter | DUAL | MEEM | _0_ | ݦ Meem With Dot Below |
|`U+0767` | Letter | DUAL | NOON | _0_ | ݧ Noon With 2 Dots Below |
|`U+0768` | Letter | DUAL | NOON | _0_ | ݨ Noon With Tah Above |
|`U+0769` | Letter | DUAL | NOON | _0_ | ݩ Noon With V Above |
|`U+076A` | Letter | DUAL | LAM | _0_ | ݪ Lam With Bar |
|`U+076B` | Letter | RIGHT | REH | _0_ | ݫ Reh With Vertical 2 Dots Above |
|`U+076C` | Letter | RIGHT | REH | _0_ | ݬ Reh With Hamza Above |
|`U+076D` | Letter | DUAL | SEEN | _0_ | ݭ Seen With Vertical 2 Dots Above |
|`U+076E` | Letter | DUAL | HAH | _0_ | ݮ Hah With Tah Below |
|`U+076F` | Letter | DUAL | HAH | _0_ | ݯ Hah With Tah And 2 Dots Below |
| | | | | |
|`U+0770` | Letter | DUAL | SEEN | _0_ | ݰ Seen With 2 Dots And Tah Above |
|`U+0771` | Letter | RIGHT | REH | _0_ | ݱ Reh With 2 Dots And Tah Above |
|`U+0772` | Letter | DUAL | HAH | _0_ | ݲ Hah With Tah Above |
|`U+0773` | Letter | RIGHT | ALEF | _0_ | ݳ Alef With Digit Two Above |
|`U+0774` | Letter | RIGHT | ALEF | _0_ | ݴ Alef With Digit Three Above |
|`U+0775` | Letter | DUAL | FARSI_YEH | _0_ | ݵ Farsi Yeh With Digit Two Above |
|`U+0776` | Letter | DUAL | FARSI_YEH | _0_ | ݶ Farsi Yeh With Digit Three Above |
|`U+0777` | Letter | DUAL | YEH | _0_ | ݷ Dotless Yeh With Digit Four Below |
|`U+0778` | Letter | RIGHT | WAW | _0_ | ݸ Waw With Digit Two Above |
|`U+0779` | Letter | RIGHT | WAW | _0_ | ݹ Waw With Digit Three Above |
|`U+077A` | Letter | DUAL | BURUSHASKI_YEH_BARREE| _0_ | ݺ Burushaski Yeh Barree With Digit Two Above |
|`U+077B` | Letter | DUAL | BURUSHASKI_YEH_BARREE| _0_ | ݻ Burushaski Yeh Barree With Digit Three Above |
|`U+077C` | Letter | DUAL | HAH | _0_ | ݼ Hah With Digit Four Below |
|`U+077D` | Letter | DUAL | SEEN | _0_ | ݽ Seen With Digit Four Above |
|`U+077E` | Letter | DUAL | SEEN | _0_ | ݾ Seen With Inverted V Above |
|`U+077F` | Letter | DUAL | KAF | _0_ | ݿ Kaf With 2 Dots Above |
:::
## Arabic Extended-A character table ##
:::{table} Arabic Extended-A block table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-------------------------------------------------------|
|`U+08A0` | Letter | DUAL | BEH | _0_ | ࢠ Dotless Beh With V Below |
|`U+08A1` | Letter | DUAL | BEH | _0_ | ࢡ Beh With Hamza Above |
|`U+08A2` | Letter | DUAL | HAH | _0_ | ࢢ Hah With Dot Below And 2 Dots Above |
|`U+08A3` | Letter | DUAL | TAH | _0_ | ࢣ Tah With 2 Dots Above |
|`U+08A4` | Letter | DUAL | FEH | _0_ | ࢤ Dotless Feh With Dot Below And 3 Dots Above |
|`U+08A5` | Letter | DUAL | QAF | _0_ | ࢥ Qaf With Dot Below |
|`U+08A6` | Letter | DUAL | LAM | _0_ | ࢦ Lam With Double Bar |
|`U+08A7` | Letter | DUAL | MEEM | _0_ | ࢧ Meem With 3 Dots Above |
|`U+08A8` | Letter | DUAL | YEH | _0_ | ࢨ Yeh With Hamza Above |
|`U+08A9` | Letter | DUAL | YEH | _0_ | ࢩ Yeh With Dot Above |
|`U+08AA` | Letter | RIGHT | REH | _0_ | ࢪ Reh With Loop |
|`U+08AB` | Letter | RIGHT | WAW | _0_ | ࢫ Waw With Dot Within |
|`U+08AC` | Letter | RIGHT | ROHINGYA_YEH | _0_ | ࢬ Rohingya Yeh |
|`U+08AD` | Letter | NON_JOINING | _null_ | _0_ | ࢭ Low Alef |
|`U+08AE` | Letter | RIGHT | DAL | _0_ | ࢮ Dal With 3 Dots Below |
|`U+08AF` | Letter | DUAL | SAD | _0_ | ࢯ Sad With 3 Dots Below |
| | | | | |
|`U+08B0` | Letter | DUAL | GAF | _0_ | ࢰ Keheh With Stroke Below |
|`U+08B1` | Letter | RIGHT | STRAIGHT_WAW | _0_ | ࢱ Straight Waw |
|`U+08B2` | Letter | RIGHT | REH | _0_ | ࢲ Reh With Dot And Inverted V Above |
|`U+08B3` | Letter | DUAL | AIN | _0_ | ࢳ Ain With 3 Dots Below |
|`U+08B4` | Letter | DUAL | KAF | _0_ | ࢴ Kaf With Dot Below |
|`U+08B5` | Letter | DUAL | QAF | _0_ | ࢵ Qaf With Dot Below |
|`U+08B6` | Letter | DUAL | BEH | _0_ | ࢶ Beh With Meem Above |
|`U+08B7` | Letter | DUAL | BEH | _0_ | ࢷ Dotless Beh With 3 Dots Below And Meem Above |
|`U+08B8` | Letter | DUAL | BEH | _0_ | ࢸ Dotless Beh With Teh Above |
|`U+08B9` | Letter | RIGHT | REH | _0_ | ࢹ Reh With Noon Above |
|`U+08BA` | Letter | DUAL | YEH | _0_ | ࢺ Yeh With Noon Above |
|`U+08BB` | Letter | DUAL | AFRICAN_FEH | _0_ | ࢻ African Feh |
|`U+08BC` | Letter | DUAL | AFRICAN_QAF | _0_ | ࢼ African Qaf |
|`U+08BD` | Letter | DUAL | AFRICAN_NOON | _0_ | ࢽ African Noon |
|`U+08BE` | Letter | DUAL | BEH | _0_ | ࢾ Peh With Small V |
|`U+08BF` | Letter | DUAL | BEH | _0_ | ࢿ Teh With Small V |
| | | | | |
|`U+08C0` | Letter | DUAL | BEH | _0_ | ࣀ Tteh With Small V |
|`U+08C1` | Letter | DUAL | HAH | _0_ | ࣁ Tcheh With Small V |
|`U+08C2` | Letter | DUAL | GAF | _0_ | ࣂ Keheh With Small V |
|`U+08C3` | Letter | DUAL | AIN | _0_ | ࣃ Ghain With 3 Dots Above |
|`U+08C4` | Letter | DUAL | AFRICAN_QAF | _0_ | ࣄ African Qaf With 3 Dots Above |
|`U+08C5` | Letter | DUAL | HAH | _0_ | ࣅ Jeem With 3 Dots Above |
|`U+08C6` | Letter | DUAL | HAH | _0_ | ࣆ Jeem With 3 Dots Below |
|`U+08C7` | Letter | DUAL | LAM | _0_ | ࣇ Lam With Small Arabic Tah Above |
|`U+08C8` | Letter | DUAL | GAF | _0_ | ࣈ Graf |
|`U+08C9` | Letter modifier | TRANSPARENT | _null_ | _0_ | ࣉ Small Farsi Yeh |
|`U+08CA` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ࣊ Small High Farsi Yeh |
|`U+08CB` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ࣋ Small High Yeh Barree With Two Dots Below |
|`U+08CC` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣌ Small High Word Sah |
|`U+08CD` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ࣍ Small High Zah |
|`U+08CE` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ࣎ Large Round Dot Above |
|`U+08CF` | Mark [Mn] | TRANSPARENT | _null_ | 220_MCM | ࣏ Large Round Dot Below |
| | | | | |
|`U+08D0` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣐ Sukun Below |
|`U+08D1` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣑ Large Circle Below |
|`U+08D2` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣒ Large Round Dot Inside Circle Below |
|`U+08D3` | Mark [Mn] | TRANSPARENT | _null_ | 220_MCM | ࣓ Small Low Waw |
|`U+08D4` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣔ Small High Word Ar-Rub |
|`U+08D5` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣕ Small High Sad |
|`U+08D6` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣖ Small High Ain |
|`U+08D7` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣗ Small High Qaf |
|`U+08D8` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣘ Small High Noon With Kasra |
|`U+08D9` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣙ Small Low Noon With Kasra |
|`U+08DA` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣚ Small High Word Ath-Thalatha |
|`U+08DB` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣛ Small High Word As-Sajda |
|`U+08DC` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣜ Small High Word An-Nisf |
|`U+08DD` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣝ Small High Word Sakta |
|`U+08DE` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣞ Small High Word Qif |
|`U+08DF` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣟ Small High Word Waqfa |
| | | | | |
|`U+08E0` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣠ Small High Footnote Marker |
|`U+08E1` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣡ Small High Sign Safha |
|`U+08E2` | Other | NON_JOINING | _null_ | _0_ | Disputed End Of Ayah |
|`U+08E3` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣣ Turned Damma Below |
|`U+08E4` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣤ Curly Fatha |
|`U+08E5` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣥ Curly Damma |
|`U+08E6` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣦ Curly Kasra |
|`U+08E7` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣧ Curly Fathatan |
|`U+08E8` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣨ Curly Dammatan |
|`U+08E9` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣩ Curly Kasratan |
|`U+08EA` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣪ Tone One Dot Above |
|`U+08EB` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣫ Tone Two Dots aAove |
|`U+08EC` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣬ Tone Loop Above |
|`U+08ED` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣭ Tone One Dot Below |
|`U+08EE` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣮ Tone Two Dots Below |
|`U+08EF` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣯ Tone Loop Below |
| | | | | |
|`U+08F0` | Mark [Mn] | TRANSPARENT | _null_ | 27 | ࣰ Open Fathatan |
|`U+08F1` | Mark [Mn] | TRANSPARENT | _null_ | 28 | ࣱ Open Dammatan |
|`U+08F2` | Mark [Mn] | TRANSPARENT | _null_ | 29 | ࣲ Open Kasratan |
|`U+08F3` | Mark [Mn] | TRANSPARENT | _null_ | 230_MCM | ࣳ Small High Waw |
|`U+08F4` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣴ Fatha With Ring |
|`U+08F5` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣵ Fatha With Dot Above |
|`U+08F6` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣶ Kasra With Dot Below |
|`U+08F7` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣷ Left Arrowhead Above |
|`U+08F8` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣸ Right Arrowhead Above |
|`U+08F9` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣹ Left Arrowhead Below |
|`U+08FA` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣺ Right Arrowhead Below |
|`U+08FB` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣻ Double Right Arrowhead Above |
|`U+08FC` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣼ Double Right Arrowhead Above With Dot |
|`U+08FD` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣽ Right Arrowhead Above With Dot |
|`U+08FE` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣾ Damma With Dot |
|`U+08FF` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣿ Mark Sideways Noon Ghunna |
:::
## Arabic Extended-B character table ##
:::{table} Arabic Extended-B block table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-------------------------------------------------------|
|`U+0870` | Letter | RIGHT | ALEF | _0_ | ࡰ Alef With Attached Fatha |
|`U+0871` | Letter | RIGHT | ALEF | _0_ | ࡱ Alef With Attached Top Right Fatha |
|`U+0872` | Letter | RIGHT | ALEF | _0_ | ࡲ Alef With Right Middle Stroke |
|`U+0873` | Letter | RIGHT | ALEF | _0_ | ࡳ Alef With Left Middle Stroke |
|`U+0874` | Letter | RIGHT | ALEF | _0_ | ࡴ Alef With Attached Kasra |
|`U+0875` | Letter | RIGHT | ALEF | _0_ | ࡵ Alef With Attached Bottom Right Kasra |
|`U+0876` | Letter | RIGHT | ALEF | _0_ | ࡶ Alef With Attached Round Dot Above |
|`U+0877` | Letter | RIGHT | ALEF | _0_ | ࡷ Alef With Attached Right Round Dot |
|`U+0878` | Letter | RIGHT | ALEF | _0_ | ࡸ Alef With Attached Left Round Dot |
|`U+0879` | Letter | RIGHT | ALEF | _0_ | ࡹ Alef With Attached Round Dot Below |
|`U+087A` | Letter | RIGHT | ALEF | _0_ | ࡺ Alef With Dot Above |
|`U+087B` | Letter | RIGHT | ALEF | _0_ | ࡻ Alef With Attached Top Right Fatha And Dot Above|
|`U+087C` | Letter | RIGHT | ALEF | _0_ | ࡼ Alef With Right Middle Stroke And Dot Above |
|`U+087D` | Letter | RIGHT | ALEF | _0_ | ࡽ Alef With Attached Bottom Right Kasra And Dot Above|
|`U+087E` | Letter | RIGHT | ALEF | _0_ | ࡾ Alef With Attached Top Right Fatha And Left Ring|
|`U+087F` | Letter | RIGHT | ALEF | _0_ | ࡿ Alef With Right Middle Stroke And Left Ring |
| | | | | |
|`U+0880` | Letter | RIGHT | ALEF | _0_ | ࢀ Alef With Attached Bottom Right Kasra And Left Ring|
|`U+0881` | Letter | RIGHT | ALEF | _0_ | ࢁ Alef With Attached Right Hamza |
|`U+0882` | Letter | RIGHT | ALEF | _0_ | ࢂ Alef With Attached Left Hamza |
|`U+0883` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ࢃ Tatweel With Overstruck Hamza |
|`U+0884` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ࢄ Tatweel With Overstruck Waw |
|`U+0885` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ࢅ Tatweel With Two Dots Below |
|`U+0886` | Letter | DUAL | THIN_YEH | _0_ | ࢆ Thin Yeh |
|`U+0887` | Letter | NON_JOINING | _null_ | _0_ | ࢇ Baseline Round Dot |
|`U+0888` | Symbol | NON_JOINING | _null_ | _0_ | ࢈ Raised Round Dot |
|`U+0889` | Letter | DUAL | NOON | _0_ | ࢉ Noon With Inverted Small V |
|`U+088A` | Letter | DUAL | HAH | _0_ | ࢊ Hah With Inverted Small V Below |
|`U+088B` | Letter | DUAL | TAH | _0_ | ࢋ Tah With Dot Below |
|`U+088C` | Letter | DUAL | TAH | _0_ | ࢌ Tah With Three Dots Below |
|`U+088D` | Letter | DUAL | GAF | _0_ | ࢍ Keheh With Two Dots Vertically Below |
|`U+088E` | Letter | RIGHT | VERTICAL_TAIL | _0_ | ࢎ Vertical Tail |
|`U+088F` | _unassigned_ | | | | |
| | | | | |
|`U+0890` | Symbol | NON_JOINING | _null_ | _0_ | Pound Mark Above |
|`U+0891` | Symbol | NON_JOINING | _null_ | _0_ | Piastre Mark Above |
|`U+0892` | _unassigned_ | | | | |
|`U+0893` | _unassigned_ | | | | |
|`U+0894` | _unassigned_ | | | | |
|`U+0895` | _unassigned_ | | | | |
|`U+0896` | _unassigned_ | | | | |
|`U+0897` | Mark [Mn] | TRANSPARENT | _null_ | 230 | Pepet |
|`U+0898` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢘ Small High Word Al-Juz |
|`U+0899` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࢙ Small Low Word Ishmaam |
|`U+089A` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࢚ Small Low Word Imaala |
|`U+089B` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࢛ Small Low Word Tasheel |
|`U+089C` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢜ Madda Waajib |
|`U+089D` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢝ Superscript Alef Mokhassas |
|`U+089E` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢞ Doubled Madda |
|`U+089F` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢟ Half Madda Over Madda |
| | | | | |
:::
## Arabic Extended-C character table ##
:::{table} Arabic Extended-C block table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-------------------------------------------------------|
|`U+10EC0` | _unassigned_ | | | | |
|`U+10EC1` | _unassigned_ | | | | |
|`U+10EC2` | Letter | RIGHT | DAL | _0_ | Dal With Two Dots Vertically Below |
|`U+10EC3` | Letter | DUAL | TAH | _0_ | Tah With Two Dots Vertically Below |
|`U+10EC4` | Letter | DUAL | KAF | _0_ | Kaf With Two Dots Vertically Below |
|`U+10EC5` | _unassigned_ | | | | |
|`U+10EC6` | _unassigned_ | | | | |
|`U+10EC7` | _unassigned_ | | | | |
|`U+10EC8` | _unassigned_ | | | | |
|`U+10EC9` | _unassigned_ | | | | |
|`U+10ECA` | _unassigned_ | | | | |
|`U+10ECB` | _unassigned_ | | | | |
|`U+10ECC` | _unassigned_ | | | | |
|`U+10ECD` | _unassigned_ | | | | |
|`U+10ECE` | _unassigned_ | | | | |
|`U+10ECF` | _unassigned_ | | | | |
| | | | | |
|`U+10ED0` | _unassigned_ | | | | |
|`U+10ED1` | _unassigned_ | | | | |
|`U+10ED2` | _unassigned_ | | | | |
|`U+10ED3` | _unassigned_ | | | | |
|`U+10ED4` | _unassigned_ | | | | |
|`U+10ED5` | _unassigned_ | | | | |
|`U+10ED6` | _unassigned_ | | | | |
|`U+10ED7` | _unassigned_ | | | | |
|`U+10ED8` | _unassigned_ | | | | |
|`U+10ED9` | _unassigned_ | | | | |
|`U+10EDA` | _unassigned_ | | | | |
|`U+10EDB` | _unassigned_ | | | | |
|`U+10EDC` | _unassigned_ | | | | |
|`U+10EDD` | _unassigned_ | | | | |
|`U+10EDE` | _unassigned_ | | | | |
|`U+10EDF` | _unassigned_ | | | | |
| | | | | |
|`U+10EE0` | _unassigned_ | | | | |
|`U+10EE1` | _unassigned_ | | | | |
|`U+10EE2` | _unassigned_ | | | | |
|`U+10EE3` | _unassigned_ | | | | |
|`U+10EE4` | _unassigned_ | | | | |
|`U+10EE5` | _unassigned_ | | | | |
|`U+10EE6` | _unassigned_ | | | | |
|`U+10EE7` | _unassigned_ | | | | |
|`U+10EE8` | _unassigned_ | | | | |
|`U+10EE9` | _unassigned_ | | | | |
|`U+10EEA` | _unassigned_ | | | | |
|`U+10EEB` | _unassigned_ | | | | |
|`U+10EEC` | _unassigned_ | | | | |
|`U+10EED` | _unassigned_ | | | | |
|`U+10EEE` | _unassigned_ | | | | |
|`U+10EEF` | _unassigned_ | | | | |
| | | | | |
|`U+10EF0` | _unassigned_ | | | | |
|`U+10EF1` | _unassigned_ | | | | |
|`U+10EF2` | _unassigned_ | | | | |
|`U+10EF3` | _unassigned_ | | | | |
|`U+10EF4` | _unassigned_ | | | | |
|`U+10EF5` | _unassigned_ | | | | |
|`U+10EF6` | _unassigned_ | | | | |
|`U+10EF7` | _unassigned_ | | | | |
|`U+10EF8` | _unassigned_ | | | | |
|`U+10EF9` | _unassigned_ | | | | |
|`U+10EFA` | _unassigned_ | | | | |
|`U+10EFB` | _unassigned_ | | | | |
|`U+10EFC` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | Combining Alef Overlay |
|`U+10EFD` | Mark [Mn] | TRANSPARENT | _null_ | 220 | 𐻽 Small Low Word Sakta |
|`U+10EFE` | Mark [Mn] | TRANSPARENT | _null_ | 220 | 𐻾 Small Low Word Qasr |
|`U+10EFF` | Mark [Mn] | TRANSPARENT | _null_ | 220 | 𐻿 Small Low Word Madda |
| | | | | |
:::
## Rumi Numeral Symbols character table ##
:::{table} Rumi Numeral Symbols block table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|
|`U+10E60` | Number | NON_JOINING | _null_ | _0_ | 𐹠 Digit One |
|`U+10E61` | Number | NON_JOINING | _null_ | _0_ | 𐹡 Digit Two |
|`U+10E62` | Number | NON_JOINING | _null_ | _0_ | 𐹢 Digit Three |
|`U+10E63` | Number | NON_JOINING | _null_ | _0_ | 𐹣 Digit Four |
|`U+10E64` | Number | NON_JOINING | _null_ | _0_ | 𐹤 Digit Five |
|`U+10E65` | Number | NON_JOINING | _null_ | _0_ | 𐹥 Digit Six |
|`U+10E66` | Number | NON_JOINING | _null_ | _0_ | 𐹦 Digit Seven |
|`U+10E67` | Number | NON_JOINING | _null_ | _0_ | 𐹧 Digit Eight |
|`U+10E68` | Number | NON_JOINING | _null_ | _0_ | 𐹨 Digit Nine |
|`U+10E69` | Number | NON_JOINING | _null_ | _0_ | 𐹩 Number Ten |
|`U+10E6A` | Number | NON_JOINING | _null_ | _0_ | 𐹪 Number Twenty |
|`U+10E6B` | Number | NON_JOINING | _null_ | _0_ | 𐹫 Number Thirty |
|`U+10E6C` | Number | NON_JOINING | _null_ | _0_ | 𐹬 Number Forty |
|`U+10E6D` | Number | NON_JOINING | _null_ | _0_ | 𐹭 Number Fifty |
|`U+10E6E` | Number | NON_JOINING | _null_ | _0_ | 𐹮 Number Sixty |
|`U+10E6F` | Number | NON_JOINING | _null_ | _0_ | 𐹯 Number Seventy |
| | | | | |
|`U+10E70` | Number | NON_JOINING | _null_ | _0_ | 𐹰 Number Eighty |
|`U+10E71` | Number | NON_JOINING | _null_ | _0_ | 𐹱 Number Ninety |
|`U+10E72` | Number | NON_JOINING | _null_ | _0_ | 𐹲 Number One Hundred |
|`U+10E73` | Number | NON_JOINING | _null_ | _0_ | 𐹳 Number Two Hundred |
|`U+10E74` | Number | NON_JOINING | _null_ | _0_ | 𐹴 Number Three Hundred |
|`U+10E75` | Number | NON_JOINING | _null_ | _0_ | 𐹵 Number Four Hundred |
|`U+10E76` | Number | NON_JOINING | _null_ | _0_ | 𐹶 Number Five Hundred |
|`U+10E77` | Number | NON_JOINING | _null_ | _0_ | 𐹷 Number Six Hundred |
|`U+10E78` | Number | NON_JOINING | _null_ | _0_ | 𐹸 Number Seven Hundred |
|`U+10E79` | Number | NON_JOINING | _null_ | _0_ | 𐹹 Number Eight Hundred |
|`U+10E7A` | Number | NON_JOINING | _null_ | _0_ | 𐹺 Number Nine Hundred |
|`U+10E7B` | Number | NON_JOINING | _null_ | _0_ | 𐹻 Fraction One Half |
|`U+10E7C` | Number | NON_JOINING | _null_ | _0_ | 𐹼 Fraction One Quarter |
|`U+10E7D` | Number | NON_JOINING | _null_ | _0_ | 𐹽 Fraction One Third |
|`U+10E7E` | Number | NON_JOINING | _null_ | _0_ | 𐹾 Fraction Two Thirds |
|`U+10E7F` | _unassigned_ | | | | |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Arabic text include the dotted-circle placeholder (`U+25CC`), the
combining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)
and zero-width non-joiner (`U+200C`), the left-to-right text marker
(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break
space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
combining mark in isolation. Real-world text syllables may also use
other characters, such as hyphens or dashes, in a similar placeholder
fashion; shaping engines should cope with this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|
|`U+00A0` | Separator | NON_JOINING | _null_ | _0_ | No-break space |
|`U+034F` | Other | NON_JOINING | _null_ | _0_ | ͏ Combining grapheme joiner |
|`U+200C` | Other | NON_JOINING | _null_ | _0_ | Zero-width non-joiner |
|`U+200D` | Other | JOIN_CAUSING | _null_ | _0_ | Zero-width joiner |
|`U+200E` | Other | NON_JOINING | _null_ | _0_ | Left-to-Right marker |
|`U+200F` | Other | NON_JOINING | _null_ | _0_ | Right-to-Left marker |
|`U+2010` | Punctuation | NON_JOINING | _null_ | _0_ | ‐ Hyphen |
|`U+2011` | Punctuation | NON_JOINING | _null_ | _0_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | NON_JOINING | _null_ | _0_ | ‒ Figure dash |
|`U+2013` | Punctuation | NON_JOINING | _null_ | _0_ | – En dash |
|`U+2014` | Punctuation | NON_JOINING | _null_ | _0_ | — Em dash |
|`U+25CC` | Symbol | NON_JOINING | _null_ | _0_ | ◌ Dotted circle |
:::
The combining grapheme joiner (CGJ) is primarily used to alter the
order in which adjacent marks are positioned during the
mark-reordering stage, in order to adhere to the needs of a
non-default language orthography.
The zero-width joiner (ZWJ) is primarily used to force the usage of the
cursive connecting form of a letter even when the context of the
adjoining letters would not trigger the connecting form.
For example, to show the initial form of a letter in isolation (such
as for displaying it in a table of forms), the sequence "_Letter_,ZWJ"
would be used. To show the medial form of a letter in isolation, the
sequence "ZWJ,_Letter_,ZWJ" would be used.
The right-to-left mark (RLM) and left-to-right mark (LRM) are used by
the Unicode bidirectionality algorithm (BiDi) to indicate the points
in a text run at which the writing direction changes.
The no-break space is primarily used to display those codepoints that
are defined as non-spacing (such as vowel or diacritical marks and "Hamza") in an
isolated context, as an alternative to displaying them superimposed on
the dotted-circle placeholder.
================================================
FILE: character-tables/character-tables-bengali.md
================================================
# Bengali character tables #
This document lists the per-character shaping information needed to
[shape Bengali text](../opentype-shaping-bengali.md).
**Contents**
- [Bengali character table](#bengali-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Bengali character table ##
Bengali glyphs should be classified as in the following
table. Codepoints in the Bengali block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Bengali character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0980` | Letter | CONSONANT_PLACEHOLDER | _null_ | ঀ Anji |
|`U+0981` | Mark [Mn] | BINDU | TOP_POSITION | ঁ Candrabindu |
|`U+0982` | Mark [Mc] | BINDU | RIGHT_POSITION | ং Anusvara |
|`U+0983` | Mark [Mc] | VISARGA | RIGHT_POSITION | ঃ Visarga |
|`U+0984` | _unassigned_ | | | |
|`U+0985` | Letter | VOWEL_INDEPENDENT | _null_ | অ A |
|`U+0986` | Letter | VOWEL_INDEPENDENT | _null_ | আ Aa |
|`U+0987` | Letter | VOWEL_INDEPENDENT | _null_ | ই I |
|`U+0988` | Letter | VOWEL_INDEPENDENT | _null_ | ঈ Ii |
|`U+0989` | Letter | VOWEL_INDEPENDENT | _null_ | উ U |
|`U+098A` | Letter | VOWEL_INDEPENDENT | _null_ | ঊ Uu |
|`U+098B` | Letter | VOWEL_INDEPENDENT | _null_ | ঋ Vocalic R |
|`U+098C` | Letter | VOWEL_INDEPENDENT | _null_ | ঌ Vocalic L |
|`U+098D` | _unassigned_ | | | |
|`U+098E` | _unassigned_ | | | |
|`U+098F` | Letter | VOWEL_INDEPENDENT | _null_ | এ E |
| | | | |
|`U+0990` | Letter | VOWEL_INDEPENDENT | _null_ | ঐ Ai |
|`U+0991` | _unassigned_ | | | |
|`U+0992` | _unassigned_ | | | |
|`U+0993` | Letter | VOWEL_INDEPENDENT | _null_ | ও O |
|`U+0994` | Letter | VOWEL_INDEPENDENT | _null_ | ঔ Au |
|`U+0995` | Letter | CONSONANT | _null_ | ক Ka |
|`U+0996` | Letter | CONSONANT | _null_ | খ Kha |
|`U+0997` | Letter | CONSONANT | _null_ | গ Ga |
|`U+0998` | Letter | CONSONANT | _null_ | ঘ Gha |
|`U+0999` | Letter | CONSONANT | _null_ | ঙ Nga |
|`U+099A` | Letter | CONSONANT | _null_ | চ Ca |
|`U+099B` | Letter | CONSONANT | _null_ | ছ Cha |
|`U+099C` | Letter | CONSONANT | _null_ | জ Ja |
|`U+099D` | Letter | CONSONANT | _null_ | ঝ Jha |
|`U+099E` | Letter | CONSONANT | _null_ | ঞ Nya |
|`U+099F` | Letter | CONSONANT | _null_ | ট Tta |
| | | | |
|`U+09A0` | Letter | CONSONANT | _null_ | ঠ Ttha |
|`U+09A1` | Letter | CONSONANT | _null_ | ড Dda |
|`U+09A2` | Letter | CONSONANT | _null_ | ঢ Ddha |
|`U+09A3` | Letter | CONSONANT | _null_ | ণ Nna |
|`U+09A4` | Letter | CONSONANT | _null_ | ত Ta |
|`U+09A5` | Letter | CONSONANT | _null_ | থ Tha |
|`U+09A6` | Letter | CONSONANT | _null_ | দ Da |
|`U+09A7` | Letter | CONSONANT | _null_ | ধ Dha |
|`U+09A8` | Letter | CONSONANT | _null_ | ন Na |
|`U+09A9` | _unassigned_ | | | |
|`U+09AA` | Letter | CONSONANT | _null_ | প Pa |
|`U+09AB` | Letter | CONSONANT | _null_ | ফ Pha |
|`U+09AC` | Letter | CONSONANT | _null_ | ব Ba |
|`U+09AD` | Letter | CONSONANT | _null_ | ভ Bha |
|`U+09AE` | Letter | CONSONANT | _null_ | ম Ma |
|`U+09AF` | Letter | CONSONANT | _null_ | য Ya |
| | | | |
|`U+09B0` | Letter | CONSONANT | _null_ | র Ra |
|`U+09B1` | _unassigned_ | | | |
|`U+09B2` | Letter | CONSONANT | _null_ | ল La |
|`U+09B3` | _unassigned_ | | | |
|`U+09B4` | _unassigned_ | | | |
|`U+09B5` | _unassigned_ | | | |
|`U+09B6` | Letter | CONSONANT | _null_ | শ Sha |
|`U+09B7` | Letter | CONSONANT | _null_ | ষ Ssa |
|`U+09B8` | Letter | CONSONANT | _null_ | স Sa |
|`U+09B9` | Letter | CONSONANT | _null_ | হ Ha |
|`U+09BA` | _unassigned_ | | | |
|`U+09BB` | _unassigned_ | | | |
|`U+09BC` | Mark [Mn] | NUKTA | BOTTOM_POSITION | ় Nukta |
|`U+09BD` | Letter | AVAGRAHA | _null_ | ঽ Avagraha |
|`U+09BE` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | া Sign Aa |
|`U+09BF` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ি Sign I |
| | | | |
|`U+09C0` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ী Sign Ii |
|`U+09C1` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ু Sign U |
|`U+09C2` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ূ Sign Uu |
|`U+09C3` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ৃ Sign Vocalic R |
|`U+09C4` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ৄ Sign Vocalic Rr |
|`U+09C5` | _unassigned_ | | | |
|`U+09C6` | _unassigned_ | | | |
|`U+09C7` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ে Sign E |
|`U+09C8` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ৈ Sign Ai |
|`U+09C9` | _unassigned_ | | | |
|`U+09CA` | _unassigned_ | | | |
|`U+09CB` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ো Sign O |
|`U+09CC` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ৌ Sign Au |
|`U+09CD` | Mark [Mn] | VIRAMA | BOTTOM_POSITION | ্ Virama |
|`U+09CE` | Letter | CONSONANT_DEAD | _null_ | ৎ Khanda Ta |
|`U+09CF` | _unassigned_ | | | |
| | | | |
|`U+09D0` | _unassigned_ | | | |
|`U+09D1` | _unassigned_ | | | |
|`U+09D2` | _unassigned_ | | | |
|`U+09D3` | _unassigned_ | | | |
|`U+09D4` | _unassigned_ | | | |
|`U+09D5` | _unassigned_ | | | |
|`U+09D6` | _unassigned_ | | | |
|`U+09D7` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ৗ Au Length Mark |
|`U+09D8` | _unassigned_ | | | |
|`U+09D9` | _unassigned_ | | | |
|`U+09DA` | _unassigned_ | | | |
|`U+09DB` | _unassigned_ | | | |
|`U+09DC` | Letter | CONSONANT | _null_ | ড় Rra |
|`U+09DD` | Letter | CONSONANT | _null_ | ঢ় Rha |
|`U+09DE` | _unassigned_ | | | |
|`U+09DF` | Letter | CONSONANT | _null_ | য় Yya |
| | | | |
|`U+09E0` | Letter | VOWEL_INDEPENDENT | _null_ | ৠ Vocalic Rr |
|`U+09E1` | Letter | VOWEL_INDEPENDENT | _null_ | ৡ Vocalic Ll |
|`U+09E2` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ৢ Sign Vocalic L |
|`U+09E3` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ৣ Sign Vocalic Ll |
|`U+09E4` | _unassigned_ | | | |
|`U+09E5` | _unassigned_ | | | |
|`U+09E6` | Number | NUMBER | _null_ | ০ Digit Zero |
|`U+09E7` | Number | NUMBER | _null_ | ১ Digit One |
|`U+09E8` | Number | NUMBER | _null_ | ২ Digit Two |
|`U+09E9` | Number | NUMBER | _null_ | ৩ Digit Three |
|`U+09EA` | Number | NUMBER | _null_ | ৪ Digit Four |
|`U+09EB` | Number | NUMBER | _null_ | ৫ Digit Five |
|`U+09EC` | Number | NUMBER | _null_ | ৬ Digit Six |
|`U+09ED` | Number | NUMBER | _null_ | ৭ Digit Seven |
|`U+09EE` | Number | NUMBER | _null_ | ৮ Digit Eight |
|`U+09EF` | Number | NUMBER | _null_ | ৯ Digit Nine |
| | | | |
|`U+09F0` | Letter | CONSONANT | _null_ | ৰ Assamese Ra |
|`U+09F1` | Letter | CONSONANT | _null_ | ৱ Assamese Wa |
|`U+09F2` | Symbol | SYMBOL | _null_ | ৲ Rupee Mark |
|`U+09F3` | Symbol | SYMBOL | _null_ | ৳ Rupee Sign |
|`U+09F4` | Number | NUMBER | _null_ | ৴ Numerator One |
|`U+09F5` | Number | NUMBER | _null_ | ৵ Numerator Two |
|`U+09F6` | Number | NUMBER | _null_ | ৶ Numerator Three |
|`U+09F7` | Number | NUMBER | _null_ | ৷ Numerator Four |
|`U+09F8` | Number | NUMBER | _null_ | ৸ Numerator One Less Than Denominator |
|`U+09F9` | Number | NUMBER | _null_ | ৹ Denominator Sixteen |
|`U+09FA` | Symbol | SYMBOL | _null_ | ৺ Isshar |
|`U+09FB` | Symbol | SYMBOL | _null_ | ৻ Ganda Mark |
|`U+09FC` | Letter | _null_ | _null_ | ৼ Vedic Anusvara |
|`U+09FD` | Punctuation | _null_ | _null_ | ৽ Abbreviation Sign |
|`U+09FE` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ৾ Sandhi Mark |
|`U+09FF` | _unassigned_ | | | |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Bengali script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Bengali text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Bengali text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Bengali text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-devanagari.md
================================================
# Devanagari character tables #
This document lists the per-character shaping information needed to
[shape Devanagari text](../opentype-shaping-devanagari.md).
**Contents**
- [Devanagari character table](#devanagari-character-table)
- [Devanagari Extended character table](#devanagari-extended-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Devanagari character table ##
Devanagari glyphs should be classified as in the following
table. Codepoints in the Devanagari block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Devanagari character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0900` | Mark [Mn] | BINDU | TOP_POSITION | ऀ Inverted Candrabindu|
|`U+0901` | Mark [Mn] | BINDU | TOP_POSITION | ँ Candrabindu |
|`U+0902` | Mark [Mn] | BINDU | TOP_POSITION | ं Anusvara |
|`U+0903` | Mark [Mc] | VISARGA | RIGHT_POSITION | ः Visarga |
|`U+0904` | Letter | VOWEL_INDEPENDENT | _null_ | ऄ Short A |
|`U+0905` | Letter | VOWEL_INDEPENDENT | _null_ | अ A |
|`U+0906` | Letter | VOWEL_INDEPENDENT | _null_ | आ Aa |
|`U+0907` | Letter | VOWEL_INDEPENDENT | _null_ | इ I |
|`U+0908` | Letter | VOWEL_INDEPENDENT | _null_ | ई Ii |
|`U+0909` | Letter | VOWEL_INDEPENDENT | _null_ | उ U |
|`U+090A` | Letter | VOWEL_INDEPENDENT | _null_ | ऊ Uu |
|`U+090B` | Letter | VOWEL_INDEPENDENT | _null_ | ऋ Vocalic R |
|`U+090C` | Letter | VOWEL_INDEPENDENT | _null_ | ऌ Vocalic L |
|`U+090D` | Letter | VOWEL_INDEPENDENT | _null_ | ऍ Candra E |
|`U+090E` | Letter | VOWEL_INDEPENDENT | _null_ | ऎ Short E |
|`U+090F` | Letter | VOWEL_INDEPENDENT | _null_ | ए E |
| | | | |
|`U+0910` | Letter | VOWEL_INDEPENDENT | _null_ | ऐ Ai |
|`U+0911` | Letter | VOWEL_INDEPENDENT | _null_ | ऑ Candra O |
|`U+0912` | Letter | VOWEL_INDEPENDENT | _null_ | ऒ Short O |
|`U+0913` | Letter | VOWEL_INDEPENDENT | _null_ | ओ O |
|`U+0914` | Letter | VOWEL_INDEPENDENT | _null_ | औ Au |
|`U+0915` | Letter | CONSONANT | _null_ | क Ka |
|`U+0916` | Letter | CONSONANT | _null_ | ख Kha |
|`U+0917` | Letter | CONSONANT | _null_ | ग Ga |
|`U+0918` | Letter | CONSONANT | _null_ | घ Gha |
|`U+0919` | Letter | CONSONANT | _null_ | ङ Nga |
|`U+091A` | Letter | CONSONANT | _null_ | च Ca |
|`U+091B` | Letter | CONSONANT | _null_ | छ Cha |
|`U+091C` | Letter | CONSONANT | _null_ | ज Ja |
|`U+091D` | Letter | CONSONANT | _null_ | झ Jha |
|`U+091E` | Letter | CONSONANT | _null_ | ञ Nya |
|`U+091F` | Letter | CONSONANT | _null_ | ट Tta |
| | | | |
|`U+0920` | Letter | CONSONANT | _null_ | ठ Ttha |
|`U+0921` | Letter | CONSONANT | _null_ | ड Dda |
|`U+0922` | Letter | CONSONANT | _null_ | ढ Ddha |
|`U+0923` | Letter | CONSONANT | _null_ | ण Nna |
|`U+0924` | Letter | CONSONANT | _null_ | त Ta |
|`U+0925` | Letter | CONSONANT | _null_ | थ Tha |
|`U+0926` | Letter | CONSONANT | _null_ | द Da |
|`U+0927` | Letter | CONSONANT | _null_ | ध Dha |
|`U+0928` | Letter | CONSONANT | _null_ | न Na |
|`U+0929` | Letter | CONSONANT | _null_ | ऩ Nnna |
|`U+092A` | Letter | CONSONANT | _null_ | प Pa |
|`U+092B` | Letter | CONSONANT | _null_ | फ Pha |
|`U+092C` | Letter | CONSONANT | _null_ | ब Ba |
|`U+092D` | Letter | CONSONANT | _null_ | भ Bha |
|`U+092E` | Letter | CONSONANT | _null_ | म Ma |
|`U+092F` | Letter | CONSONANT | _null_ | य Ya |
| | | | |
|`U+0930` | Letter | CONSONANT | _null_ | र Ra |
|`U+0931` | Letter | CONSONANT | _null_ | ऱ Rra |
|`U+0932` | Letter | CONSONANT | _null_ | ल La |
|`U+0933` | Letter | CONSONANT | _null_ | ळ Lla |
|`U+0934` | Letter | CONSONANT | _null_ | ऴ Llla |
|`U+0935` | Letter | CONSONANT | _null_ | व Va |
|`U+0936` | Letter | CONSONANT | _null_ | श Sha |
|`U+0937` | Letter | CONSONANT | _null_ | ष Ssa |
|`U+0938` | Letter | CONSONANT | _null_ | स Sa |
|`U+0939` | Letter | CONSONANT | _null_ | ह Ha |
|`U+093A` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ऺ Sign Oe |
|`U+093B` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ऻ Sign Ooe |
|`U+093C` | Mark [Mn] | NUKTA | BOTTOM_POSITION | ़ Nukta |
|`U+093D` | Letter | AVAGRAHA | _null_ | ऽ Avagraha |
|`U+093E` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ा Sign Aa |
|`U+093F` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ि Sign I |
| | | | |
|`U+0940` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ी Sign Ii |
|`U+0941` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ु Sign U |
|`U+0942` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ू Sign Uu |
|`U+0943` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ृ Sign Vocalic R |
|`U+0944` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ॄ Sign Vocalic Rr |
|`U+0945` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ॅ Sign Candra E |
|`U+0946` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ॆ Sign Short E |
|`U+0947` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | े Sign E |
|`U+0948` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ै Sign Ai |
|`U+0949` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ॉ Sign Candra O |
|`U+094A` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ॊ Sign Short O |
|`U+094B` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ो Sign O |
|`U+094C` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ौ Sign Au |
|`U+094D` | Mark [Mn] | VIRAMA | BOTTOM_POSITION | ् Virama |
|`U+094E` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ॎ Sign Prishthamatra E|
|`U+094F` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ॏ Sign Aw |
| | | | |
|`U+0950` | Mark [Mc] | _null_ | _null_ | ॐ Om |
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0953` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ॓ Grave accent |
|`U+0954` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ॔ Acute accent |
|`U+0955` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ॕ Sign Candra Long E |
|`U+0956` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ॖ Sign Ue |
|`U+0957` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ॗ Sign Uue |
|`U+0958` | Letter | CONSONANT | _null_ | क़ Qa |
|`U+0959` | Letter | CONSONANT | _null_ | ख़ Khha |
|`U+095A` | Letter | CONSONANT | _null_ | ग़ Ghha |
|`U+095B` | Letter | CONSONANT | _null_ | ज़ Za |
|`U+095C` | Letter | CONSONANT | _null_ | ड़ Dddha |
|`U+095D` | Letter | CONSONANT | _null_ | ढ़ Rha |
|`U+095E` | Letter | CONSONANT | _null_ | फ़ Fa |
|`U+095F` | Letter | CONSONANT | _null_ | य़ Yya |
| | | | |
|`U+0960` | Letter | VOWEL_INDEPENDENT | _null_ | ॠ Vocalic Rr |
|`U+0961` | Letter | VOWEL_INDEPENDENT | _null_ | ॡ Vocalic Ll |
|`U+0962` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ॢ Sign Vocalic L |
|`U+0963` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ॣ Sign Vocalic Ll |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
|`U+0966` | Number | NUMBER | _null_ | ० Digit Zero |
|`U+0967` | Number | NUMBER | _null_ | १ Digit One |
|`U+0968` | Number | NUMBER | _null_ | २ Digit Two |
|`U+0969` | Number | NUMBER | _null_ | ३ Digit Three |
|`U+096A` | Number | NUMBER | _null_ | ४ Digit Four |
|`U+096B` | Number | NUMBER | _null_ | ५ Digit Five |
|`U+096C` | Number | NUMBER | _null_ | ६ Digit Six |
|`U+096D` | Number | NUMBER | _null_ | ७ Digit Seven |
|`U+096E` | Number | NUMBER | _null_ | ८ Digit Eight |
|`U+096F` | Number | NUMBER | _null_ | ९ Digit Nine |
| | | | |
|`U+0970` | Punctuation | _null_ | _null_ | ॰ Abbreviation Sign |
|`U+0971` | Punctuation | _null_ | _null_ | ॱ Sign High Spacing Dot|
|`U+0972` | Letter | VOWEL_INDEPENDENT | _null_ | ॲ Candra Aa |
|`U+0973` | Letter | VOWEL_INDEPENDENT | _null_ | ॳ Oe |
|`U+0974` | Letter | VOWEL_INDEPENDENT | _null_ | ॴ Ooe |
|`U+0975` | Letter | VOWEL_INDEPENDENT | _null_ | ॵ Aw |
|`U+0976` | Letter | VOWEL_INDEPENDENT | _null_ | ॶ Ue |
|`U+0977` | Letter | VOWEL_INDEPENDENT | _null_ | ॷ Uue |
|`U+0978` | Letter | CONSONANT | _null_ | ॸ Marwari Dda |
|`U+0979` | Letter | CONSONANT | _null_ | ॹ Zha |
|`U+097A` | Letter | CONSONANT | _null_ | ॺ Heavy Ya |
|`U+097B` | Letter | CONSONANT | _null_ | ॻ Gga |
|`U+097C` | Letter | CONSONANT | _null_ | ॼ Jja |
|`U+097D` | Letter | CONSONANT | _null_ | ॽ Glottal Stop |
|`U+097E` | Letter | CONSONANT | _null_ | ॾ Ddda |
|`U+097F` | Letter | CONSONANT | _null_ | ॿ Bba |
:::
## Devanagari Extended character table ##
> Note: the cantillation marks of the "combining consonant" variety in
> the Devanagari Extended block are _not_ considered consonants for
> shaping purposes (including syllable identification, the
> determination of the base consonant, or positioning "Reph").
:::{table} Devanagari Extended character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+A8E0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣠ Combining Zero |
|`U+A8E1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣡ Combining One |
|`U+A8E2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣢ Combining Two |
|`U+A8E3` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣣ Combining Three |
|`U+A8E4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣤ Combining Four |
|`U+A8E5` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣥ Combining Five |
|`U+A8E6` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣦ Combining Six |
|`U+A8E7` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣧ Combining Seven |
|`U+A8E8` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣨ Combining Eight |
|`U+A8E9` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣩ Combining Nine |
|`U+A8EA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣪ Combining A |
|`U+A8EB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣫ Combining U |
|`U+A8EC` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣬ Combining Ka |
|`U+A8ED` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣭ Combining Na |
|`U+A8EE` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣮ Combining Pa |
|`U+A8EF` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣯ Combining Ra |
| | | | |
|`U+A8F0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣰ Combining Vi |
|`U+A8F1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ꣱ Combining Avagraha |
|`U+A8F2` | Letter | SYMBOL | _null_ | ꣲ Spacing Candrabindu |
|`U+A8F3` | Letter | BINDU | _null_ | ꣳ Candrabindu Virama |
|`U+A8F4` | Letter | _null_ | _null_ | ꣴ Double Candrabindu Virama|
|`U+A8F5` | Letter | _null_ | _null_ | ꣵ Candrabindu Two |
|`U+A8F6` | Letter | _null_ | _null_ | ꣶ Candrabindu Three |
|`U+A8F7` | Letter | SYMBOL | _null_ | ꣷ Candrabindu Avagraha|
|`U+A8F8` | Punctuation | _null_ | _null_ | ꣸ Pushpika |
|`U+A8F9` | Punctuation | _null_ | _null_ | ꣹ Gap Filler |
|`U+A8FA` | Punctuation | _null_ | _null_ | ꣺ Caret |
|`U+A8FB` | Letter | _null_ | _null_ | ꣻ Headstroke |
|`U+A8FC` | Punctuation | _null_ | _null_ | ꣼ Siddham |
|`U+A8FD` | Letter | _null_ | _null_ | ꣽ Jain Om |
|`U+A8FE` | Letter | VOWEL_INDEPENDENT | _null_ | ꣾ Ay |
|`U+A8FF` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ꣿ Sign Ay |
| | | | |
:::
## Devanagari Extended-A character table ##
:::{table} Devanagari Extended-A character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:----------------------------------------|
| `U+11B00` | Punctuation | _null_ | _null_ | 𑬀 Head Mark |
| `U+11B01` | Punctuation | _null_ | _null_ | 𑬁 Head Mark With Headstroke |
| `U+11B02` | Punctuation | _null_ | _null_ | 𑬂 Sign Bhale |
| `U+11B03` | Punctuation | _null_ | _null_ | 𑬃 Sign Bhale With Hook |
| `U+11B04` | Punctuation | _null_ | _null_ | 𑬄 Sign Extended Bhale |
| `U+11B05` | Punctuation | _null_ | _null_ | 𑬅 Sign Extended Bhale With Hook |
| `U+11B06` | Punctuation | _null_ | _null_ | 𑬆 Sign Western Five-like Bhale |
| `U+11B07` | Punctuation | _null_ | _null_ | 𑬇 Sign Western Nine-like Bhale |
| `U+11B08` | Punctuation | _null_ | _null_ | 𑬈 Sign Reversed Nine-like Bhale |
| `U+11B09` | Punctuation | _null_ | _null_ | 𑬉 Sign Mindu |
| `U+11B0A` | _unassigned_ | | | |
| `U+11B0B` | _unassigned_ | | | |
| `U+11B0C` | _unassigned_ | | | |
| `U+11B0D` | _unassigned_ | | | |
| `U+11B0E` | _unassigned_ | | | |
| `U+11B0F` | _unassigned_ | | | |
| | | | | |
| `U+11B10` | _unassigned_ | | | |
| `U+11B11` | _unassigned_ | | | |
| `U+11B12` | _unassigned_ | | | |
| `U+11B13` | _unassigned_ | | | |
| `U+11B14` | _unassigned_ | | | |
| `U+11B15` | _unassigned_ | | | |
| `U+11B16` | _unassigned_ | | | |
| `U+11B17` | _unassigned_ | | | |
| `U+11B18` | _unassigned_ | | | |
| `U+11B19` | _unassigned_ | | | |
| `U+11B1A` | _unassigned_ | | | |
| `U+11B1B` | _unassigned_ | | | |
| `U+11B1C` | _unassigned_ | | | |
| `U+11B1D` | _unassigned_ | | | |
| `U+11B1E` | _unassigned_ | | | |
| `U+11B1F` | _unassigned_ | | | |
| | | | | |
| `U+11B20` | _unassigned_ | | | |
| `U+11B21` | _unassigned_ | | | |
| `U+11B22` | _unassigned_ | | | |
| `U+11B23` | _unassigned_ | | | |
| `U+11B24` | _unassigned_ | | | |
| `U+11B25` | _unassigned_ | | | |
| `U+11B26` | _unassigned_ | | | |
| `U+11B27` | _unassigned_ | | | |
| `U+11B28` | _unassigned_ | | | |
| `U+11B29` | _unassigned_ | | | |
| `U+11B2A` | _unassigned_ | | | |
| `U+11B2B` | _unassigned_ | | | |
| `U+11B2C` | _unassigned_ | | | |
| `U+11B2D` | _unassigned_ | | | |
| `U+11B2E` | _unassigned_ | | | |
| `U+11B2F` | _unassigned_ | | | |
| | | | | |
| `U+11B30` | _unassigned_ | | | |
| `U+11B31` | _unassigned_ | | | |
| `U+11B32` | _unassigned_ | | | |
| `U+11B33` | _unassigned_ | | | |
| `U+11B34` | _unassigned_ | | | |
| `U+11B35` | _unassigned_ | | | |
| `U+11B36` | _unassigned_ | | | |
| `U+11B37` | _unassigned_ | | | |
| `U+11B38` | _unassigned_ | | | |
| `U+11B39` | _unassigned_ | | | |
| `U+11B3A` | _unassigned_ | | | |
| `U+11B3B` | _unassigned_ | | | |
| `U+11B3C` | _unassigned_ | | | |
| `U+11B3D` | _unassigned_ | | | |
| `U+11B3E` | _unassigned_ | | | |
| `U+11B3F` | _unassigned_ | | | |
| | | | | |
| `U+11B40` | _unassigned_ | | | |
| `U+11B41` | _unassigned_ | | | |
| `U+11B42` | _unassigned_ | | | |
| `U+11B43` | _unassigned_ | | | |
| `U+11B44` | _unassigned_ | | | |
| `U+11B45` | _unassigned_ | | | |
| `U+11B46` | _unassigned_ | | | |
| `U+11B47` | _unassigned_ | | | |
| `U+11B48` | _unassigned_ | | | |
| `U+11B49` | _unassigned_ | | | |
| `U+11B4A` | _unassigned_ | | | |
| `U+11B4B` | _unassigned_ | | | |
| `U+11B4C` | _unassigned_ | | | |
| `U+11B4D` | _unassigned_ | | | |
| `U+11B4E` | _unassigned_ | | | |
| `U+11B4F` | _unassigned_ | | | |
| | | | | |
| `U+11B50` | _unassigned_ | | | |
| `U+11B51` | _unassigned_ | | | |
| `U+11B52` | _unassigned_ | | | |
| `U+11B53` | _unassigned_ | | | |
| `U+11B54` | _unassigned_ | | | |
| `U+11B55` | _unassigned_ | | | |
| `U+11B56` | _unassigned_ | | | |
| `U+11B57` | _unassigned_ | | | |
| `U+11B58` | _unassigned_ | | | |
| `U+11B59` | _unassigned_ | | | |
| `U+11B5A` | _unassigned_ | | | |
| `U+11B5B` | _unassigned_ | | | |
| `U+11B5C` | _unassigned_ | | | |
| `U+11B5D` | _unassigned_ | | | |
| `U+11B5E` | _unassigned_ | | | |
| `U+11B5F` | _unassigned_ | | | |
| | | | | |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Devanagari script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Devanagari text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-gujarati.md
================================================
# Gujarati character tables #
This document lists the per-character shaping information needed to
[shape Gujarati text](../opentype-shaping-gujarati.md).
**Contents**
- [Gujarati character table](#gujarati-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Gujarati character table ##
Gujarati glyphs should be classified as in the following
table. Codepoints in the Gujarati block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Gujarati character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0A80` | _unassigned_ | | | |
|`U+0A81` | Mark [Mn] | BINDU | TOP_POSITION | ઁ Candrabindu |
|`U+0A82` | Mark [Mn] | BINDU | TOP_POSITION | ં Anusvara |
|`U+0A83` | Mark [Mc] | VISARGA | RIGHT_POSITION | ઃ Visarga |
|`U+0A84` | _unassigned_ | | | |
|`U+0A85` | Letter | VOWEL_INDEPENDENT | _null_ | અ A |
|`U+0A86` | Letter | VOWEL_INDEPENDENT | _null_ | આ Aa |
|`U+0A87` | Letter | VOWEL_INDEPENDENT | _null_ | ઇ I |
|`U+0A88` | Letter | VOWEL_INDEPENDENT | _null_ | ઈ Ii |
|`U+0A89` | Letter | VOWEL_INDEPENDENT | _null_ | ઉ U |
|`U+0A8A` | Letter | VOWEL_INDEPENDENT | _null_ | ઊ Uu |
|`U+0A8B` | Letter | VOWEL_INDEPENDENT | _null_ | ઋ Vocalic R |
|`U+0A8C` | Letter | VOWEL_INDEPENDENT | _null_ | ઌ Vocalic L |
|`U+0A8D` | Letter | VOWEL_INDEPENDENT | _null_ | ઍ Candra E |
|`U+0A8E` | _unassigned_ | | | |
|`U+0A8F` | Letter | VOWEL_INDEPENDENT | _null_ | એ E |
| | | | |
|`U+0A90` | Letter | VOWEL_INDEPENDENT | _null_ | ઐ Ai |
|`U+0A91` | Letter | VOWEL_INDEPENDENT | _null_ | ઑ Candra O |
|`U+0A92` | _unassigned_ | | | |
|`U+0A93` | Letter | VOWEL_INDEPENDENT | _null_ | ઓ O |
|`U+0A94` | Letter | VOWEL_INDEPENDENT | _null_ | ઔ Au |
|`U+0A95` | Letter | CONSONANT | _null_ | ક Ka |
|`U+0A96` | Letter | CONSONANT | _null_ | ખ Kha |
|`U+0A97` | Letter | CONSONANT | _null_ | ગ Ga |
|`U+0A98` | Letter | CONSONANT | _null_ | ઘ Gha |
|`U+0A99` | Letter | CONSONANT | _null_ | ઙ Nga |
|`U+0A9A` | Letter | CONSONANT | _null_ | ચ Ca |
|`U+0A9B` | Letter | CONSONANT | _null_ | છ Cha |
|`U+0A9C` | Letter | CONSONANT | _null_ | જ Ja |
|`U+0A9D` | Letter | CONSONANT | _null_ | ઝ Jha |
|`U+0A9E` | Letter | CONSONANT | _null_ | ઞ Nya |
|`U+0A9F` | Letter | CONSONANT | _null_ | ટ Tta |
| | | | |
|`U+0AA0` | Letter | CONSONANT | _null_ | ઠ Ttha |
|`U+0AA1` | Letter | CONSONANT | _null_ | ડ Dda |
|`U+0AA2` | Letter | CONSONANT | _null_ | ઢ Ddha |
|`U+0AA3` | Letter | CONSONANT | _null_ | ણ Nna |
|`U+0AA4` | Letter | CONSONANT | _null_ | ત Ta |
|`U+0AA5` | Letter | CONSONANT | _null_ | થ Tha |
|`U+0AA6` | Letter | CONSONANT | _null_ | દ Da |
|`U+0AA7` | Letter | CONSONANT | _null_ | ધ Dha |
|`U+0AA8` | Letter | CONSONANT | _null_ | ન Na |
|`U+0AA9` | _unassigned_ | | | |
|`U+0AAA` | Letter | CONSONANT | _null_ | પ Pa |
|`U+0AAB` | Letter | CONSONANT | _null_ | ફ Pha |
|`U+0AAC` | Letter | CONSONANT | _null_ | બ Ba |
|`U+0AAD` | Letter | CONSONANT | _null_ | ભ Bha |
|`U+0AAE` | Letter | CONSONANT | _null_ | મ Ma |
|`U+0AAF` | Letter | CONSONANT | _null_ | ય Ya |
| | | | |
|`U+0AB0` | Letter | CONSONANT | _null_ | ર Ra |
|`U+0AB1` | _unassigned_ | | | |
|`U+0AB2` | Letter | CONSONANT | _null_ | લ La |
|`U+0AB3` | Letter | CONSONANT | _null_ | ળ Lla |
|`U+0AB4` | _unassigned_ | | | |
|`U+0AB5` | Letter | CONSONANT | _null_ | વ Va |
|`U+0AB6` | Letter | CONSONANT | _null_ | શ Sha |
|`U+0AB7` | Letter | CONSONANT | _null_ | ષ Ssa |
|`U+0AB8` | Letter | CONSONANT | _null_ | સ Sa |
|`U+0AB9` | Letter | CONSONANT | _null_ | હ Ha |
|`U+0ABA` | _unassigned_ | | | |
|`U+0ABB` | _unassigned_ | | | |
|`U+0ABC` | Mark [Mn] | NUKTA | BOTTOM_POSITION | ઼ Nukta |
|`U+0ABD` | Letter | AVAGRAHA | _null_ | ઽ Avagraha |
|`U+0ABE` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ા Sign Aa |
|`U+0ABF` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | િ Sign I |
| | | | |
|`U+0AC0` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ી Sign Ii |
|`U+0AC1` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ુ Sign U |
|`U+0AC2` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ૂ Sign Uu |
|`U+0AC3` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ૃ Sign Vocalic R |
|`U+0AC4` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ૄ Sign Vocalic Rr |
|`U+0AC5` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ૅ Sign Candra E |
|`U+0AC6` | _unassigned_ | | | |
|`U+0AC7` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ે Sign E |
|`U+0AC8` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ૈ Sign Ai |
|`U+0AC9` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_RIGHT_POSITION | ૉ Sign Candra O |
|`U+0ACA` | _unassigned_ | | | |
|`U+0ACB` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ો Sign O |
|`U+0ACC` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ૌ Sign Au |
|`U+0ACD` | Mark [Mn] | VIRAMA | BOTTOM_POSITION | ્ Virama |
|`U+0ACE` | _unassigned_ | | | |
|`U+0ACF` | _unassigned_ | | | |
| | | | |
|`U+0AD0` | Letter | _null_ | _null_ | ૐ Om |
|`U+0AD1` | _unassigned_ | | | |
|`U+0AD2` | _unassigned_ | | | |
|`U+0AD3` | _unassigned_ | | | |
|`U+0AD4` | _unassigned_ | | | |
|`U+0AD5` | _unassigned_ | | | |
|`U+0AD6` | _unassigned_ | | | |
|`U+0AD7` | _unassigned_ | | | |
|`U+0AD8` | _unassigned_ | | | |
|`U+0AD9` | _unassigned_ | | | |
|`U+0ADA` | _unassigned_ | | | |
|`U+0ADB` | _unassigned_ | | | |
|`U+0ADC` | _unassigned_ | | | |
|`U+0ADD` | _unassigned_ | | | |
|`U+0ADE` | _unassigned_ | | | |
|`U+0ADF` | _unassigned_ | | | |
| | | | |
|`U+0AE0` | Letter | VOWEL_INDEPENDENT | _null_ | ૠ Vocalic Rr |
|`U+0AE1` | Letter | VOWEL_INDEPENDENT | _null_ | ૡ Vocalic Ll |
|`U+0AE2` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ૢ Sign Vocalic L |
|`U+0AE3` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ૣ Sign Vocalic Ll |
|`U+0AE4` | _unassigned_ | | | |
|`U+0AE5` | _unassigned_ | | | |
|`U+0AE6` | Number | NUMBER | _null_ | ૦ Digit Zero |
|`U+0AE7` | Number | NUMBER | _null_ | ૧ Digit One |
|`U+0AE8` | Number | NUMBER | _null_ | ૨ Digit Two |
|`U+0AE9` | Number | NUMBER | _null_ | ૩ Digit Three |
|`U+0AEA` | Number | NUMBER | _null_ | ૪ Digit Four |
|`U+0AEB` | Number | NUMBER | _null_ | ૫ Digit Five |
|`U+0AEC` | Number | NUMBER | _null_ | ૬ Digit Six |
|`U+0AED` | Number | NUMBER | _null_ | ૭ Digit Seven |
|`U+0AEE` | Number | NUMBER | _null_ | ૮ Digit Eight |
|`U+0AEF` | Number | NUMBER | _null_ | ૯ Digit Nine |
| | | | |
|`U+0AF0` | Symbol | SYMBOL | _null_ | ૰ Abbreviation |
|`U+0AF1` | Symbol | SYMBOL | _null_ | ૱ Rupee Sign |
|`U+0AF2` | _unassigned_ | | | |
|`U+0AF3` | _unassigned_ | | | |
|`U+0AF4` | _unassigned_ | | | |
|`U+0AF5` | _unassigned_ | | | |
|`U+0AF6` | _unassigned_ | | | |
|`U+0AF7` | _unassigned_ | | | |
|`U+0AF8` | _unassigned_ | | | |
|`U+0AF9` | Letter | CONSONANT | _null_ | ૹ Zha |
|`U+0AFA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ૺ Sukun |
|`U+0AFB` | Mark [Mn] | NUKTA | TOP_POSITION | ૻ Shadda |
|`U+0AFC` | Mark [Mn] | CANTILLATION | TOP_POSITION | ૼ Maddah |
|`U+0AFD` | Mark [Mn] | NUKTA | TOP_POSITION | ૽ Three-Dot Nukta Above|
|`U+0AFE` | Mark [Mn] | NUKTA | TOP_POSITION | ૾ Circle Nukta Above |
|`U+0AFF` | Mark [Mn] | NUKTA | TOP_POSITION | ૿ Two-Circle Nukta Above|
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Gujarati script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Gujarati text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Gujarati text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Gujarati text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-gurmukhi.md
================================================
# Gurmukhi character tables #
This document lists the per-character shaping information needed to
[shape Gurmukhi text](../opentype-shaping-gurmukhi.md).
**Contents**
- [Gurmukhi character table](#gurmukhi-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Gurmukhi character table ##
Gurmukhi glyphs should be classified as in the following
table. Codepoints in the Gurmukhi block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Gurmukhi character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0A00` | _unassigned_ | | | |
|`U+0A01` | Mark [Mn] | BINDU | TOP_POSITION | ਁ Adak Bindi |
|`U+0A02` | Mark [Mn] | BINDU | TOP_POSITION | ਂ Bindi |
|`U+0A03` | Mark [Mc] | VISARGA | RIGHT_POSITION | ਃ Visarga |
|`U+0A04` | _unassigned_ | | | |
|`U+0A05` | Letter | VOWEL_INDEPENDENT | _null_ | ਅ A |
|`U+0A06` | Letter | VOWEL_INDEPENDENT | _null_ | ਆ Aa |
|`U+0A07` | Letter | VOWEL_INDEPENDENT | _null_ | ਇ I |
|`U+0A08` | Letter | VOWEL_INDEPENDENT | _null_ | ਈ Ii |
|`U+0A09` | Letter | VOWEL_INDEPENDENT | _null_ | ਉ U |
|`U+0A0A` | Letter | VOWEL_INDEPENDENT | _null_ | ਊ Uu |
|`U+0A0B` | _unassigned_ | | | |
|`U+0A0C` | _unassigned_ | | | |
|`U+0A0D` | _unassigned_ | | | |
|`U+0A0E` | _unassigned_ | | | |
|`U+0A0F` | Letter | VOWEL_INDEPENDENT | _null_ | ਏ Ee |
| | | | |
|`U+0A10` | Letter | VOWEL_INDEPENDENT | _null_ | ਐ Ai |
|`U+0A11` | _unassigned_ | | | |
|`U+0A12` | _unassigned_ | | | |
|`U+0A13` | Letter | VOWEL_INDEPENDENT | _null_ | ਓ Oo |
|`U+0A14` | Letter | VOWEL_INDEPENDENT | _null_ | ਔ Au |
|`U+0A15` | Letter | CONSONANT | _null_ | ਕ Ka |
|`U+0A16` | Letter | CONSONANT | _null_ | ਖ Kha |
|`U+0A17` | Letter | CONSONANT | _null_ | ਗ Ga |
|`U+0A18` | Letter | CONSONANT | _null_ | ਘ Gha |
|`U+0A19` | Letter | CONSONANT | _null_ | ਙ Nga |
|`U+0A1A` | Letter | CONSONANT | _null_ | ਚ Ca |
|`U+0A1B` | Letter | CONSONANT | _null_ | ਛ Cha |
|`U+0A1C` | Letter | CONSONANT | _null_ | ਜ Ja |
|`U+0A1D` | Letter | CONSONANT | _null_ | ਝ Jha |
|`U+0A1E` | Letter | CONSONANT | _null_ | ਞ Nya |
|`U+0A1F` | Letter | CONSONANT | _null_ | ਟ Tta |
| | | | |
|`U+0A20` | Letter | CONSONANT | _null_ | ਠ Ttha |
|`U+0A21` | Letter | CONSONANT | _null_ | ਡ Dda |
|`U+0A22` | Letter | CONSONANT | _null_ | ਢ Ddha |
|`U+0A23` | Letter | CONSONANT | _null_ | ਣ Nna |
|`U+0A24` | Letter | CONSONANT | _null_ | ਤ Ta |
|`U+0A25` | Letter | CONSONANT | _null_ | ਥ Tha |
|`U+0A26` | Letter | CONSONANT | _null_ | ਦ Da |
|`U+0A27` | Letter | CONSONANT | _null_ | ਧ Dha |
|`U+0A28` | Letter | CONSONANT | _null_ | ਨ Na |
|`U+0A29` | _unassigned_ | | | |
|`U+0A2A` | Letter | CONSONANT | _null_ | ਪ Pa |
|`U+0A2B` | Letter | CONSONANT | _null_ | ਫ Pha |
|`U+0A2C` | Letter | CONSONANT | _null_ | ਬ Ba |
|`U+0A2D` | Letter | CONSONANT | _null_ | ਭ Bha |
|`U+0A2E` | Letter | CONSONANT | _null_ | ਮ Ma |
|`U+0A2F` | Letter | CONSONANT | _null_ | ਯ Ya |
| | | | |
|`U+0A30` | Letter | CONSONANT | _null_ | ਰ Ra |
|`U+0A31` | _unassigned_ | | | |
|`U+0A32` | Letter | CONSONANT | _null_ | ਲ La |
|`U+0A33` | Letter | CONSONANT | _null_ | ਲ਼ Lla |
|`U+0A34` | _unassigned_ | | | |
|`U+0A35` | Letter | CONSONANT | _null_ | ਵ Va |
|`U+0A36` | Letter | CONSONANT | _null_ | ਸ਼ Sha |
|`U+0A37` | _unassigned_ | | | |
|`U+0A38` | Letter | CONSONANT | _null_ | ਸ Sa |
|`U+0A39` | Letter | CONSONANT | _null_ | ਹ Ha |
|`U+0A3A` | _unassigned_ | | | |
|`U+0A3B` | _unassigned_ | | | |
|`U+0A3C` | Mark [Mn] | NUKTA | BOTTOM_POSITION | ਼ Nukta |
|`U+0A3D` | _unassigned_ | | | |
|`U+0A3E` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ਾ Sign Aa |
|`U+0A3F` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ਿ Sign I |
| | | | |
|`U+0A40` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ੀ Sign Ii |
|`U+0A41` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ੁ Sign U |
|`U+0A42` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ੂ Sign Uu |
|`U+0A43` | _unassigned_ | | | |
|`U+0A44` | _unassigned_ | | | |
|`U+0A45` | _unassigned_ | | | |
|`U+0A46` | _unassigned_ | | | |
|`U+0A47` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ੇ Sign Ee |
|`U+0A48` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ੈ Sign Ai |
|`U+0A49` | _unassigned_ | | | |
|`U+0A4A` | _unassigned_ | | | |
|`U+0A4B` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ੋ Sign Oo |
|`U+0A4C` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ੌ Sign Au |
|`U+0A4D` | Mark [Mn] | VIRAMA | BOTTOM_POSITION | ੍ Virama |
|`U+0A4E` | _unassigned_ | | | |
|`U+0A4F` | _unassigned_ | | | |
| | | | |
|`U+0A50` | _unassigned_ | | | |
|`U+0A51` | Mark [Mn] | CANTILLATION | _null_ | ੑ Udaat |
|`U+0A52` | _unassigned_ | | | |
|`U+0A53` | _unassigned_ | | | |
|`U+0A54` | _unassigned_ | | | |
|`U+0A55` | _unassigned_ | | | |
|`U+0A56` | _unassigned_ | | | |
|`U+0A57` | _unassigned_ | | | |
|`U+0A58` | _unassigned_ | | | |
|`U+0A59` | Letter | CONSONANT | _null_ | ਖ਼ Khha |
|`U+0A5A` | Letter | CONSONANT | _null_ | ਗ਼ Ghha |
|`U+0A5B` | Letter | CONSONANT | _null_ | ਜ਼ Za |
|`U+0A5C` | Letter | CONSONANT | _null_ | ੜ Rra |
|`U+0A5D` | _unassigned_ | | | |
|`U+0A5E` | Letter | CONSONANT | _null_ | ਫ਼ Fa |
|`U+0A5F` | _unassigned_ | | | |
| | | | |
|`U+0A60` | _unassigned_ | | | |
|`U+0A61` | _unassigned_ | | | |
|`U+0A62` | _unassigned_ | | | |
|`U+0A63` | _unassigned_ | | | |
|`U+0A64` | _unassigned_ | | | |
|`U+0A65` | _unassigned_ | | | |
|`U+0A66` | Number | NUMBER | _null_ | ੦ Digit Zero |
|`U+0A67` | Number | NUMBER | _null_ | ੧ Digit One |
|`U+0A68` | Number | NUMBER | _null_ | ੨ Digit Two |
|`U+0A69` | Number | NUMBER | _null_ | ੩ Digit Three |
|`U+0A6A` | Number | NUMBER | _null_ | ੪ Digit Four |
|`U+0A6B` | Number | NUMBER | _null_ | ੫ Digit Five |
|`U+0A6C` | Number | NUMBER | _null_ | ੬ Digit Six |
|`U+0A6D` | Number | NUMBER | _null_ | ੭ Digit Seven |
|`U+0A6E` | Number | NUMBER | _null_ | ੮ Digit Eight |
|`U+0A6F` | Number | NUMBER | _null_ | ੯ Digit Nine |
| | | | |
|`U+0A70` | Mark [Mn] | BINDU | TOP_POSITION | ੰ Tippi |
|`U+0A71` | Mark [Mn] | GEMINATION_MARK | TOP_POSITION | ੱ Addak |
|`U+0A72` | Letter | CONSONANT | _null_ | ੲ Iri |
|`U+0A73` | Letter | CONSONANT | _null_ | ੳ Ura |
|`U+0A74` | Letter | _null_ | _null_ | ੴ Ek Onkar |
|`U+0A75` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ੵ Yakash |
|`U+0A76` | Punctuation | _null_ | _null_ | ੶ Abbreviation Sign |
|`U+0A77` | _unassigned_ | | | |
|`U+0A78` | _unassigned_ | | | |
|`U+0A79` | _unassigned_ | | | |
|`U+0A7A` | _unassigned_ | | | |
|`U+0A7B` | _unassigned_ | | | |
|`U+0A7C` | _unassigned_ | | | |
|`U+0A7D` | _unassigned_ | | | |
|`U+0A7E` | _unassigned_ | | | |
|`U+0A7F` | _unassigned_ | | | |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Gurmukhi script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Gurmukhi text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Gurmukhi text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Gurmukhi text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-hangul.md
================================================
# Hangul character tables #
This document lists the per-character shaping information needed to
[shape Hangul text](../opentype-shaping-hangul.md).
**Contents**
- [Hangul Jamo character table](#hangul-jamo-character-table)
- [Hangul Jamo Extended-A character table](#hangul-jamo-extended-a-character-table)
- [Hangul Jamo Extended-B character table](#hangul-jamo-extended-b-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
- [Hangul Syllables: summary](#hangul-syllables-character-table)
## Hangul Jamo character table ##
Hangul Jamo should be classified as in the following
table. Codepoints in the Hangul Jamo block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
The _Jamo type_ column indicates the syllable-component type of the
jamo. "L" for leading consonants (choseong), "V" for vowels
(jungseong), and "T" for trailing consonants (jongseong).
In addition, the filler codepoints `U+115F` (Choseong Filler) and `U+1160`
(Jungseong Filler) are classified as type "Lf" and "Vf", respectively.
The _Composing_ column indicates whether or not the jamo is capable of
being canonically composed into a syllable included in the Hangul
Syllables block. Jamo in the modern Korean alphabet are designated
`YES`, while fillers and archaic jamo from the old Korean alphabet are
designated `NO`.
:::{table} Hangul Jamo character table
| Codepoint | Unicode category | Jamo type | Composing | Glyph |
|:----------|:-----------------|:----------|:----------|:---------------------------------|
|`U+1100` | Letter | L | YES | ᄀ Kiyeok |
|`U+1101` | Letter | L | YES | ᄁ Ssangkiyeok |
|`U+1102` | Letter | L | YES | ᄂ Nieun |
|`U+1103` | Letter | L | YES | ᄃ Tikeut |
|`U+1104` | Letter | L | YES | ᄄ Ssangtikeut |
|`U+1105` | Letter | L | YES | ᄅ Rieul |
|`U+1106` | Letter | L | YES | ᄆ Mieum |
|`U+1107` | Letter | L | YES | ᄇ Pieup |
|`U+1108` | Letter | L | YES | ᄈ Ssangpieup |
|`U+1109` | Letter | L | YES | ᄉ Sios |
|`U+110A` | Letter | L | YES | ᄊ Ssangsios |
|`U+110B` | Letter | L | YES | ᄋ Ieung |
|`U+110C` | Letter | L | YES | ᄌ Cieuc |
|`U+110D` | Letter | L | YES | ᄍ Ssangcieuc |
|`U+110E` | Letter | L | YES | ᄎ Chieuch |
|`U+110F` | Letter | L | YES | ᄏ Khieukh |
| | | | | |
|`U+1110` | Letter | L | YES | ᄐ Thieuth |
|`U+1111` | Letter | L | YES | ᄑ Phieuph |
|`U+1112` | Letter | L | YES | ᄒ Hieuh |
|`U+1113` | Letter | L | NO | ᄓ Nieun-Kiyeok |
|`U+1114` | Letter | L | NO | ᄔ Ssangnieun |
|`U+1115` | Letter | L | NO | ᄕ Nieun-Tikeut |
|`U+1116` | Letter | L | NO | ᄖ Nieun-Pieup |
|`U+1117` | Letter | L | NO | ᄗ Tikeut-Kiyeok |
|`U+1118` | Letter | L | NO | ᄘ Rieul-Nieun |
|`U+1119` | Letter | L | NO | ᄙ Ssangrieul |
|`U+111A` | Letter | L | NO | ᄚ Rieul-Hieuh |
|`U+111B` | Letter | L | NO | ᄛ Kapyeounrieul |
|`U+111C` | Letter | L | NO | ᄜ Mieum-Pieup |
|`U+111D` | Letter | L | NO | ᄝ Kapyeounmieum |
|`U+111E` | Letter | L | NO | ᄞ Pieup-Kiyeok |
|`U+111F` | Letter | L | NO | ᄟ Pieup-Nieun |
| | | | | |
|`U+1120` | Letter | L | NO | ᄠ Pieup-Tikeut |
|`U+1121` | Letter | L | NO | ᄡ Pieup-Sios |
|`U+1122` | Letter | L | NO | ᄢ Pieup-Sios-Kiyeok |
|`U+1123` | Letter | L | NO | ᄣ Pieup-Sios-Tikeut |
|`U+1124` | Letter | L | NO | ᄤ Pieup-Sios-Pieup |
|`U+1125` | Letter | L | NO | ᄥ Pieup-Ssangsios |
|`U+1126` | Letter | L | NO | ᄦ Pieup-Sios-Cieuc |
|`U+1127` | Letter | L | NO | ᄧ Pieup-Cieuc |
|`U+1128` | Letter | L | NO | ᄨ Pieup-Chieuch |
|`U+1129` | Letter | L | NO | ᄩ Pieup-Thieuth |
|`U+112A` | Letter | L | NO | ᄪ Pieup-Phieuph |
|`U+112B` | Letter | L | NO | ᄫ Kapyeounpieup |
|`U+112C` | Letter | L | NO | ᄬ Kapyeounssangpieup |
|`U+112D` | Letter | L | NO | ᄭ Sios-Kiyeok |
|`U+112E` | Letter | L | NO | ᄮ Sios-Nieun |
|`U+112F` | Letter | L | NO | ᄯ Sios-Tikeut |
| | | | | |
|`U+1130` | Letter | L | NO | ᄰ Sios-Rieul |
|`U+1131` | Letter | L | NO | ᄱ Sios-Mieum |
|`U+1132` | Letter | L | NO | ᄲ Sios-Pieup |
|`U+1133` | Letter | L | NO | ᄳ Sios-Pieup-Kiyeok |
|`U+1134` | Letter | L | NO | ᄴ Sios-Ssangsios |
|`U+1135` | Letter | L | NO | ᄵ Sios-Ieung |
|`U+1136` | Letter | L | NO | ᄶ Sios-Cieuc |
|`U+1137` | Letter | L | NO | ᄷ Sios-Chieuch |
|`U+1138` | Letter | L | NO | ᄸ Sios-Khieukh |
|`U+1139` | Letter | L | NO | ᄹ Sios-Thieuth |
|`U+113A` | Letter | L | NO | ᄺ Sios-Phieuph |
|`U+113B` | Letter | L | NO | ᄻ Sios-Hieuh |
|`U+113C` | Letter | L | NO | ᄼ Chitueumsios |
|`U+113D` | Letter | L | NO | ᄽ Chitueumssangsios |
|`U+113E` | Letter | L | NO | ᄾ Ceongchieumsios |
|`U+113F` | Letter | L | NO | ᄿ Ceongchieumssangsios |
| | | | | |
|`U+1140` | Letter | L | NO | ᅀ Pansios |
|`U+1141` | Letter | L | NO | ᅁ Ieung-Kiyeok |
|`U+1142` | Letter | L | NO | ᅂ Ieung-Tikeut |
|`U+1143` | Letter | L | NO | ᅃ Ieung-Mieum |
|`U+1144` | Letter | L | NO | ᅄ Ieung-Pieup |
|`U+1145` | Letter | L | NO | ᅅ Ieung-Sios |
|`U+1146` | Letter | L | NO | ᅆ Ieung-Pansios |
|`U+1147` | Letter | L | NO | ᅇ Ssangieung |
|`U+1148` | Letter | L | NO | ᅈ Ieung-Cieuc |
|`U+1149` | Letter | L | NO | ᅉ Ieung-Chieuch |
|`U+114A` | Letter | L | NO | ᅊ Ieung-Thieuth |
|`U+114B` | Letter | L | NO | ᅋ Ieung-Phieuph |
|`U+114C` | Letter | L | NO | ᅌ Yesieung |
|`U+114D` | Letter | L | NO | ᅍ Cieuc-Ieung |
|`U+114E` | Letter | L | NO | ᅎ Chitueumcieuc |
|`U+114F` | Letter | L | NO | ᅏ Chitueumssangcieuc |
| | | | | |
|`U+1150` | Letter | L | NO | ᅐ Ceongchieumcieuc |
|`U+1151` | Letter | L | NO | ᅑ Ceongchieumssangcieuc |
|`U+1152` | Letter | L | NO | ᅒ Chieuch-Khieukh |
|`U+1153` | Letter | L | NO | ᅓ Chieuch-Hieuh |
|`U+1154` | Letter | L | NO | ᅔ Chitueumchieuch |
|`U+1155` | Letter | L | NO | ᅕ Ceongchieumchieuch |
|`U+1156` | Letter | L | NO | ᅖ Phieuph-Pieup |
|`U+1157` | Letter | L | NO | ᅗ Kapyeounphieuph |
|`U+1158` | Letter | L | NO | ᅘ Ssanghieuh |
|`U+1159` | Letter | L | NO | ᅙ Yeorinhieuh |
|`U+115A` | Letter | L | NO | ᅚ Kiyeok-Tikeut |
|`U+115B` | Letter | L | NO | ᅛ Nieun-Sios |
|`U+115C` | Letter | L | NO | ᅜ Nieun-Cieuc |
|`U+115D` | Letter | L | NO | ᅝ Nieun-Hieuh |
|`U+115E` | Letter | L | NO | ᅞ Tikeut-Rieul |
|`U+115F` | Letter | Lf | NO | ᅟ Choseong Filler |
| | | | | |
|`U+1160` | Letter | Vf | NO | ᅠ Jungseong Filler |
|`U+1161` | Letter | V | YES | ᅡ A |
|`U+1162` | Letter | V | YES | ᅢ Ae |
|`U+1163` | Letter | V | YES | ᅣ Ya |
|`U+1164` | Letter | V | YES | ᅤ Yae |
|`U+1165` | Letter | V | YES | ᅥ Eo |
|`U+1166` | Letter | V | YES | ᅦ E |
|`U+1167` | Letter | V | YES | ᅧ Yeo |
|`U+1168` | Letter | V | YES | ᅨ Ye |
|`U+1169` | Letter | V | YES | ᅩ O |
|`U+116A` | Letter | V | YES | ᅪ Wa |
|`U+116B` | Letter | V | YES | ᅫ Wae |
|`U+116C` | Letter | V | YES | ᅬ Oe |
|`U+116D` | Letter | V | YES | ᅭ Yo |
|`U+116E` | Letter | V | YES | ᅮ U |
|`U+116F` | Letter | V | YES | ᅯ Weo |
| | | | | |
|`U+1170` | Letter | V | YES | ᅰ We |
|`U+1171` | Letter | V | YES | ᅱ Wi |
|`U+1172` | Letter | V | YES | ᅲ Yu |
|`U+1173` | Letter | V | YES | ᅳ Eu |
|`U+1174` | Letter | V | YES | ᅴ Yi |
|`U+1175` | Letter | V | YES | ᅵ I |
|`U+1176` | Letter | V | NO | ᅶ A-O |
|`U+1177` | Letter | V | NO | ᅷ A-U |
|`U+1178` | Letter | V | NO | ᅸ Ya-O |
|`U+1179` | Letter | V | NO | ᅹ Ya-Yo |
|`U+117A` | Letter | V | NO | ᅺ Eo-O |
|`U+117B` | Letter | V | NO | ᅻ Eo-U |
|`U+117C` | Letter | V | NO | ᅼ Eo-Eu |
|`U+117D` | Letter | V | NO | ᅽ Yeo-O |
|`U+117E` | Letter | V | NO | ᅾ Yeo-U |
|`U+117F` | Letter | V | NO | ᅿ O-Eo |
| | | | | |
|`U+1180` | Letter | V | NO | ᆀ O-E |
|`U+1181` | Letter | V | NO | ᆁ O-Ye |
|`U+1182` | Letter | V | NO | ᆂ O-O |
|`U+1183` | Letter | V | NO | ᆃ O-U |
|`U+1184` | Letter | V | NO | ᆄ Yo-Ya |
|`U+1185` | Letter | V | NO | ᆅ Yo-Yae |
|`U+1186` | Letter | V | NO | ᆆ Yo-Yeo |
|`U+1187` | Letter | V | NO | ᆇ Yo-O |
|`U+1188` | Letter | V | NO | ᆈ Yo-I |
|`U+1189` | Letter | V | NO | ᆉ U-A |
|`U+118A` | Letter | V | NO | ᆊ U-Ae |
|`U+118B` | Letter | V | NO | ᆋ U-Eo-Eu |
|`U+118C` | Letter | V | NO | ᆌ U-Ye |
|`U+118D` | Letter | V | NO | ᆍ U-U |
|`U+118E` | Letter | V | NO | ᆎ Yu-A |
|`U+118F` | Letter | V | NO | ᆏ Yu-Eo |
| | | | | |
|`U+1190` | Letter | V | NO | ᆐ Yu-E |
|`U+1191` | Letter | V | NO | ᆑ Yu-Yeo |
|`U+1192` | Letter | V | NO | ᆒ Yu-Ye |
|`U+1193` | Letter | V | NO | ᆓ Yu-U |
|`U+1194` | Letter | V | NO | ᆔ Yu-I |
|`U+1195` | Letter | V | NO | ᆕ Eu-U |
|`U+1196` | Letter | V | NO | ᆖ Eu-Eu |
|`U+1197` | Letter | V | NO | ᆗ Yi-U |
|`U+1198` | Letter | V | NO | ᆘ I-A |
|`U+1199` | Letter | V | NO | ᆙ I-Ya |
|`U+119A` | Letter | V | NO | ᆚ I-O |
|`U+119B` | Letter | V | NO | ᆛ I-U |
|`U+119C` | Letter | V | NO | ᆜ I-Eu |
|`U+119D` | Letter | V | NO | ᆝ I-Araea |
|`U+119E` | Letter | V | NO | ᆞ Araea |
|`U+119F` | Letter | V | NO | ᆟ Araea-Eo |
| | | | | |
|`U+11A0` | Letter | V | NO | ᆠ Araea-U |
|`U+11A1` | Letter | V | NO | ᆡ Araea-I |
|`U+11A2` | Letter | V | NO | ᆢ Ssangaraea |
|`U+11A3` | Letter | V | NO | ᆣ A-Eu |
|`U+11A4` | Letter | V | NO | ᆤ Ya-U |
|`U+11A5` | Letter | V | NO | ᆥ Yeo-Ya |
|`U+11A6` | Letter | V | NO | ᆦ O-Ya |
|`U+11A7` | Letter | V | NO | ᆧ O-Yae |
|`U+11A8` | Letter | T | YES | ᆨ Kiyeok |
|`U+11A9` | Letter | T | YES | ᆩ Ssangkiyeok |
|`U+11AA` | Letter | T | YES | ᆪ Kiyeok-Sios |
|`U+11AB` | Letter | T | YES | ᆫ Nieun |
|`U+11AC` | Letter | T | YES | ᆬ Nieun-Cieuc |
|`U+11AD` | Letter | T | YES | ᆭ Nieun-Hieuh |
|`U+11AE` | Letter | T | YES | ᆮ Tikeut |
|`U+11AF` | Letter | T | YES | ᆯ Rieul |
| | | | | |
|`U+11B0` | Letter | T | YES | ᆰ Rieul-Kiyeok |
|`U+11B1` | Letter | T | YES | ᆱ Rieul-Mieum |
|`U+11B2` | Letter | T | YES | ᆲ Rieul-Pieup |
|`U+11B3` | Letter | T | YES | ᆳ Rieul-Sios |
|`U+11B4` | Letter | T | YES | ᆴ Rieul-Thieuth |
|`U+11B5` | Letter | T | YES | ᆵ Rieul-Phieuph |
|`U+11B6` | Letter | T | YES | ᆶ Rieul-Hieuh |
|`U+11B7` | Letter | T | YES | ᆷ Mieum |
|`U+11B8` | Letter | T | YES | ᆸ Pieup |
|`U+11B9` | Letter | T | YES | ᆹ Pieup-Sios |
|`U+11BA` | Letter | T | YES | ᆺ Sios |
|`U+11BB` | Letter | T | YES | ᆻ Ssangsios |
|`U+11BC` | Letter | T | YES | ᆼ Ieung |
|`U+11BD` | Letter | T | YES | ᆽ Cieuc |
|`U+11BE` | Letter | T | YES | ᆾ Chieuch |
|`U+11BF` | Letter | T | YES | ᆿ Khieukh |
| | | | | |
|`U+11C0` | Letter | T | YES | ᇀ Thieuth |
|`U+11C1` | Letter | T | YES | ᇁ Phieuph |
|`U+11C2` | Letter | T | YES | ᇂ Hieuh |
|`U+11C3` | Letter | T | NO | ᇃ Kiyeok-Rieul |
|`U+11C4` | Letter | T | NO | ᇄ Kiyeok-Sios-Kiyeok |
|`U+11C5` | Letter | T | NO | ᇅ Nieun-Kiyeok |
|`U+11C6` | Letter | T | NO | ᇆ Nieun-Tikeut |
|`U+11C7` | Letter | T | NO | ᇇ Nieun-Sios |
|`U+11C8` | Letter | T | NO | ᇈ Nieun-Pansios |
|`U+11C9` | Letter | T | NO | ᇉ Nieun-Thieuth |
|`U+11CA` | Letter | T | NO | ᇊ Tikeut-Kiyeok |
|`U+11CB` | Letter | T | NO | ᇋ Tikeut-Rieul |
|`U+11CC` | Letter | T | NO | ᇌ Rieul-Kiyeok-Sios |
|`U+11CD` | Letter | T | NO | ᇍ Rieul-Nieun |
|`U+11CE` | Letter | T | NO | ᇎ Rieul-Tikeut |
|`U+11CF` | Letter | T | NO | ᇏ Rieul-Tikeut-Hieuh |
| | | | | |
|`U+11D0` | Letter | T | NO | ᇐ Ssangrieul |
|`U+11D1` | Letter | T | NO | ᇑ Rieul-Mieum-Kiyeok |
|`U+11D2` | Letter | T | NO | ᇒ Rieul-Mieum-Sios |
|`U+11D3` | Letter | T | NO | ᇓ Rieul-Pieup-Sios |
|`U+11D4` | Letter | T | NO | ᇔ Rieul-Pieup-Hieuh |
|`U+11D5` | Letter | T | NO | ᇕ Rieul-Kapyeounpieup |
|`U+11D6` | Letter | T | NO | ᇖ Rieul-Ssangsios |
|`U+11D7` | Letter | T | NO | ᇗ Rieul-Pansios |
|`U+11D8` | Letter | T | NO | ᇘ Rieul-Khieukh |
|`U+11D9` | Letter | T | NO | ᇙ Rieul-Yeorinhieuh |
|`U+11DA` | Letter | T | NO | ᇚ Mieum-Kiyeok |
|`U+11DB` | Letter | T | NO | ᇛ Mieum-Rieul |
|`U+11DC` | Letter | T | NO | ᇜ Mieum-Pieup |
|`U+11DD` | Letter | T | NO | ᇝ Mieum-Sios |
|`U+11DE` | Letter | T | NO | ᇞ Mieum-Ssangsios |
|`U+11DF` | Letter | T | NO | ᇟ Mieum-Pansios |
| | | | | |
|`U+11E0` | Letter | T | NO | ᇠ Mieum-Chieuch |
|`U+11E1` | Letter | T | NO | ᇡ Mieum-Hieuh |
|`U+11E2` | Letter | T | NO | ᇢ Kapyeounmieum |
|`U+11E3` | Letter | T | NO | ᇣ Pieup-Rieul |
|`U+11E4` | Letter | T | NO | ᇤ Pieup-Phieuph |
|`U+11E5` | Letter | T | NO | ᇥ Pieup-Hieuh |
|`U+11E6` | Letter | T | NO | ᇦ Kapyeounpieup |
|`U+11E7` | Letter | T | NO | ᇧ Sios-Kiyeok |
|`U+11E8` | Letter | T | NO | ᇨ Sios-Tikeut |
|`U+11E9` | Letter | T | NO | ᇩ Sios-Rieul |
|`U+11EA` | Letter | T | NO | ᇪ Sios-Pieup |
|`U+11EB` | Letter | T | NO | ᇫ Pansios |
|`U+11EC` | Letter | T | NO | ᇬ Ieung-Kiyeok |
|`U+11ED` | Letter | T | NO | ᇭ Ieung-Ssangkiyeok |
|`U+11EE` | Letter | T | NO | ᇮ Ssangieung |
|`U+11EF` | Letter | T | NO | ᇯ Ieung-Khieukh |
| | | | | |
|`U+11F0` | Letter | T | NO | ᇰ Yesieung |
|`U+11F1` | Letter | T | NO | ᇱ Yesieung-Sios |
|`U+11F2` | Letter | T | NO | ᇲ Yesieung-Pansios |
|`U+11F3` | Letter | T | NO | ᇳ Phieuph-Pieup |
|`U+11F4` | Letter | T | NO | ᇴ Kapyeounphieuph |
|`U+11F5` | Letter | T | NO | ᇵ Hieuh-Nieun |
|`U+11F6` | Letter | T | NO | ᇶ Hieuh-Rieul |
|`U+11F7` | Letter | T | NO | ᇷ Hieuh-Mieum |
|`U+11F8` | Letter | T | NO | ᇸ Hieuh-Pieup |
|`U+11F9` | Letter | T | NO | ᇹ Yeorinhieuh |
|`U+11FA` | Letter | T | NO | ᇺ Kiyeok-Nieun |
|`U+11FB` | Letter | T | NO | ᇻ Kiyeok-Pieup |
|`U+11FC` | Letter | T | NO | ᇼ Kiyeok-Chieuch |
|`U+11FD` | Letter | T | NO | ᇽ Kiyeok-Khieukh |
|`U+11FE` | Letter | T | NO | ᇾ Kiyeok-Hieuh |
|`U+11FF` | Letter | T | NO | ᇿ Ssangnieun |
:::
## Hangul Jamo Extended-A character table ##
Hangul Jamo should be classified as in the following
table. Codepoints in the Hangul Jamo Extended-A block with no assigned
meaning are designated as _unassigned_ in the _Unicode category_ column.
The _Jamo type_ column indicates the syllable-component type of the
jamo. All assigned codepoints in the Hangul Jamo Extended-A block are
classified as type "L" for leading consonants (choseong).
:::{table} Hangul Jamo Extended-A character table
| Codepoint | Unicode category | Jamo type | Composing | Glyph |
|:----------|:-----------------|:----------|:----------|:---------------------------------|
|`U+A960` | Letter | L | NO | ꥠ Tikeut-Mieum |
|`U+A961` | Letter | L | NO | ꥡ Tikeut-Pieup |
|`U+A962` | Letter | L | NO | ꥢ Tikeut-Sios |
|`U+A963` | Letter | L | NO | ꥣ Tikeut-Cieuc |
|`U+A964` | Letter | L | NO | ꥤ Rieul-Kiyeok |
|`U+A965` | Letter | L | NO | ꥥ Rieul-Ssangkiyeok |
|`U+A966` | Letter | L | NO | ꥦ Rieul-Tikeut |
|`U+A967` | Letter | L | NO | ꥧ Rieul-Ssangtikeut |
|`U+A968` | Letter | L | NO | ꥨ Rieul-Mieum |
|`U+A969` | Letter | L | NO | ꥩ Rieul-Pieup |
|`U+A96A` | Letter | L | NO | ꥪ Rieul-Ssangpieup |
|`U+A96B` | Letter | L | NO | ꥫ Rieul-Kapyeounpieup |
|`U+A96C` | Letter | L | NO | ꥬ Rieul-Sios |
|`U+A96D` | Letter | L | NO | ꥭ Rieul-Cieuc |
|`U+A96E` | Letter | L | NO | ꥮ Rieul-Khieukh |
|`U+A96F` | Letter | L | NO | ꥯ Mieum-Kiyeok |
| | | | | |
|`U+A970` | Letter | L | NO | ꥰ Mieum-Tikeut |
|`U+A971` | Letter | L | NO | ꥱ Mieum-Sios |
|`U+A972` | Letter | L | NO | ꥲ Pieup-Sios-Thieuth |
|`U+A973` | Letter | L | NO | ꥳ Pieup-Khieukh |
|`U+A974` | Letter | L | NO | ꥴ Pieup-Hieuh |
|`U+A975` | Letter | L | NO | ꥵ Ssangsios-Pieup |
|`U+A976` | Letter | L | NO | ꥶ Ieung-Rieul |
|`U+A977` | Letter | L | NO | ꥷ Ieung-Hieuh |
|`U+A978` | Letter | L | NO | ꥸ Ssangcieuc-Hieuh |
|`U+A979` | Letter | L | NO | ꥹ Ssangthieuth |
|`U+A97A` | Letter | L | NO | ꥺ Phieuph-Hieuh |
|`U+A97B` | Letter | L | NO | ꥻ Hieuh-Sios |
|`U+A97C` | Letter | L | NO | ꥼ Ssangyeorinhieuh |
|`U+A97D` | _unassigned_ | | | |
|`U+A97E` | _unassigned_ | | | |
|`U+A97F` | _unassigned_ | | | |
:::
## Hangul Jamo Extended-B character table ##
Hangul Jamo should be classified as in the following
table. Codepoints in the Hangul Jamo Extended-B block with no assigned
meaning are designated as _unassigned_ in the _Unicode category_ column.
The _Jamo type_ column indicates the syllable-component type of the
jamo. "V" for vowels (jungseong) and "T" for trailing consonants (jongseong).
:::{table} Hangul Jamo Extended-B character table
| Codepoint | Unicode category | Jamo type | Composing | Glyph |
|:----------|:-----------------|:----------|:----------|:---------------------------------|
|`U+D7B0` | Letter | V | NO | ힰ O-Yeo |
|`U+D7B1` | Letter | V | NO | ힱ O-O-I |
|`U+D7B2` | Letter | V | NO | ힲ Yo-A |
|`U+D7B3` | Letter | V | NO | ힳ Yo-Ae |
|`U+D7B4` | Letter | V | NO | ힴ Yo-Eo |
|`U+D7B5` | Letter | V | NO | ힵ U-Yeo |
|`U+D7B6` | Letter | V | NO | ힶ U-I-I |
|`U+D7B7` | Letter | V | NO | ힷ Yu-Ae |
|`U+D7B8` | Letter | V | NO | ힸ Yu-O |
|`U+D7B9` | Letter | V | NO | ힹ Eu-A |
|`U+D7BA` | Letter | V | NO | ힺ Eu-Eo |
|`U+D7BB` | Letter | V | NO | ힻ Eu-E |
|`U+D7BC` | Letter | V | NO | ힼ Eu-O |
|`U+D7BD` | Letter | V | NO | ힽ I-Ya-O |
|`U+D7BE` | Letter | V | NO | ힾ I-Yae |
|`U+D7BF` | Letter | V | NO | ힿ I-Yeo |
| | | | | |
|`U+D7C0` | Letter | V | NO | ퟀ I-Ye |
|`U+D7C1` | Letter | V | NO | ퟁ I-O-I |
|`U+D7C2` | Letter | V | NO | ퟂ I-Yo |
|`U+D7C3` | Letter | V | NO | ퟃ I-Yu |
|`U+D7C4` | Letter | V | NO | ퟄ I-I |
|`U+D7C5` | Letter | V | NO | ퟅ Araea-A |
|`U+D7C6` | Letter | V | NO | ퟆ Araea-E |
|`U+D7C7` | _unassigned_ | | | |
|`U+D7C8` | _unassigned_ | | | |
|`U+D7C9` | _unassigned_ | | | |
|`U+D7CA` | _unassigned_ | | | |
|`U+D7CB` | Letter | T | NO | ퟋ Nieun-Rieul |
|`U+D7CC` | Letter | T | NO | ퟌ Nieun-Chieuch |
|`U+D7CD` | Letter | T | NO | ퟍ Ssangtikeut |
|`U+D7CE` | Letter | T | NO | ퟎ Ssangtikeut-Pieup |
|`U+D7CF` | Letter | T | NO | ퟏ Tikeut-Pieup |
| | | | | |
|`U+D7D0` | Letter | T | NO | ퟐ Tikeut-Sios |
|`U+D7D1` | Letter | T | NO | ퟑ Tikeut-Sios-Kiyeok |
|`U+D7D2` | Letter | T | NO | ퟒ Tikeut-Cieuc |
|`U+D7D3` | Letter | T | NO | ퟓ Tikeut-Chieuch |
|`U+D7D4` | Letter | T | NO | ퟔ Tikeut-Thieuth |
|`U+D7D5` | Letter | T | NO | ퟕ Rieul-Ssangkiyeok |
|`U+D7D6` | Letter | T | NO | ퟖ Rieul-Kiyeok-Hieuh |
|`U+D7D7` | Letter | T | NO | ퟗ Ssangrieul-Khieukh |
|`U+D7D8` | Letter | T | NO | ퟘ Rieul-Mieum-Hieuh |
|`U+D7D9` | Letter | T | NO | ퟙ Rieul-Pieup-Tikeut |
|`U+D7DA` | Letter | T | NO | ퟚ Rieul-Pieup-Phieuph |
|`U+D7DB` | Letter | T | NO | ퟛ Rieul-Yesieung |
|`U+D7DC` | Letter | T | NO | ퟜ Rieul-Yeorinhieuh-Hieuh |
|`U+D7DD` | Letter | T | NO | ퟝ Kapyeounrieul |
|`U+D7DE` | Letter | T | NO | ퟞ Mieum-Nieun |
|`U+D7DF` | Letter | T | NO | ퟟ Mieum-Ssangnieun |
| | | | | |
|`U+D7E0` | Letter | T | NO | ퟠ Ssangmieum |
|`U+D7E1` | Letter | T | NO | ퟡ Mieum-Pieup-Sios |
|`U+D7E2` | Letter | T | NO | ퟢ Mieum-Cieuc |
|`U+D7E3` | Letter | T | NO | ퟣ Pieup-Tikeut |
|`U+D7E4` | Letter | T | NO | ퟤ Pieup-Rieul-Phieuph |
|`U+D7E5` | Letter | T | NO | ퟥ Pieup-Mieum |
|`U+D7E6` | Letter | T | NO | ퟦ Ssangpieup |
|`U+D7E7` | Letter | T | NO | ퟧ Pieup-Sios-Tikeut |
|`U+D7E8` | Letter | T | NO | ퟨ Pieup-Cieuc |
|`U+D7E9` | Letter | T | NO | ퟩ Pieup-Chieuch |
|`U+D7EA` | Letter | T | NO | ퟪ Sios-Mieum |
|`U+D7EB` | Letter | T | NO | ퟫ Sios-Kapyeounpieup |
|`U+D7EC` | Letter | T | NO | ퟬ Ssangsios-Kiyeok |
|`U+D7ED` | Letter | T | NO | ퟭ Ssangsios-Tikeut |
|`U+D7EE` | Letter | T | NO | ퟮ Sios-Pansios |
|`U+D7EF` | Letter | T | NO | ퟯ Sios-Cieuc |
| | | | | |
|`U+D7F0` | Letter | T | NO | ퟰ Sios-Chieuch |
|`U+D7F1` | Letter | T | NO | ퟱ Sios-Thieuth |
|`U+D7F2` | Letter | T | NO | ퟲ Sios-Hieuh |
|`U+D7F3` | Letter | T | NO | ퟳ Pansios-Pieup |
|`U+D7F4` | Letter | T | NO | ퟴ Pansios-Kapyeounpieup |
|`U+D7F5` | Letter | T | NO | ퟵ Yesieung-Mieum |
|`U+D7F6` | Letter | T | NO | ퟶ Yesieung-Hieuh |
|`U+D7F7` | Letter | T | NO | ퟷ Cieuc-Pieup |
|`U+D7F8` | Letter | T | NO | ퟸ Cieuc-Ssangpieup |
|`U+D7F9` | Letter | T | NO | ퟹ Ssangcieuc |
|`U+D7FA` | Letter | T | NO | ퟺ Phieuph-Sios |
|`U+D7FB` | Letter | T | NO | ퟻ Phieuph-Thieuth |
|`U+D7FC` | _unassigned_ | | | |
|`U+D7FD` | _unassigned_ | | | |
|`U+D7FE` | _unassigned_ | | | |
|`U+D7FF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Hangul text may use
punctuation marks from the CJK Symbols And Punctuation block.
Of particular note are the single-dot tone mark (single-dot bangjeom)
and double-dot tone mark (double-dot bangjeom), `U+302E` and
`U+302F`. These non-spacing marks are common in Old Korean.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Jamo type | Composing | Glyph |
|:----------|:-----------------|:----------|:----------|:---------------------------------|
|`U+302E` | Mark [Mn] | _null_ | _null_ | 〮 Single Dot Tone Mark |
|`U+302F` | Mark [Mn] | _null_ | _null_ | 〯 Double Dot Tone Mark |
:::
Other important characters that may be encountered when shaping runs
of Hangul text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`), and zero-width non-joiner (`U+200C`).
The dotted-circle placeholder is frequently used when displaying a
mark in isolation. Real-world text may also use other characters, such
as hyphens or dashes, in a similar placeholder fashion; shaping
engines should cope with this situation gracefully.
The zero-width space (`U+200B`) or word joiner (`U+2060`) may be used
between two jamo to prevent them from being conjoined into a
syllable. The zero-width space allows a line break to happen between
the jamo, while the word joiner prevents the jamo from being separated
by a line break.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Jamo type | Composing | Glyph |
|:----------|:-----------------|:----------|:----------|:---------------------------------|
|`U+200B` | Separator | _null_ | _null_ | Zero-width space |
|`U+200C` | Other | _null_ | _null_ | Zero-width non-joiner |
|`U+200D` | Other | _null_ | _null_ | Zero-width joiner |
|`U+2060` | Other | _null_ | _null_ | Word joiner |
|`U+25CC` | Symbol | _null_ | _null_ | ◌ Dotted circle |
:::
## Hangul Syllables character table ##
The Hangul Syllables block is too large to include a full character
table in this document.
Each syllable codepoint is classified either as type `LV` or type `LVT`,
indicating whether or not the syllable includes a trailing consonant
(jongseong) at the end.
Syllable codepoints are sorted in Hangul alphabetic order, first by
leading consonant (choseong), followed by vowel (jungseong), followed
by trailing consonant (jongseong).
This enables the algorithmic composition and decomposition of combining
jamo sequences and syllable codepoints.
:::{table} Hangul Syllables character table
| Codepoint | Unicode category | Syllable type | Glyph |
|:----------|:-----------------|:--------------|:---------------------------------|
|`U+AC00` | Letter [Lo] | LV | 가 G-A |
| | | | |
|`U+D5CC` | Letter [Lo] | LVT | 헌 H-A-N |
:::
================================================
FILE: character-tables/character-tables-hebrew.md
================================================
# Hebrew character tables #
This document lists the per-character shaping information needed to
[shape Hebrew text](../opentype-shaping-hebrew.md).
**Contents**
Separate character tables are provided for the Hebrew block, the
Hebrew letters included in the Alphabetic Presentation Forms block,
and for other miscellaneous characters that are used in `` text
runs:
- [Hebrew character table](#hebrew-character-table)
- [Alphabetic Presentation Forms character table](#alphabetic-presentation-forms-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
The tables list each codepoint along with its Unicode general
category. For marks, the table lists the codepoint's mark combining
class. The codepoint's Unicode name and an example glyph are also provided.
Codepoints with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
## Hebrew character table ##
:::{table} Hebrew character table
| Codepoint | Unicode category | Mark class | Glyph |
|:----------|:-----------------|:-----------|:-------------------------------------|
| `U+0590` | _unassigned_ | | |
| `U+0591` | Mark [Mn] | 220 | ֑ Accent Etnahta |
| `U+0592` | Mark [Mn] | 230 | ֒ Accent Segol |
| `U+0593` | Mark [Mn] | 230 | ֓ Accent Shalshelet |
| `U+0594` | Mark [Mn] | 230 | ֔ Accent Zaqef Qatan |
| `U+0595` | Mark [Mn] | 230 | ֕ Accent Zaqef Gadol |
| `U+0596` | Mark [Mn] | 220 | ֖ Accent Tipeha |
| `U+0597` | Mark [Mn] | 230 | ֗ Accent Revia |
| `U+0598` | Mark [Mn] | 230 | ֘ Accent Zarqa |
| `U+0599` | Mark [Mn] | 230 | ֙ Accent Pashta |
| `U+059A` | Mark [Mn] | 222 | ֚ Accent Yetiv |
| `U+059B` | Mark [Mn] | 220 | ֛ Accent Tevir |
| `U+059C` | Mark [Mn] | 230 | ֜ Accent Geresh |
| `U+059D` | Mark [Mn] | 230 | ֝ Accent Geresh Muqdam |
| `U+059E` | Mark [Mn] | 230 | ֞ Accent Gershayim |
| `U+059F` | Mark [Mn] | 230 | ֟ Accent Qarney Para |
| | | | |
| `U+05A0` | Mark [Mn] | 230 | ֠ Accent Telisha Gedola |
| `U+05A1` | Mark [Mn] | 230 | ֡ Accent Pazer |
| `U+05A2` | Mark [Mn] | 220 | ֢ Accent Atnah Hafukh |
| `U+05A3` | Mark [Mn] | 220 | ֣ Accent Munah |
| `U+05A4` | Mark [Mn] | 220 | ֤ Accent Mahapakh |
| `U+05A5` | Mark [Mn] | 220 | ֥ Accent Merkha |
| `U+05A6` | Mark [Mn] | 220 | ֦ Accent Merkha Kefula |
| `U+05A7` | Mark [Mn] | 220 | ֧ Accent Darga |
| `U+05A8` | Mark [Mn] | 230 | ֨ Accent Qadma |
| `U+05A9` | Mark [Mn] | 230 | ֩ Accent Telisha Qetana |
| `U+05AA` | Mark [Mn] | 220 | ֪ Accent Yerah Ben Yomo |
| `U+05AB` | Mark [Mn] | 230 | ֫ Accent Ole |
| `U+05AC` | Mark [Mn] | 230 | ֬ Accent Iluy |
| `U+05AD` | Mark [Mn] | 222 | ֭ Accent Dehi |
| `U+05AE` | Mark [Mn] | 228 | ֮ Accent Zinor |
| `U+05AF` | Mark [Mn] | 230 | ֯ Mark Masora Circle |
| | | | |
| `U+05B0` | Mark [Mn] | 10 | ְ Point Sheva |
| `U+05B1` | Mark [Mn] | 11 | ֱ Point Hataf Segol |
| `U+05B2` | Mark [Mn] | 12 | ֲ Point Hataf Patah |
| `U+05B3` | Mark [Mn] | 13 | ֳ Point Hataf Qamats |
| `U+05B4` | Mark [Mn] | 14 | ִ Point Hiriq |
| `U+05B5` | Mark [Mn] | 15 | ֵ Point Tsere |
| `U+05B6` | Mark [Mn] | 16 | ֶ Point Segol |
| `U+05B7` | Mark [Mn] | 17 | ַ Point Patah |
| `U+05B8` | Mark [Mn] | 18 | ָ Point Qamats |
| `U+05B9` | Mark [Mn] | 19 | ֹ Point Holam |
| `U+05BA` | Mark [Mn] | 19 | ֺ Point Holam Haser For Vav |
| `U+05BB` | Mark [Mn] | 20 | ֻ Point Qubuts |
| `U+05BC` | Mark [Mn] | 21 | ּ Point Dagesh Or Mapiq |
| `U+05BD` | Mark [Mn] | 22 | ֽ Point Meteg |
| `U+05BE` | Punctuation Dash | _0_ | ־ Punctuation Maqaf |
| `U+05BF` | Mark [Mn] | 23 | ֿ Point Rafe |
| | | | |
| `U+05C0` | Punctuation | _0_ | ׀ Punctuation Paseq |
| `U+05C1` | Mark [Mn] | 24 | ׁ Point Shin Dot |
| `U+05C2` | Mark [Mn] | 25 | ׂ Point Sin Dot |
| `U+05C3` | Punctuation | _0_ | ׃ Punctuation Sof Pasuq |
| `U+05C4` | Mark [Mn] | 230 | ׄ Mark Upper Dot |
| `U+05C5` | Mark [Mn] | 220 | ׅ Mark Lower Dot |
| `U+05C6` | Punctuation | _0_ | ׆ Punctuation Nun Hafuka |
| `U+05C7` | Mark [Mn] | 18 | ׇ Point Qamats Qatan |
| `U+05C8` | _unassigned_ | | |
| `U+05C9` | _unassigned_ | | |
| `U+05CA` | _unassigned_ | | |
| `U+05CB` | _unassigned_ | | |
| `U+05CC` | _unassigned_ | | |
| `U+05CD` | _unassigned_ | | |
| `U+05CE` | _unassigned_ | | |
| `U+05CF` | _unassigned_ | | |
| | | | |
| `U+05D0` | Letter | _0_ | א Alef |
| `U+05D1` | Letter | _0_ | ב Bet |
| `U+05D2` | Letter | _0_ | ג Gimel |
| `U+05D3` | Letter | _0_ | ד Dalet |
| `U+05D4` | Letter | _0_ | ה He |
| `U+05D5` | Letter | _0_ | ו Vav |
| `U+05D6` | Letter | _0_ | ז Zayin |
| `U+05D7` | Letter | _0_ | ח Het |
| `U+05D8` | Letter | _0_ | ט Tet |
| `U+05D9` | Letter | _0_ | י Yod |
| `U+05DA` | Letter | _0_ | ך Final Kaf |
| `U+05DB` | Letter | _0_ | כ Kaf |
| `U+05DC` | Letter | _0_ | ל Lamed |
| `U+05DD` | Letter | _0_ | ם Final Mem |
| `U+05DE` | Letter | _0_ | מ Mem |
| `U+05DF` | Letter | _0_ | ן Final Nun |
| | | | |
| `U+05E0` | Letter | _0_ | נ Nun |
| `U+05E1` | Letter | _0_ | ס Samekh |
| `U+05E2` | Letter | _0_ | ע Ayin |
| `U+05E3` | Letter | _0_ | ף Final Pe |
| `U+05E4` | Letter | _0_ | פ Pe |
| `U+05E5` | Letter | _0_ | ץ Final Tsadi |
| `U+05E6` | Letter | _0_ | צ Tsadi |
| `U+05E7` | Letter | _0_ | ק Qof |
| `U+05E8` | Letter | _0_ | ר Resh |
| `U+05E9` | Letter | _0_ | ש Shin |
| `U+05EA` | Letter | _0_ | ת Tav |
| `U+05EB` | _unassigned_ | | |
| `U+05EC` | _unassigned_ | | |
| `U+05ED` | _unassigned_ | | |
| `U+05EE` | _unassigned_ | | |
| `U+05EF` | Letter | _0_ | ׯ Yod Triangle |
| | | | |
| `U+05F0` | Letter | _0_ | װ Ligature Yiddish Double Vav |
| `U+05F1` | Letter | _0_ | ױ Ligature Yiddish Vav Yod |
| `U+05F2` | Letter | _0_ | ײ Ligature Yiddish Double Yod |
| `U+05F3` | Punctuation | _0_ | ׳ Punctuation Geresh |
| `U+05F4` | Punctuation | _0_ | ״ Punctuation Gershayim |
| `U+05F5` | _unassigned_ | | |
| `U+05F6` | _unassigned_ | | |
| `U+05F7` | _unassigned_ | | |
| `U+05F8` | _unassigned_ | | |
| `U+05F9` | _unassigned_ | | |
| `U+05FA` | _unassigned_ | | |
| `U+05FB` | _unassigned_ | | |
| `U+05FC` | _unassigned_ | | |
| `U+05FD` | _unassigned_ | | |
| `U+05FE` | _unassigned_ | | |
| `U+05FF` | _unassigned_ | | |
:::
## Alphabetic Presentation Forms character table ##
This chart includes only the Hebrew codepoints from the Alphabetic
Presentation Forms block in Unicode.
The _Composition_ column lists the codepoints from the Hebrew block
that compose into the listed Alphabetic Presentation Form. These
presentation form compositions are not covered by the standard Unicode
composition algorithm.
Entries with a _null_ in this column do not need to be composed by the
shaping engine.
:::{table} Alphabetic Presentation Forms character table
| Codepoint | Unicode category | Mark class | Composition | Glyph |
|:----------|:-----------------|:-----------|:----------------|:----------------------------------------|
| `U+FB1D` | Letter | _0_ |`U+05D9`,`U+05B4`| יִ Yod With Hiriq |
| `U+FB1E` | Mark [Mn] | 26 | _null_ | ﬞ Point Juedo-Spanish Varika |
| `U+FB1F` | Letter | _0_ |`U+05F2`,`U+05B7`| ײַ Ligature Yiddish Yod Yod Patah |
| | | | | |
| `U+FB20` | Letter | _0_ | _null_ | ﬠ Alternative Ayin |
| `U+FB21` | Letter | _0_ | _null_ | ﬡ Wide Alef |
| `U+FB22` | Letter | _0_ | _null_ | ﬢ Wide Dalet |
| `U+FB23` | Letter | _0_ | _null_ | ﬣ Wide He |
| `U+FB24` | Letter | _0_ | _null_ | ﬤ Wide Kaf |
| `U+FB25` | Letter | _0_ | _null_ | ﬥ Wide Lamed |
| `U+FB26` | Letter | _0_ | _null_ | ﬦ Wide Final Mem |
| `U+FB27` | Letter | _0_ | _null_ | ﬧ Wide Resh |
| `U+FB28` | Letter | _0_ | _null_ | ﬨ Wide Tav |
| `U+FB29` | Letter | _0_ | _null_ | ﬩ Alternative Plus Sign |
| `U+FB2A` | Letter | _0_ |`U+05E9`,`U+05C1`| שׁ Shin With Shin Dot |
| `U+FB2B` | Letter | _0_ |`U+05E9`,`U+05C2`| שׂ Shin With Sin Dot |
| `U+FB2C` | Letter | _0_ |`U+FB2A`,`U+05BC` OR `U+FB49`,`U+05C1`| שּׁ Shin With Dagesh And Shin Dot |
| `U+FB2D` | Letter | _0_ |`U+FB2B`,`U+05BC` OR `U+FB49`,`U+05C2`| שּׂ Shin With Dagesh And Sin Dot |
| `U+FB2E` | Letter | _0_ |`U+05D0`,`U+05B7`| אַ Alef With Patah |
| `U+FB2F` | Letter | _0_ |`U+05D0`,`U+05B8`| אָ Alef With Qamats |
| | | | | |
| `U+FB30` | Letter | _0_ |`U+05D0`,`U+05BC`| אּ Alef With Mapiq |
| `U+FB31` | Letter | _0_ |`U+05D1`,`U+05BC`| בּ Bet With Dagesh |
| `U+FB32` | Letter | _0_ |`U+05D2`,`U+05BC`| גּ Gimel With Dagesh |
| `U+FB33` | Letter | _0_ |`U+05D3`,`U+05BC`| דּ Dalet With Dagesh |
| `U+FB34` | Letter | _0_ |`U+05D4`,`U+05BC`| הּ He With Mapiq |
| `U+FB35` | Letter | _0_ |`U+05D5`,`U+05BC`| וּ Vav With Dagesh |
| `U+FB36` | Letter | _0_ |`U+05D6`,`U+05BC`| זּ Zayin With Dagesh |
| `U+FB37` | _unassigned_ | | | |
| `U+FB38` | Letter | _0_ |`U+05D8`,`U+05BC`| טּ Tet With Dagesh |
| `U+FB39` | Letter | _0_ |`U+05D9`,`U+05BC`| יּ Yod With Dagesh |
| `U+FB3A` | Letter | _0_ |`U+05DA`,`U+05BC`| ךּ Final Kaf With Dagesh |
| `U+FB3B` | Letter | _0_ |`U+05DB`,`U+05BC`| כּ Kaf With Dagesh |
| `U+FB3C` | Letter | _0_ |`U+05DC`,`U+05BC`| לּ Lamed With Dagesh |
| `U+FB3D` | _unassigned_ | | | |
| `U+FB3E` | Letter | _0_ |`U+05DE`,`U+05BC`| מּ Mem With Dagesh |
| `U+FB3F` | _unassigned_ | | | |
| | | | | |
| `U+FB40` | Letter | _0_ |`U+05E0`,`U+05BC`| נּ Nun With Dagesh |
| `U+FB41` | Letter | _0_ |`U+05E1`,`U+05BC`| סּ Samekh With Dagesh |
| `U+FB42` | _unassigned_ | | | |
| `U+FB43` | Letter | _0_ |`U+05E3`,`U+05BC`| ףּ Final Pe With Dagesh |
| `U+FB44` | Letter | _0_ |`U+05E4`,`U+05BC`| פּ Pe With Dagesh |
| `U+FB45` | _unassigned_ | | | |
| `U+FB46` | Letter | _0_ |`U+05E6`,`U+05BC`| צּ Tsadi With Dagesh |
| `U+FB47` | Letter | _0_ |`U+05E7`,`U+05BC`| קּ Qof With Dagesh |
| `U+FB48` | Letter | _0_ |`U+05E8`,`U+05BC`| רּ Resh With Dagesh |
| `U+FB49` | Letter | _0_ |`U+05E9`,`U+05BC`| שּ Shin With Dagesh |
| `U+FB4A` | Letter | _0_ |`U+05EA`,`U+05BC`| תּ Tav With Dagesh |
| `U+FB4B` | Letter | _0_ |`U+05D5`,`U+05B9`| וֹ Vav With Holam |
| `U+FB4C` | Letter | _0_ |`U+05D1`,`U+05BF`| בֿ Bet With Rafe |
| `U+FB4D` | Letter | _0_ |`U+05DB`,`U+05BF`| כֿ Kaf With Rafe |
| `U+FB4E` | Letter | _0_ |`U+05E4`,`U+05BF`| פֿ Pe With Rafe |
| `U+FB4F` | Letter | _0_ | _null_ | ﭏ Ligature Alef Lamed |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Hebrew text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`), and zero-width non-joiner (`U+200C`).
The dotted-circle placeholder is frequently used when displaying a
mark in isolation. Real-world text may also use other characters, such
as hyphens or dashes, in a similar placeholder fashion; shaping
engines should cope with this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Mark class | Glyph |
|:----------|:-----------------|:-----------|:-----------------------------------|
|`U+00A0` | Separator | _0_ | No-break space |
|`U+034F` | Other | _0_ | ͏ Combining grapheme joiner |
|`U+200C` | Other | _0_ | Zero-width non-joiner |
|`U+200D` | Other | _0_ | Zero-width joiner |
|`U+200E` | Other | _0_ | Left-to-Right marker |
|`U+200F` | Other | _0_ | Right-to-Left marker |
|`U+25CC` | Symbol | _0_ | ◌ Dotted circle |
:::
================================================
FILE: character-tables/character-tables-kannada.md
================================================
# Kannada character tables #
This document lists the per-character shaping information needed to
[shape Kannada text](../opentype-shaping-kannada.md).
**Contents**
- [Kannada character table](#kannada-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Kannada character table ##
Kannada glyphs should be classified as in the following
table. Codepoints in the Kannada block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Kannada character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0C80` | Letter | PLACEHOLDER | _null_ | ಀ Spacing Candrabindu |
|`U+0C81` | Mark [Mn] | BINDU | TOP_POSITION | ಁ Candrabindu |
|`U+0C82` | Mark [Mc] | BINDU | RIGHT_POSITION | ಂ Anusvara |
|`U+0C83` | Mark [Mc] | VISARGA | RIGHT_POSITION | ಃ Visarga |
|`U+0C84` | Punctuation | _null_ | _null_ | ಄ Siddham |
|`U+0C85` | Letter | VOWEL_INDEPENDENT | _null_ | ಅ A |
|`U+0C86` | Letter | VOWEL_INDEPENDENT | _null_ | ಆ Aa |
|`U+0C87` | Letter | VOWEL_INDEPENDENT | _null_ | ಇ I |
|`U+0C88` | Letter | VOWEL_INDEPENDENT | _null_ | ಈ Ii |
|`U+0C89` | Letter | VOWEL_INDEPENDENT | _null_ | ಉ U |
|`U+0C8A` | Letter | VOWEL_INDEPENDENT | _null_ | ಊ Uu |
|`U+0C8B` | Letter | VOWEL_INDEPENDENT | _null_ | ಋ Vocalic R |
|`U+0C8C` | Letter | VOWEL_INDEPENDENT | _null_ | ಌ Vocalic L |
|`U+0C8D` | _unassigned_ | | | |
|`U+0C8E` | Letter | VOWEL_INDEPENDENT | _null_ | ಎ E |
|`U+0C8F` | Letter | VOWEL_INDEPENDENT | _null_ | ಏ Ee |
| | | | |
|`U+0C90` | Letter | VOWEL_INDEPENDENT | _null_ | ಐ Ai |
|`U+0C91` | _unassigned_ | | | |
|`U+0C92` | Letter | VOWEL_INDEPENDENT | _null_ | ಒ O |
|`U+0C93` | Letter | VOWEL_INDEPENDENT | _null_ | ಓ Oo |
|`U+0C94` | Letter | VOWEL_INDEPENDENT | _null_ | ಔ Au |
|`U+0C95` | Letter | CONSONANT | _null_ | ಕ Ka |
|`U+0C96` | Letter | CONSONANT | _null_ | ಖ Kha |
|`U+0C97` | Letter | CONSONANT | _null_ | ಗ Ga |
|`U+0C98` | Letter | CONSONANT | _null_ | ಘ Gha |
|`U+0C99` | Letter | CONSONANT | _null_ | ಙ Nga |
|`U+0C9A` | Letter | CONSONANT | _null_ | ಚ Ca |
|`U+0C9B` | Letter | CONSONANT | _null_ | ಛ Cha |
|`U+0C9C` | Letter | CONSONANT | _null_ | ಜ Ja |
|`U+0C9D` | Letter | CONSONANT | _null_ | ಝ Jha |
|`U+0C9E` | Letter | CONSONANT | _null_ | ಞ Nya |
|`U+0C9F` | Letter | CONSONANT | _null_ | ಟ Tta |
| | | | |
|`U+0CA0` | Letter | CONSONANT | _null_ | ಠ Ttha |
|`U+0CA1` | Letter | CONSONANT | _null_ | ಡ Dda |
|`U+0CA2` | Letter | CONSONANT | _null_ | ಢ Ddha |
|`U+0CA3` | Letter | CONSONANT | _null_ | ಣ Nna |
|`U+0CA4` | Letter | CONSONANT | _null_ | ತ Ta |
|`U+0CA5` | Letter | CONSONANT | _null_ | ಥ Tha |
|`U+0CA6` | Letter | CONSONANT | _null_ | ದ Da |
|`U+0CA7` | Letter | CONSONANT | _null_ | ಧ Dha |
|`U+0CA8` | Letter | CONSONANT | _null_ | ನ Na |
|`U+0CA9` | _unassigned_ | | | |
|`U+0CAA` | Letter | CONSONANT | _null_ | ಪ Pa |
|`U+0CAB` | Letter | CONSONANT | _null_ | ಫ Pha |
|`U+0CAC` | Letter | CONSONANT | _null_ | ಬ Ba |
|`U+0CAD` | Letter | CONSONANT | _null_ | ಭ Bha |
|`U+0CAE` | Letter | CONSONANT | _null_ | ಮ Ma |
|`U+0CAF` | Letter | CONSONANT | _null_ | ಯ Ya |
| | | | |
|`U+0CB0` | Letter | CONSONANT | _null_ | ರ Ra |
|`U+0CB1` | Letter | CONSONANT | _null_ | ಱ Rra |
|`U+0CB2` | Letter | CONSONANT | _null_ | ಲ La |
|`U+0CB3` | Letter | CONSONANT | _null_ | ಳ Lla |
|`U+0CB4` | _unassigned_ | | | |
|`U+0CB5` | Letter | CONSONANT | _null_ | ವ Va |
|`U+0CB6` | Letter | CONSONANT | _null_ | ಶ Sha |
|`U+0CB7` | Letter | CONSONANT | _null_ | ಷ Ssa |
|`U+0CB8` | Letter | CONSONANT | _null_ | ಸ Sa |
|`U+0CB9` | Letter | CONSONANT | _null_ | ಹ Ha |
|`U+0CBA` | _unassigned_ | | | |
|`U+0CBB` | _unassigned_ | | | |
|`U+0CBC` | Mark [Mn] | NUKTA | BOTTOM_POSITION | ಼ Nukta |
|`U+0CBD` | Letter | AVAGRAHA | _null_ | ಽ Avagraha |
|`U+0CBE` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ಾ Sign Aa |
|`U+0CBF` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ಿ Sign I |
| | | | |
|`U+0CC0` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_RIGHT_POSITION | ೀ Sign Ii |
|`U+0CC1` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ು Sign U |
|`U+0CC2` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ೂ Sign Uu |
|`U+0CC3` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ೃ Sign Vocalic R |
|`U+0CC4` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ೄ Sign Vocalic Rr |
|`U+0CC5` | _unassigned_ | | | |
|`U+0CC6` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ೆ Sign E |
|`U+0CC7` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_RIGHT_POSITION | ೇ Sign Ee |
|`U+0CC8` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_RIGHT_POSITION | ೈ Sign Ai |
|`U+0CC9` | _unassigned_ | | | |
|`U+0CCA` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_RIGHT_POSITION | ೊ Sign O |
|`U+0CCB` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_RIGHT_POSITION | ೋ Sign Oo |
|`U+0CCC` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ೌ Sign Au |
|`U+0CCD` | Mark [Mn] | VIRAMA | TOP_POSITION | ್ Virama |
|`U+0CCE` | _unassigned_ | | | |
|`U+0CCF` | _unassigned_ | | | |
| | | | |
|`U+0CD0` | _unassigned_ | | | |
|`U+0CD1` | _unassigned_ | | | |
|`U+0CD2` | _unassigned_ | | | |
|`U+0CD3` | _unassigned_ | | | |
|`U+0CD4` | _unassigned_ | | | |
|`U+0CD5` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ೕ Length Mark |
|`U+0CD6` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ೖ Ai Length Mark |
|`U+0CD7` | _unassigned_ | | | |
|`U+0CD8` | _unassigned_ | | | |
|`U+0CD9` | _unassigned_ | | | |
|`U+0CDA` | _unassigned_ | | | |
|`U+0CDB` | _unassigned_ | | | |
|`U+0CDC` | _unassigned_ | | | |
|`U+0CDD` | _unassigned_ | | | |
|`U+0CDE` | Letter | CONSONANT | _null_ | ೞ Fa |
|`U+0CDF` | _unassigned_ | | | |
| | | | |
|`U+0CE0` | Letter | VOWEL_INDEPENDENT | _null_ | ೠ Vocalic Rr |
|`U+0CE1` | Letter | VOWEL_INDEPENDENT | _null_ | ೡ Vocalic Ll |
|`U+0CE2` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ೢ Sign Vocalic L |
|`U+0CE3` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ೣ Sign Vocalic Ll |
|`U+0CE4` | _unassigned_ | | | |
|`U+0CE5` | _unassigned_ | | | |
|`U+0CE6` | Number | NUMBER | _null_ | ೦ Digit Zero |
|`U+0CE7` | Number | NUMBER | _null_ | ೧ Digit One |
|`U+0CE8` | Number | NUMBER | _null_ | ೨ Digit Two |
|`U+0CE9` | Number | NUMBER | _null_ | ೩ Digit Three |
|`U+0CEA` | Number | NUMBER | _null_ | ೪ Digit Four |
|`U+0CEB` | Number | NUMBER | _null_ | ೫ Digit Five |
|`U+0CEC` | Number | NUMBER | _null_ | ೬ Digit Six |
|`U+0CED` | Number | NUMBER | _null_ | ೭ Digit Seven |
|`U+0CEE` | Number | NUMBER | _null_ | ೮ Digit Eight |
|`U+0CEF` | Number | NUMBER | _null_ | ೯ Digit Nine |
| | | | |
|`U+0CF0` | _unassigned_ | | | |
|`U+0CF1` | Letter | CONSONANT_WITH_STACKER | _null_ | ೱ Jihvamuliya |
|`U+0CF2` | Letter | CONSONANT_WITH_STACKER | _null_ | ೲ Upadhmaniya |
|`U+0CF3` | Mark [Mc] | BINDU | RIGHT_POSITION | ೳ Combining Anusvara Above Right|
|`U+0CF4` | _unassigned_ | | | |
|`U+0CF5` | _unassigned_ | | | |
|`U+0CF6` | _unassigned_ | | | |
|`U+0CF7` | _unassigned_ | | | |
|`U+0CF8` | _unassigned_ | | | |
|`U+0CF9` | _unassigned_ | | | |
|`U+0CFA` | _unassigned_ | | | |
|`U+0CFB` | _unassigned_ | | | |
|`U+0CFC` | _unassigned_ | | | |
|`U+0CFD` | _unassigned_ | | | |
|`U+0CFE` | _unassigned_ | | | |
|`U+0CFF` | _unassigned_ | | | |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Kannada script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Kannada text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Kannada text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Kannada text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-khmer.md
================================================
# Khmer character tables #
This document lists the per-character shaping information needed to
[shape Khmer text](../opentype-shaping-khmer.md).
**Contents**
- [Khmer character table](#khmer-character-table)
- [Khmer Symbols character table](#khmer-symbols-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Khmer character table ##
Khmer glyphs should be classified as in the following
table. Codepoints in the Khmer block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Khmer character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1780` | Letter | CONSONANT | _null_ | ក Ka |
|`U+1781` | Letter | CONSONANT | _null_ | ខ Kha |
|`U+1782` | Letter | CONSONANT | _null_ | គ Ko |
|`U+1783` | Letter | CONSONANT | _null_ | ឃ Kho |
|`U+1784` | Letter | CONSONANT | _null_ | ង Ngo |
|`U+1785` | Letter | CONSONANT | _null_ | ច Ca |
|`U+1786` | Letter | CONSONANT | _null_ | ឆ Cha |
|`U+1787` | Letter | CONSONANT | _null_ | ជ Co |
|`U+1788` | Letter | CONSONANT | _null_ | ឈ Cho |
|`U+1789` | Letter | CONSONANT | _null_ | ញ Nyo |
|`U+178A` | Letter | CONSONANT | _null_ | ដ Da |
|`U+178B` | Letter | CONSONANT | _null_ | ឋ Ttha |
|`U+178C` | Letter | CONSONANT | _null_ | ឌ Do |
|`U+178D` | Letter | CONSONANT | _null_ | ឍ Ttho |
|`U+178E` | Letter | CONSONANT | _null_ | ណ Nno |
|`U+178F` | Letter | CONSONANT | _null_ | ត Ta |
| | | | |
|`U+1790` | Letter | CONSONANT | _null_ | ថ Tha |
|`U+1791` | Letter | CONSONANT | _null_ | ទ To |
|`U+1792` | Letter | CONSONANT | _null_ | ធ Tho |
|`U+1793` | Letter | CONSONANT | _null_ | ន No |
|`U+1794` | Letter | CONSONANT | _null_ | ប Ba |
|`U+1795` | Letter | CONSONANT | _null_ | ផ Pha |
|`U+1796` | Letter | CONSONANT | _null_ | ព Po |
|`U+1797` | Letter | CONSONANT | _null_ | ភ Pho |
|`U+1798` | Letter | CONSONANT | _null_ | ម Mo |
|`U+1799` | Letter | CONSONANT | _null_ | យ Yo |
|`U+179A` | Letter | CONSONANT | _null_ | រ Ro |
|`U+179B` | Letter | CONSONANT | _null_ | ល Lo |
|`U+179C` | Letter | CONSONANT | _null_ | វ Vo |
|`U+179D` | Letter | CONSONANT | _null_ | ឝ Sha |
|`U+179E` | Letter | CONSONANT | _null_ | ឞ Sso |
|`U+179F` | Letter | CONSONANT | _null_ | ស Sa |
| | | | |
|`U+17A0` | Letter | CONSONANT | _null_ | ហ Ha |
|`U+17A1` | Letter | CONSONANT | _null_ | ឡ La |
|`U+17A2` | Letter | CONSONANT | _null_ | អ Qa |
|`U+17A3` | Letter | VOWEL_INDEPENDENT | _null_ | ឣ Qaq |
|`U+17A4` | Letter | VOWEL_INDEPENDENT | _null_ | ឤ Qaa |
|`U+17A5` | Letter | VOWEL_INDEPENDENT | _null_ | ឥ Qi |
|`U+17A6` | Letter | VOWEL_INDEPENDENT | _null_ | ឦ Qii |
|`U+17A7` | Letter | VOWEL_INDEPENDENT | _null_ | ឧ Qu |
|`U+17A8` | Letter | VOWEL_INDEPENDENT | _null_ | ឨ Quk |
|`U+17A9` | Letter | VOWEL_INDEPENDENT | _null_ | ឩ Quu |
|`U+17AA` | Letter | VOWEL_INDEPENDENT | _null_ | ឪ Quuv |
|`U+17AB` | Letter | VOWEL_INDEPENDENT | _null_ | ឫ Ry |
|`U+17AC` | Letter | VOWEL_INDEPENDENT | _null_ | ឬ Ryy |
|`U+17AD` | Letter | VOWEL_INDEPENDENT | _null_ | ឭ Ly |
|`U+17AE` | Letter | VOWEL_INDEPENDENT | _null_ | ឮ Lyy |
|`U+17AF` | Letter | VOWEL_INDEPENDENT | _null_ | ឯ Qe |
| | | | |
|`U+17B0` | Letter | VOWEL_INDEPENDENT | _null_ | ឰ Qai |
|`U+17B1` | Letter | VOWEL_INDEPENDENT | _null_ | ឱ Qoo Type One |
|`U+17B2` | Letter | VOWEL_INDEPENDENT | _null_ | ឲ Qoo Type Two |
|`U+17B3` | Letter | VOWEL_INDEPENDENT | _null_ | ឳ Qau |
|`U+17B4` | Mark [Mn] | _null_ | _null_ | ឴ Inherent Aq |
|`U+17B5` | Mark [Mn] | _null_ | _null_ | ឵ Inherent Aa |
|`U+17B6` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ា Sign Aa |
|`U+17B7` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ិ Sign I |
|`U+17B8` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ី Sign Ii |
|`U+17B9` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ឹ Sign Y |
|`U+17BA` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ឺ Sign Yy |
|`U+17BB` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ុ Sign U |
|`U+17BC` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ូ Sign Uu |
|`U+17BD` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ួ Sign Ua |
|`U+17BE` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_LEFT_POSITION | ើ Sign Oe |
|`U+17BF` | Mark [Mc] | VOWEL_DEPENDENT | TOP_LEFT_AND_RIGHT_POSITION| ឿ Sign Ya |
| | | | |
|`U+17C0` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ៀ Sign Ie |
|`U+17C1` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | េ Sign E |
|`U+17C2` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ែ Sign Ae |
|`U+17C3` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ៃ Sign Ai |
|`U+17C4` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ោ Sign Oo |
|`U+17C5` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ៅ Sign Au |
|`U+17C6` | Mark [Mn] | NUKTA | TOP_POSITION | ំ Nikahit |
|`U+17C7` | Mark [Mc] | VISARGA | RIGHT_POSITION | ះ Reahmuk |
|`U+17C8` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ៈ Yuukaleapintu |
|`U+17C9` | Mark [Mn] | REGISTER_SHIFTER | TOP_POSITION | ៉ Muusikatoan |
|`U+17CA` | Mark [Mn] | REGISTER_SHIFTER | TOP_POSITION | ៊ Triisap |
|`U+17CB` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ់ Bantoc |
|`U+17CC` | Mark [Mn] | CONSONANT_POST_REPHA| TOP_POSITION | ៌ Robat |
|`U+17CD` | Mark [Mn] | CONSONANT_KILLER | TOP_POSITION | ៍ Toandakhiat |
|`U+17CE` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៎ Kakabat |
|`U+17CF` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៏ Ahsda |
| | | | |
|`U+17D0` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ័ Samyok Sannya |
|`U+17D1` | Mark [Mn] | PURE_KILLER | TOP_POSITION | ៑ Viriam |
|`U+17D2` | Mark [Mn] | INVISIBLE_STACKER | _null_ | ្ Sign Coeng |
|`U+17D3` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៓ Bathamasat |
|`U+17D4` | Punctuation | _null_ | _null_ | ។ Khan |
|`U+17D5` | Punctuation | _null_ | _null_ | ៕ Bariyoosan |
|`U+17D6` | Punctuation | _null_ | _null_ | ៖ Camnuc Pii Kuuh |
|`U+17D7` | Letter | _null_ | _null_ | ៗ Lek Too |
|`U+17D8` | Punctuation | _null_ | _null_ | ៘ Beyyal |
|`U+17D9` | Punctuation | _null_ | _null_ | ៙ Phnaek Muan |
|`U+17DA` | Punctuation | _null_ | _null_ | ៚ Koomuut |
|`U+17DB` | Symbol | SYMBOL | _null_ | ៛ Riel |
|`U+17DC` | Letter | AVAGRAHA | _null_ | ៜ Avakrahasanya |
|`U+17DD` | Mark [Mn] | SYLLABLE_MODIFIER | TOP_POSITION | ៝ Atthacan |
|`U+17DE` | _unassigned_ | | | |
|`U+17DF` | _unassigned_ | | | |
| | | | |
|`U+17E0` | Number | NUMBER | _null_ | ០ Digit Zero |
|`U+17E1` | Number | NUMBER | _null_ | ១ Digit One |
|`U+17E2` | Number | NUMBER | _null_ | ២ Digit Two |
|`U+17E3` | Number | NUMBER | _null_ | ៣ Digit Three |
|`U+17E4` | Number | NUMBER | _null_ | ៤ Digit Four |
|`U+17E5` | Number | NUMBER | _null_ | ៥ Digit Five |
|`U+17E6` | Number | NUMBER | _null_ | ៦ Digit Six |
|`U+17E7` | Number | NUMBER | _null_ | ៧ Digit Seven |
|`U+17E8` | Number | NUMBER | _null_ | ៨ Digit Eight |
|`U+17E9` | Number | NUMBER | _null_ | ៩ Digit Nine |
|`U+17EA` | _unassigned_ | | | |
|`U+17EB` | _unassigned_ | | | |
|`U+17EC` | _unassigned_ | | | |
|`U+17ED` | _unassigned_ | | | |
|`U+17EE` | _unassigned_ | | | |
|`U+17EF` | _unassigned_ | | | |
| | | | |
|`U+17F0` | Number | _null_ | _null_ | ៰ Lek Attak Son |
|`U+17F1` | Number | _null_ | _null_ | ៱ Lek Attak Muoy |
|`U+17F2` | Number | _null_ | _null_ | ៲ Lek Attak Pii |
|`U+17F3` | Number | _null_ | _null_ | ៳ Lek Attak Bei |
|`U+17F4` | Number | _null_ | _null_ | ៴ Lek Attak Buon |
|`U+17F5` | Number | _null_ | _null_ | ៵ Lek Attak Pram |
|`U+17F6` | Number | _null_ | _null_ | ៶ Lek Attak Pram-Muoy |
|`U+17F7` | Number | _null_ | _null_ | ៷ Lek Attak Pram-Pii |
|`U+17F8` | Number | _null_ | _null_ | ៸ Lek Attak Pram-Bei |
|`U+17F9` | Number | _null_ | _null_ | ៹ Lek Attak Pram-Buon |
|`U+17FA` | _unassigned_ | | | |
|`U+17FB` | _unassigned_ | | | |
|`U+17FC` | _unassigned_ | | | |
|`U+17FD` | _unassigned_ | | | |
|`U+17FE` | _unassigned_ | | | |
|`U+17FF` | _unassigned_ | | | |
:::
## Khmer Symbols character table ##
The Khmer Symbols block contains miscellaneous symbols used for
lunar-date calendars. None evoke any special behavior from the shaping engine.
:::{table} Khmer Symbols character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+19E0` | Symbol | _null_ | _null_ | ᧠ Pathamasat |
|`U+19E1` | Symbol | _null_ | _null_ | ᧡ Muoy Koet |
|`U+19E2` | Symbol | _null_ | _null_ | ᧢ Pii Koet |
|`U+19E3` | Symbol | _null_ | _null_ | ᧣ Bei Koet |
|`U+19E4` | Symbol | _null_ | _null_ | ᧤ Buon Koet |
|`U+19E5` | Symbol | _null_ | _null_ | ᧥ Pram Koet |
|`U+19E6` | Symbol | _null_ | _null_ | ᧦ Pram-Muoy Koet |
|`U+19E7` | Symbol | _null_ | _null_ | ᧧ Pram-Pii Koet |
|`U+19E8` | Symbol | _null_ | _null_ | ᧨ Pram-Bei Koet |
|`U+19E9` | Symbol | _null_ | _null_ | ᧩ Pram-Buon Koet |
|`U+19EA` | Symbol | _null_ | _null_ | ᧪ Dap Koet |
|`U+19EB` | Symbol | _null_ | _null_ | ᧫ Dap-Muoy Koet |
|`U+19EC` | Symbol | _null_ | _null_ | ᧬ Dap-Pii Koet |
|`U+19ED` | Symbol | _null_ | _null_ | ᧭ Dap-Bei Koet |
|`U+19EE` | Symbol | _null_ | _null_ | ᧮ Dap-Buon Koet |
|`U+19EF` | Symbol | _null_ | _null_ | ᧯ Dap-Pram Koet |
| | | | |
|`U+19F0` | Symbol | _null_ | _null_ | ᧰ Tuteyasat |
|`U+19F1` | Symbol | _null_ | _null_ | ᧱ Muoy ROC |
|`U+19F2` | Symbol | _null_ | _null_ | ᧲ Pii Roc |
|`U+19F3` | Symbol | _null_ | _null_ | ᧳ Bei Roc |
|`U+19F4` | Symbol | _null_ | _null_ | ᧴ Buon Roc |
|`U+19F5` | Symbol | _null_ | _null_ | ᧵ Pram Roc |
|`U+19F6` | Symbol | _null_ | _null_ | ᧶ Pram-Muoy Roc |
|`U+19F7` | Symbol | _null_ | _null_ | ᧷ Pram-Pii Roc |
|`U+19F8` | Symbol | _null_ | _null_ | ᧸ Pram-Bei Roc |
|`U+19F9` | Symbol | _null_ | _null_ | ᧹ Pram-Buon Roc |
|`U+19FA` | Symbol | _null_ | _null_ | ᧺ Dap Roc |
|`U+19FB` | Symbol | _null_ | _null_ | ᧻ Dap-Muoy Roc |
|`U+19FC` | Symbol | _null_ | _null_ | ᧼ Dap-Pii Roc |
|`U+19FD` | Symbol | _null_ | _null_ | ᧽ Dap-Bei Roc |
|`U+19FE` | Symbol | _null_ | _null_ | ᧾ Dap-Buon Roc |
|`U+19FF` | Symbol | _null_ | _null_ | ᧿ Dap-Pram Roc |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Khmer text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation of a
conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The sequence
"_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of a
conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence
"_Consonant_,Halant,ZWNJ,_Consonant_" should produce the first
consonant in its standard form, followed by an explicit "Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP<.abbr>) is primarily used to display
those codepoints that are defined as non-spacing (marks, dependent
vowels (matras), below-base consonant forms, and post-base consonant
forms) in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or
"NBSP,_matra_".
In addition to general punctuation, runs of Khmer text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block.
================================================
FILE: character-tables/character-tables-lao.md
================================================
# Lao character tables #
This document lists the per-character shaping information needed to
[shape Lao text](../opentype-shaping-thai-lao.md#the-thailao-shaping-model).
**Contents**
- [Lao character table](#lao-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Lao character table ##
Lao glyphs should be classified as in the following
table. Codepoints in the Lao block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Lao character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Combining class | PUA | Glyph |
|:----------|:-----------------|:------------------|:------------------------|:----------------|:-------|:------------------------------|
|`U+0E80` | _unassigned_ | | | | | |
|`U+0E81` | Letter | CONSONANT | _null_ | _0_ | _null_ | ກ Ko |
|`U+0E82` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຂ Kho Sung |
|`U+0E83` | _unassigned_ | | | | | |
|`U+0E84` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຄ Kho Tam |
|`U+0E85` | _unassigned_ | | | | | |
|`U+0E86` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຆ Pali Gha |
|`U+0E87` | Letter | CONSONANT | _null_ | _0_ | _null_ | ງ Ngo |
|`U+0E88` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຈ Co |
|`U+0E89` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຉ Pali Cha |
|`U+0E8A` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຊ So Tam |
|`U+0E8B` | _unassigned_ | | | | | |
|`U+0E8C` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຌ Pali Jha |
|`U+0E8D` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຍ Nyo |
|`U+0E8E` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຎ Pali Nya |
|`U+0E8F` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຏ Pali Tta |
| | | | | | | |
|`U+0E90` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຐ Pali Ttha |
|`U+0E91` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຑ Pali Dda |
|`U+0E92` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຒ Pali Ddha |
|`U+0E93` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຓ Pali Nna |
|`U+0E94` | Letter | CONSONANT | _null_ | _0_ | _null_ | ດ Do |
|`U+0E95` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຕ To |
|`U+0E96` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຖ Tho Sung |
|`U+0E97` | Letter | CONSONANT | _null_ | _0_ | _null_ | ທ Tho Tam |
|`U+0E98` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຘ Pali Dha |
|`U+0E99` | Letter | CONSONANT | _null_ | _0_ | _null_ | ນ No |
|`U+0E9A` | Letter | CONSONANT | _null_ | _0_ | _null_ | ບ Bo |
|`U+0E9B` | Letter | CONSONANT | _null_ | _0_ | _null_ | ປ Po |
|`U+0E9C` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຜ Pho Sung |
|`U+0E9D` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຝ Fo Tam |
|`U+0E9E` | Letter | CONSONANT | _null_ | _0_ | _null_ | ພ Pho Tam |
|`U+0E9F` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຟ Fo Sung |
| | | | | | | |
|`U+0EA0` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຠ Pali Bha |
|`U+0EA1` | Letter | CONSONANT | _null_ | _0_ | _null_ | ມ Mo |
|`U+0EA2` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຢ Yo |
|`U+0EA3` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຣ Lo Ling |
|`U+0EA4` | _unassigned_ | | | | | |
|`U+0EA5` | Letter | CONSONANT | _null_ | _0_ | _null_ | ລ Lo Loot |
|`U+0EA6` | _unassigned_ | | | | | |
|`U+0EA7` | Letter | CONSONANT | _null_ | _0_ | _null_ | ວ Wo |
|`U+0EA8` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຨ Sanskrit Sha |
|`U+0EA9` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຩ Sanskrit Ssa |
|`U+0EAA` | Letter | CONSONANT | _null_ | _0_ | _null_ | ສ So Sung |
|`U+0EAB` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຫ Ho Sung |
|`U+0EAC` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຬ Pali Lla |
|`U+0EAD` | Letter | CONSONANT | _null_ | _0_ | _null_ | ອ O |
|`U+0EAE` | Letter | CONSONANT | _null_ | _0_ | _null_ | ຮ Ho Tam |
|`U+0EAF` | Letter | _null_ | _null_ | _0_ | _null_ | ຯ Ellipsis |
| | | | | | | |
|`U+0EB0` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | _null_ | ະ Sign A |
|`U+0EB1` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | _null_ | ັ Sign Mai Kan |
|`U+0EB2` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | _null_ | າ Sign Aa |
|`U+0EB3` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | _null_ | ຳ Sign Am |
|`U+0EB4` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | _null_ | ິ Sign I |
|`U+0EB5` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | _null_ | ີ Sign Ii |
|`U+0EB6` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | _null_ | ຶ Sign Y |
|`U+0EB7` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | _null_ | ື Sign Yy |
|`U+0EB8` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 118 | _null_ | ຸ Sign U |
|`U+0EB9` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 118 | _null_ | ູ Sign Uu |
|`U+0EBA` | Mark [Mn] | VIRAMA | BOTTOM_POSITION | 9 | _null_ | ຺ Pali Virama |
|`U+0EBB` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | _null_ | ົ Sign Mai Kon |
|`U+0EBC` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | _0_ | _null_ | ຼ Semivowel Sign Lo |
|`U+0EBD` | Letter | CONSONANT_MEDIAL | _null_ | _0_ | _null_ | ຽ Semivowel Sign Nyo |
|`U+0EBE` | _unassigned_ | | | | | |
|`U+0EBF` | _unassigned_ | | | | | |
| | | | | | | |
|`U+0EC0` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | _null_ | ເ Sign E |
|`U+0EC1` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | _null_ | ແ Sign Ei |
|`U+0EC2` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | _null_ | ໂ Sign O |
|`U+0EC3` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | _null_ | ໃ Sign Ay |
|`U+0EC4` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | _null_ | ໄ Sign Ai |
|`U+0EC5` | _unassigned_ | | | | | |
|`U+0EC6` | Letter Modifier | _null_ | _null_ | _0_ | _null_ | ໆ Ko La |
|`U+0EC7` | _unassigned_ | | | | | |
|`U+0EC8` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | _null_ | ່ Tone Mai Ek |
|`U+0EC9` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | _null_ | ້ Tone Mai Tho |
|`U+0ECA` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | _null_ | ໊ Tone Mai Ti |
|`U+0ECB` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | _null_ | ໋ Tone Mai Catawa |
|`U+0ECC` | Mark [Mn] | _null_ | TOP_POSITION | _0_ | _null_ | ໌ Cancellation mark |
|`U+0ECD` | Mark [Mn] | BINDU | TOP_POSITION | _0_ | _null_ | ໍ Niggahita |
|`U+0ECE` | Mark [Mn] | TONE_MARKER | TOP_POSITION | _0_ | _null_ | ໎ Yamakkan |
|`U+0ECF` | _unassigned_ | | | | | |
| | | | | | | |
|`U+0ED0` | Number | NUMBER | _null_ | _0_ | _null_ | ໐ Digit Zero |
|`U+0ED1` | Number | NUMBER | _null_ | _0_ | _null_ | ໑ Digit One |
|`U+0ED2` | Number | NUMBER | _null_ | _0_ | _null_ | ໒ Digit Two |
|`U+0ED3` | Number | NUMBER | _null_ | _0_ | _null_ | ໓ Digit Three |
|`U+0ED4` | Number | NUMBER | _null_ | _0_ | _null_ | ໔ Digit Four |
|`U+0ED5` | Number | NUMBER | _null_ | _0_ | _null_ | ໕ Digit Five |
|`U+0ED6` | Number | NUMBER | _null_ | _0_ | _null_ | ໖ Digit Six |
|`U+0ED7` | Number | NUMBER | _null_ | _0_ | _null_ | ໗ Digit Seven |
|`U+0ED8` | Number | NUMBER | _null_ | _0_ | _null_ | ໘ Digit Eight |
|`U+0ED9` | Number | NUMBER | _null_ | _0_ | _null_ | ໙ Digit Nine |
|`U+0EDA` | _unassigned_ | | | | | |
|`U+0EDB` | _unassigned_ | | | | | |
|`U+0EDC` | Letter | CONSONANT | _null_ | _0_ | _null_ | ໜ Ho No |
|`U+0EDD` | Letter | CONSONANT | _null_ | _0_ | _null_ | ໝ Ho Mo |
|`U+0EDE` | Letter | CONSONANT | _null_ | _0_ | _null_ | ໞ Khmu Go |
|`U+0EDF` | Letter | CONSONANT | _null_ | _0_ | _null_ | ໟ Khmu Nyo |
| | | | | | | |
|`U+0EE0` | _unassigned_ | | | | | |
|`U+0EE1` | _unassigned_ | | | | | |
|`U+0EE2` | _unassigned_ | | | | | |
|`U+0EE3` | _unassigned_ | | | | | |
|`U+0EE4` | _unassigned_ | | | | | |
|`U+0EE5` | _unassigned_ | | | | | |
|`U+0EE6` | _unassigned_ | | | | | |
|`U+0EE7` | _unassigned_ | | | | | |
|`U+0EE8` | _unassigned_ | | | | | |
|`U+0EE9` | _unassigned_ | | | | | |
|`U+0EEA` | _unassigned_ | | | | | |
|`U+0EEB` | _unassigned_ | | | | | |
|`U+0EEC` | _unassigned_ | | | | | |
|`U+0EED` | _unassigned_ | | | | | |
|`U+0EEE` | _unassigned_ | | | | | |
|`U+0EEF` | _unassigned_ | | | | | |
| | | | | | | |
|`U+0EF0` | _unassigned_ | | | | | |
|`U+0EF1` | _unassigned_ | | | | | |
|`U+0EF2` | _unassigned_ | | | | | |
|`U+0EF3` | _unassigned_ | | | | | |
|`U+0EF4` | _unassigned_ | | | | | |
|`U+0EF5` | _unassigned_ | | | | | |
|`U+0EF6` | _unassigned_ | | | | | |
|`U+0EF7` | _unassigned_ | | | | | |
|`U+0EF8` | _unassigned_ | | | | | |
|`U+0EF9` | _unassigned_ | | | | | |
|`U+0EFA` | _unassigned_ | | | | | |
|`U+0EFB` | _unassigned_ | | | | | |
|`U+0EFC` | _unassigned_ | | | | | |
|`U+0EFD` | _unassigned_ | | | | | |
|`U+0EFE` | _unassigned_ | | | | | |
|`U+0EFF` | _unassigned_ | | | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Lao text text typically do not
insert spaces between words. Consequently, the Zero-Width Space (`U+200B`)
character is often used to insert invisible break points that may be
converted to line breaks.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+200B` | Separator | PLACEHOLDER | _null_ | Zero-width space |
:::
Other important characters that may be encountered when shaping runs
of Lao text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
================================================
FILE: character-tables/character-tables-malayalam.md
================================================
# Malayalam character tables #
This document lists the per-character shaping information needed to
[shape Malayalam text](../opentype-shaping-malayalam.md).
**Contents**
- [Malayalam character table](#malayalam-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Malayalam character table ##
Malayalam glyphs should be classified as in the following
table. Codepoints in the Malayalam block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Malayalam character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0D00` | Mark [Mn] | BINDU | TOP_POSITION | ഀ Combining Anusvara Above |
|`U+0D01` | Mark [Mn] | BINDU | TOP_POSITION | ഁ Candrabindu |
|`U+0D02` | Mark [Mc] | BINDU | RIGHT_POSITION | ം Anusvara |
|`U+0D03` | Mark [Mc] | VISARGA | RIGHT_POSITION | ഃ Visarga |
|`U+0D04` | Letter | BINDU | _null_ | ഄ Vedic Anusvara |
|`U+0D05` | Letter | VOWEL_INDEPENDENT | _null_ | അ A |
|`U+0D06` | Letter | VOWEL_INDEPENDENT | _null_ | ആ Aa |
|`U+0D07` | Letter | VOWEL_INDEPENDENT | _null_ | ഇ I |
|`U+0D08` | Letter | VOWEL_INDEPENDENT | _null_ | ഈ Ii |
|`U+0D09` | Letter | VOWEL_INDEPENDENT | _null_ | ഉ U |
|`U+0D0A` | Letter | VOWEL_INDEPENDENT | _null_ | ഊ Uu |
|`U+0D0B` | Letter | VOWEL_INDEPENDENT | _null_ | ഋ Vocalic R |
|`U+0D0C` | Letter | VOWEL_INDEPENDENT | _null_ | ഌ Vocalic L |
|`U+0D0D` | _unassigned_ | | | |
|`U+0D0E` | Letter | VOWEL_INDEPENDENT | _null_ | എ E |
|`U+0D0F` | Letter | VOWEL_INDEPENDENT | _null_ | ഏ Ee |
| | | | |
|`U+0D10` | Letter | VOWEL_INDEPENDENT | _null_ | ഐ Ai |
|`U+0D11` | _unassigned_ | | | |
|`U+0D12` | Letter | VOWEL_INDEPENDENT | _null_ | ഒ O |
|`U+0D13` | Letter | VOWEL_INDEPENDENT | _null_ | ഓ Oo |
|`U+0D14` | Letter | VOWEL_INDEPENDENT | _null_ | ഔ Au |
|`U+0D15` | Letter | CONSONANT | _null_ | ക Ka |
|`U+0D16` | Letter | CONSONANT | _null_ | ഖ Kha |
|`U+0D17` | Letter | CONSONANT | _null_ | ഗ Ga |
|`U+0D18` | Letter | CONSONANT | _null_ | ഘ Gha |
|`U+0D19` | Letter | CONSONANT | _null_ | ങ Nga |
|`U+0D1A` | Letter | CONSONANT | _null_ | ച Ca |
|`U+0D1B` | Letter | CONSONANT | _null_ | ഛ Cha |
|`U+0D1C` | Letter | CONSONANT | _null_ | ജ Ja |
|`U+0D1D` | Letter | CONSONANT | _null_ | ഝ Jha |
|`U+0D1E` | Letter | CONSONANT | _null_ | ഞ Nya |
|`U+0D1F` | Letter | CONSONANT | _null_ | ട Tta |
| | | | |
|`U+0D20` | Letter | CONSONANT | _null_ | ഠ Ttha |
|`U+0D21` | Letter | CONSONANT | _null_ | ഡ Dda |
|`U+0D22` | Letter | CONSONANT | _null_ | ഢ Ddha |
|`U+0D23` | Letter | CONSONANT | _null_ | ണ Nna |
|`U+0D24` | Letter | CONSONANT | _null_ | ത Ta |
|`U+0D25` | Letter | CONSONANT | _null_ | ഥ Tha |
|`U+0D26` | Letter | CONSONANT | _null_ | ദ Da |
|`U+0D27` | Letter | CONSONANT | _null_ | ധ Dha |
|`U+0D28` | Letter | CONSONANT | _null_ | ന Na |
|`U+0D29` | Letter | CONSONANT | _null_ | ഩ Nnna |
|`U+0D2A` | Letter | CONSONANT | _null_ | പ Pa |
|`U+0D2B` | Letter | CONSONANT | _null_ | ഫ Pha |
|`U+0D2C` | Letter | CONSONANT | _null_ | ബ Ba |
|`U+0D2D` | Letter | CONSONANT | _null_ | ഭ Bha |
|`U+0D2E` | Letter | CONSONANT | _null_ | മ Ma |
|`U+0D2F` | Letter | CONSONANT | _null_ | യ Ya |
| | | | |
|`U+0D30` | Letter | CONSONANT | _null_ | ര Ra |
|`U+0D31` | Letter | CONSONANT | _null_ | റ Rra |
|`U+0D32` | Letter | CONSONANT | _null_ | ല La |
|`U+0D33` | Letter | CONSONANT | _null_ | ള Lla |
|`U+0D34` | Letter | CONSONANT | _null_ | ഴ Llla |
|`U+0D35` | Letter | CONSONANT | _null_ | വ Va |
|`U+0D36` | Letter | CONSONANT | _null_ | ശ Sha |
|`U+0D37` | Letter | CONSONANT | _null_ | ഷ Ssa |
|`U+0D38` | Letter | CONSONANT | _null_ | സ Sa |
|`U+0D39` | Letter | CONSONANT | _null_ | ഹ Ha |
|`U+0D3A` | Letter | CONSONANT | _null_ | ഺ Ttta |
|`U+0D3B` | Mark [Mn] | PURE_KILLER | TOP_POSITION | ഻ Vertical Bar Virama |
|`U+0D3C` | Mark [Mn] | PURE_KILLER | TOP_POSITION | ഼ Circular Virama |
|`U+0D3D` | Letter | AVAGRAHA | _null_ | ഽ Avagraha |
|`U+0D3E` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ാ Sign Aa |
|`U+0D3F` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ി Sign I |
| | | | |
|`U+0D40` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ീ Sign Ii |
|`U+0D41` | Mark [Mn] | VOWEL_DEPENDENT | RIGHT_POSITION | ു Sign U |
|`U+0D42` | Mark [Mn] | VOWEL_DEPENDENT | RIGHT_POSITION | ൂ Sign Uu |
|`U+0D43` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൃ Sign Vocalic R |
|`U+0D44` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൄ Sign Vocalic Rr |
|`U+0D45` | _unassigned_ | | | |
|`U+0D46` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | െ Sign E |
|`U+0D47` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | േ Sign Ee |
|`U+0D48` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ൈ Sign Ai |
|`U+0D49` | _unassigned_ | | | |
|`U+0D4A` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ൊ Sign O |
|`U+0D4B` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ോ Sign Oo |
|`U+0D4C` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ൌ Sign Au |
|`U+0D4D` | Mark [Mn] | VIRAMA | TOP_POSITION | ് Virama |
|`U+0D4E` | Letter | CONSONANT_PRE_REPHA| _null_ | ൎ Dot Reph |
|`U+0D4F` | Symbol | SYMBOL | _null_ | ൏ Para |
| | | | |
|`U+0D50` | _unassigned_ | | | |
|`U+0D51` | _unassigned_ | | | |
|`U+0D52` | _unassigned_ | | | |
|`U+0D53` | _unassigned_ | | | |
|`U+0D54` | Letter | CONSONANT_DEAD | _null_ | ൔ Chillu M |
|`U+0D55` | Letter | CONSONANT_DEAD | _null_ | ൕ Chillu Y |
|`U+0D56` | Letter | CONSONANT_DEAD | _null_ | ൖ Chillu Lll |
|`U+0D57` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ൗ Au Length Mark |
|`U+0D58` | Number | NUMBER | _null_ | ൘ Fraction 1/160 |
|`U+0D59` | Number | NUMBER | _null_ | ൙ Fraction 1/40 |
|`U+0D5A` | Number | NUMBER | _null_ | ൚ Fraction 3/80 |
|`U+0D5B` | Number | NUMBER | _null_ | ൛ Fraction 1/20 |
|`U+0D5C` | Number | NUMBER | _null_ | ൜ Fraction 1/10 |
|`U+0D5D` | Number | NUMBER | _null_ | ൝ Fraction 3/20 |
|`U+0D5E` | Number | NUMBER | _null_ | ൞ Fraction 1/5 |
|`U+0D5F` | Letter | VOWEL_INDEPENDENT | _null_ | ൟ Archaic Ii |
| | | | |
|`U+0D60` | Letter | VOWEL_INDEPENDENT | _null_ | ൠ Vocalic Rr |
|`U+0D61` | Letter | VOWEL_INDEPENDENT | _null_ | ൡ Vocalic Ll |
|`U+0D62` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൢ Sign Vocalic L |
|`U+0D63` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ൣ Sign Vocalic Ll |
|`U+0D64` | _unassigned_ | | | |
|`U+0D65` | _unassigned_ | | | |
|`U+0D66` | Number | NUMBER | _null_ | ൦ Digit Zero |
|`U+0D67` | Number | NUMBER | _null_ | ൧ Digit One |
|`U+0D68` | Number | NUMBER | _null_ | ൨ Digit Two |
|`U+0D69` | Number | NUMBER | _null_ | ൩ Digit Three |
|`U+0D6A` | Number | NUMBER | _null_ | ൪ Digit Four |
|`U+0D6B` | Number | NUMBER | _null_ | ൫ Digit Five |
|`U+0D6C` | Number | NUMBER | _null_ | ൬ Digit Six |
|`U+0D6D` | Number | NUMBER | _null_ | ൭ Digit Seven |
|`U+0D6E` | Number | NUMBER | _null_ | ൮ Digit Eight |
|`U+0D6F` | Number | NUMBER | _null_ | ൯ Digit Nine |
| | | | |
|`U+0D70` | Number | NUMBER | | ൰ Number Ten |
|`U+0D71` | Number | NUMBER | | ൱ Number One Hundred |
|`U+0D72` | Number | NUMBER | | ൲ Number One Thousand |
|`U+0D73` | Number | NUMBER | | ൳ Fraction 1/4 |
|`U+0D74` | Number | NUMBER | | ൴ Fraction 1/2 |
|`U+0D75` | Number | NUMBER | | ൵ Fraction 3/4 |
|`U+0D76` | Number | NUMBER | | ൶ Fraction 1/16 |
|`U+0D77` | Number | NUMBER | | ൷ Fraction 1/8 |
|`U+0D78` | Number | NUMBER | _null_ | ൸ Fraction 3/16 |
|`U+0D79` | Symbol | SYMBOL | _null_ | ൹ Date Mark |
|`U+0D7A` | Letter | CONSONANT_DEAD | _null_ | ൺ Chillu Nn |
|`U+0D7B` | Letter | CONSONANT_DEAD | _null_ | ൻ Chillu N |
|`U+0D7C` | Letter | CONSONANT_DEAD | _null_ | ർ Chillu Rr |
|`U+0D7D` | Letter | CONSONANT_DEAD | _null_ | ൽ Chillu L |
|`U+0D7E` | Letter | CONSONANT_DEAD | _null_ | ൾ Chillu Ll |
|`U+0D7F` | Letter | CONSONANT_DEAD | _null_ | ൿ Chillu K |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Malayalam script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Malayalam text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Malayalam text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Malayalam text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display
those codepoints that are defined as non-spacing (marks, dependent
vowels (matras), below-base consonant forms, and post-base consonant
forms) in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-mongolian.md
================================================
# Mongolian character tables #
This document lists the per-character shaping information needed to
[shape Mongolian text](../opentype-shaping-mongolian.md).
**Contents**
- [Mongolian character table](#mongolian-character-table)
- [Mongolian Supplement character table](#mongolian-supplement-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Mongolian character table ##
Mongolian glyphs should be classified as in the following
table. Codepoints in the Mongolian block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
The _Joining type_ column indicates whether each codepoint is defined
as joining with adjacent characters on the left side, right side, left
and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints
designated TRANSPARENT in the _Joining type_ column do not join with
adjacent characters and, in addition, do not affect the joining
behavior of surrounding characters. Non-spacing marks are of type
TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent
characters to join.
The _Joining group_ column lists the fundamental letter that the
listed codepoint behaves like for joining purposes.
Assigned codepoints with a _null_ in the _Joining group_
column evoke no special behavior from the shaping engine during the
join-computation stage.
The _Mark class_ column indicates the Canonical Combining Class
for the codepoint. Marks are assigned non-zero combining classes so
that sequences of adjacent marks can be reordered as required by the
orthography.
For Mongolian, a subset of marks in the 220 and 230 classes are also
designated _Modifier Combining Marks_ (MCM). These are denoted with
_220_MCM_ and _230_MCM_ in the _Mark class_ column. The MCM marks are
treated differently during the mark-reordering stage.
:::{table} Mongolian character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|
|`U+1800` | Punctuation | NON_JOINING | _null_ | _0_ | ᠀ Mongolian Birga |
|`U+1801` | Punctuation | NON_JOINING | _null_ | _0_ | ᠁ Mongolian Ellipsis |
|`U+1802` | Punctuation | NON_JOINING | _null_ | _0_ | ᠂ Mongolian Comma |
|`U+1803` | Punctuation | NON_JOINING | _null_ | _0_ | ᠃ Mongolian Full Stop |
|`U+1804` | Punctuation | NON_JOINING | _null_ | _0_ | ᠄ Mongolian Colon |
|`U+1805` | Punctuation | NON_JOINING | _null_ | _0_ | ᠅ Mongolian Four Dots |
|`U+1806` | Punctuation [Pd] | NON_JOINING | _null_ | _0_ | ᠆ Todo Soft Hyphen |
|`U+1807` | Punctuation | DUAL | _null_ | _0_ | ᠇ Sibe Syllable Boundary Mark |
|`U+1808` | Punctuation | NON_JOINING | _null_ | _0_ | ᠈ Manchu Comma |
|`U+1809` | Punctuation | NON_JOINING | _null_ | _0_ | ᠉ Manchu Full Stop |
|`U+180A` | Punctuation | JOIN_CAUSING | _null_ | _0_ | ᠊ Mongolian Nirugu |
|`U+180B` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᠋ Free Variation Selector One |
|`U+180C` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᠌ Free Variation Selector Two |
|`U+180D` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᠍ Free Variation Selector Three |
|`U+180E` | Formatting | NON_JOINING | _null_ | _0_ | Mongolian Vowel Separator |
|`U+180F` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᠏ Free Variation Selector Four |
| | | | | |
|`U+1810` | Number | NON_JOINING | _null_ | _0_ | ᠐ Digit Zero |
|`U+1811` | Number | NON_JOINING | _null_ | _0_ | ᠑ Digit One |
|`U+1812` | Number | NON_JOINING | _null_ | _0_ | ᠒ Digit Two |
|`U+1813` | Number | NON_JOINING | _null_ | _0_ | ᠓ Digit Three |
|`U+1814` | Number | NON_JOINING | _null_ | _0_ | ᠔ Digit Four |
|`U+1815` | Number | NON_JOINING | _null_ | _0_ | ᠕ Digit Five |
|`U+1816` | Number | NON_JOINING | _null_ | _0_ | ᠖ Digit Six |
|`U+1817` | Number | NON_JOINING | _null_ | _0_ | ᠗ Digit Seven |
|`U+1818` | Number | NON_JOINING | _null_ | _0_ | ᠘ Digit Eight |
|`U+1819` | Number | NON_JOINING | _null_ | _0_ | ᠙ Digit Nine |
|`U+181A` | _unassigned_ | | | | |
|`U+181B` | _unassigned_ | | | | |
|`U+181C` | _unassigned_ | | | | |
|`U+181D` | _unassigned_ | | | | |
|`U+181E` | _unassigned_ | | | | |
|`U+181F` | _unassigned_ | | | | |
| | | | | |
|`U+1820` | Letter | DUAL | _null_ | _0_ | ᠠ A |
|`U+1821` | Letter | DUAL | _null_ | _0_ | ᠡ E |
|`U+1822` | Letter | DUAL | _null_ | _0_ | ᠢ I |
|`U+1823` | Letter | DUAL | _null_ | _0_ | ᠣ O |
|`U+1824` | Letter | DUAL | _null_ | _0_ | ᠤ U |
|`U+1825` | Letter | DUAL | _null_ | _0_ | ᠥ Oe |
|`U+1827` | Letter | DUAL | _null_ | _0_ | ᠦ Ue |
|`U+1827` | Letter | DUAL | _null_ | _0_ | ᠧ Ee |
|`U+1828` | Letter | DUAL | _null_ | _0_ | ᠨ Na |
|`U+1829` | Letter | DUAL | _null_ | _0_ | ᠩ Ang |
|`U+182A` | Letter | DUAL | _null_ | _0_ | ᠪ Ba |
|`U+182B` | Letter | DUAL | _null_ | _0_ | ᠫ Pa |
|`U+182C` | Letter | DUAL | _null_ | _0_ | ᠬ Qa |
|`U+182D` | Letter | DUAL | _null_ | _0_ | ᠭ Ga |
|`U+182E` | Letter | DUAL | _null_ | _0_ | ᠮ Ma |
|`U+182F` | Letter | DUAL | _null_ | _0_ | ᠯ La |
| | | | | |
|`U+1830` | Letter | DUAL | _null_ | _0_ | ᠰ Sa |
|`U+1831` | Letter | DUAL | _null_ | _0_ | ᠱ Sha |
|`U+1832` | Letter | DUAL | _null_ | _0_ | ᠲ Ta |
|`U+1833` | Letter | DUAL | _null_ | _0_ | ᠳ Da |
|`U+1834` | Letter | DUAL | _null_ | _0_ | ᠴ Cha |
|`U+1835` | Letter | DUAL | _null_ | _0_ | ᠵ Ja |
|`U+1836` | Letter | DUAL | _null_ | _0_ | ᠶ Ya |
|`U+1837` | Letter | DUAL | _null_ | _0_ | ᠷ Ra |
|`U+1838` | Letter | DUAL | _null_ | _0_ | ᠸ Wa |
|`U+1839` | Letter | DUAL | _null_ | _0_ | ᠹ Fa |
|`U+183A` | Letter | DUAL | _null_ | _0_ | ᠺ Ka |
|`U+183B` | Letter | DUAL | _null_ | _0_ | ᠻ Kha |
|`U+183C` | Letter | DUAL | _null_ | _0_ | ᠼ Tsa |
|`U+183D` | Letter | DUAL | _null_ | _0_ | ᠽ Za |
|`U+183E` | Letter | DUAL | _null_ | _0_ | ᠾ Haa |
|`U+183F` | Letter | DUAL | _null_ | _0_ | ᠿ Zra |
| | | | | |
|`U+1840` | Letter | DUAL | _null_ | _0_ | ᡀ Lha |
|`U+1841` | Letter | DUAL | _null_ | _0_ | ᡁ Zhi |
|`U+1842` | Letter | DUAL | _null_ | _0_ | ᡂ Chi |
|`U+1843` | Letter | DUAL | _null_ | _0_ | ᡃ Todo Long Vowel Sign |
|`U+1844` | Letter | DUAL | _null_ | _0_ | ᡄ Todo E |
|`U+1845` | Letter | DUAL | _null_ | _0_ | ᡅ Todo I |
|`U+1846` | Letter | DUAL | _null_ | _0_ | ᡆ Todo O |
|`U+1847` | Letter | DUAL | _null_ | _0_ | ᡇ Todo U |
|`U+1848` | Letter | DUAL | _null_ | _0_ | ᡈ Todo Oe |
|`U+1849` | Letter | DUAL | _null_ | _0_ | ᡉ Todo Ue |
|`U+184A` | Letter | DUAL | _null_ | _0_ | ᡊ Todo Ang |
|`U+184B` | Letter | DUAL | _null_ | _0_ | ᡋ Todo Ba |
|`U+184C` | Letter | DUAL | _null_ | _0_ | ᡌ Todo Pa |
|`U+184D` | Letter | DUAL | _null_ | _0_ | ᡍ Todo Qa |
|`U+184E` | Letter | DUAL | _null_ | _0_ | ᡎ Todo Ga |
|`U+184F` | Letter | DUAL | _null_ | _0_ | ᡏ Todo Ma |
| | | | | |
|`U+1850` | Letter | DUAL | _null_ | _0_ | ᡐ Todo Ta |
|`U+1851` | Letter | DUAL | _null_ | _0_ | ᡑ Todo Da |
|`U+1852` | Letter | DUAL | _null_ | _0_ | ᡒ Todo Cha |
|`U+1853` | Letter | DUAL | _null_ | _0_ | ᡓ Todo Ja |
|`U+1854` | Letter | DUAL | _null_ | _0_ | ᡔ Todo Tsa |
|`U+1855` | Letter | DUAL | _null_ | _0_ | ᡕ Todo Ya |
|`U+1856` | Letter | DUAL | _null_ | _0_ | ᡖ Todo Wa |
|`U+1857` | Letter | DUAL | _null_ | _0_ | ᡗ Todo Ka |
|`U+1858` | Letter | DUAL | _null_ | _0_ | ᡘ Todo Gaa |
|`U+1859` | Letter | DUAL | _null_ | _0_ | ᡙ Todo Haa |
|`U+185A` | Letter | DUAL | _null_ | _0_ | ᡚ Todo Jia |
|`U+185B` | Letter | DUAL | _null_ | _0_ | ᡛ Todo Nia |
|`U+185C` | Letter | DUAL | _null_ | _0_ | ᡜ Todo Dza |
|`U+185D` | Letter | DUAL | _null_ | _0_ | ᡝ Sibe E |
|`U+185E` | Letter | DUAL | _null_ | _0_ | ᡞ Sibe I |
|`U+185F` | Letter | DUAL | _null_ | _0_ | ᡟ Sibe Iy |
| | | | | |
|`U+1860` | Letter | DUAL | _null_ | _0_ | ᡠ Sibe Ue |
|`U+1861` | Letter | DUAL | _null_ | _0_ | ᡡ Sibe U |
|`U+1862` | Letter | DUAL | _null_ | _0_ | ᡢ Sibe Ang |
|`U+1863` | Letter | DUAL | _null_ | _0_ | ᡣ Sibe Ka |
|`U+1864` | Letter | DUAL | _null_ | _0_ | ᡤ Sibe Ga |
|`U+1865` | Letter | DUAL | _null_ | _0_ | ᡥ Sibe Ha |
|`U+1866` | Letter | DUAL | _null_ | _0_ | ᡦ Sibe Pa |
|`U+1867` | Letter | DUAL | _null_ | _0_ | ᡧ Sibe Sha |
|`U+1868` | Letter | DUAL | _null_ | _0_ | ᡨ Sibe Ta |
|`U+1869` | Letter | DUAL | _null_ | _0_ | ᡩ Sibe Da |
|`U+186A` | Letter | DUAL | _null_ | _0_ | ᡪ Sibe Ja |
|`U+186B` | Letter | DUAL | _null_ | _0_ | ᡫ Sibe Fa |
|`U+186C` | Letter | DUAL | _null_ | _0_ | ᡬ Sibe Gaa |
|`U+186D` | Letter | DUAL | _null_ | _0_ | ᡭ Sibe Haa |
|`U+186E` | Letter | DUAL | _null_ | _0_ | ᡮ Sibe Tsa |
|`U+186F` | Letter | DUAL | _null_ | _0_ | ᡯ Sibe Za |
| | | | | |
|`U+1870` | Letter | DUAL | _null_ | _0_ | ᡰ Sibe Raa |
|`U+1871` | Letter | DUAL | _null_ | _0_ | ᡱ Sibe Cha |
|`U+1872` | Letter | DUAL | _null_ | _0_ | ᡲ Sibe Zha |
|`U+1873` | Letter | DUAL | _null_ | _0_ | ᡳ Manchu I |
|`U+1874` | Letter | DUAL | _null_ | _0_ | ᡴ Manchu Ka |
|`U+1875` | Letter | DUAL | _null_ | _0_ | ᡵ Manchu Ra |
|`U+1876` | Letter | DUAL | _null_ | _0_ | ᡶ Manchu Fa |
|`U+1877` | Letter | DUAL | _null_ | _0_ | ᡷ Manchu Zha |
|`U+1878` | Letter | DUAL | _null_ | _0_ | ᡸ Cha With Two Dots |
|`U+1879` | _unassigned_ | | | | |
|`U+187A` | _unassigned_ | | | | |
|`U+187B` | _unassigned_ | | | | |
|`U+187C` | _unassigned_ | | | | |
|`U+187D` | _unassigned_ | | | | |
|`U+187E` | _unassigned_ | | | | |
|`U+187F` | _unassigned_ | | | | |
| | | | | |
|`U+1880` | Letter | NON_JOINING | _null_ | _0_ | ᢀ Ali Gali Anusvara One |
|`U+1881` | Letter | NON_JOINING | _null_ | _0_ | ᢁ Ali Gali Visarga One |
|`U+1882` | Letter | NON_JOINING | _null_ | _0_ | ᢂ Ali Gali Damaru |
|`U+1883` | Letter | NON_JOINING | _null_ | _0_ | ᢃ Ali Gali Ubadama |
|`U+1884` | Letter | NON_JOINING | _null_ | _0_ | ᢄ Ali Gali Inverted Ubadama |
|`U+1885` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᢅ Ali Gali Baluda |
|`U+1886` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᢆ Ali Gali Three Baluda |
|`U+1887` | Letter | DUAL | _null_ | _0_ | ᢇ Ali Gali A |
|`U+1888` | Letter | DUAL | _null_ | _0_ | ᢈ Ali Gali I |
|`U+1889` | Letter | DUAL | _null_ | _0_ | ᢉ Ali Gali Ka |
|`U+188A` | Letter | DUAL | _null_ | _0_ | ᢊ Ali Gali Nga |
|`U+188B` | Letter | DUAL | _null_ | _0_ | ᢋ Ali Gali Ca |
|`U+188C` | Letter | DUAL | _null_ | _0_ | ᢌ Ali Gali Tta |
|`U+188D` | Letter | DUAL | _null_ | _0_ | ᢍ Ali Gali Ttha |
|`U+188E` | Letter | DUAL | _null_ | _0_ | ᢎ Ali Gali Dda |
|`U+188F` | Letter | DUAL | _null_ | _0_ | ᢏ Ali Gali Nna |
| | | | | |
|`U+1890` | Letter | DUAL | _null_ | _0_ | ᢐ Ali Gali Ta |
|`U+1891` | Letter | DUAL | _null_ | _0_ | ᢑ Ali Gali Da |
|`U+1892` | Letter | DUAL | _null_ | _0_ | ᢒ Ali Gali Pa |
|`U+1893` | Letter | DUAL | _null_ | _0_ | ᢓ Ali Gali Pha |
|`U+1894` | Letter | DUAL | _null_ | _0_ | ᢔ Ali Gali Ssa |
|`U+1895` | Letter | DUAL | _null_ | _0_ | ᢕ Ali Gali Zha |
|`U+1896` | Letter | DUAL | _null_ | _0_ | ᢖ Ali Gali Za |
|`U+1897` | Letter | DUAL | _null_ | _0_ | ᢗ Ali Gali Ah |
|`U+1898` | Letter | DUAL | _null_ | _0_ | ᢘ Todo Ali Gali Ta |
|`U+1899` | Letter | DUAL | _null_ | _0_ | ᢙ Todo Ali Gali Zha |
|`U+189A` | Letter | DUAL | _null_ | _0_ | ᢚ Manchu Ali Gali Gha |
|`U+189B` | Letter | DUAL | _null_ | _0_ | ᢛ Manchu Ali Gali Nga |
|`U+189C` | Letter | DUAL | _null_ | _0_ | ᢜ Manchu Ali Gali Ca |
|`U+189D` | Letter | DUAL | _null_ | _0_ | ᢝ Manchu Ali Gali Jha |
|`U+189E` | Letter | DUAL | _null_ | _0_ | ᢞ Manchu Ali Gali Tta |
|`U+189F` | Letter | DUAL | _null_ | _0_ | ᢟ Manchu Ali Gali Ddha |
| | | | | |
|`U+18A0` | Letter | DUAL | _null_ | _0_ | ᢠ Manchu Ali Gali Ta |
|`U+18A1` | Letter | DUAL | _null_ | _0_ | ᢡ Manchu Ali Gali Dha |
|`U+18A2` | Letter | DUAL | _null_ | _0_ | ᢢ Manchu Ali Gali Ssa |
|`U+18A3` | Letter | DUAL | _null_ | _0_ | ᢣ Manchu Ali Gali Cya |
|`U+18A4` | Letter | DUAL | _null_ | _0_ | ᢤ Manchu Ali Gali Zha |
|`U+18A5` | Letter | DUAL | _null_ | _0_ | ᢥ Manchu Ali Gali Za |
|`U+18A6` | Letter | DUAL | _null_ | _0_ | ᢦ Ali Gali Half U |
|`U+18A7` | Letter | DUAL | _null_ | _0_ | ᢧ Ali Gali Half Ya |
|`U+18A8` | Letter | DUAL | _null_ | _0_ | ᢨ Manchu Ali Gali Bha |
|`U+18A9` | Mark [Mn] | TRANSPARENT | _null_ | 228 | ᢩ Ali Gali Dagalga |
|`U+18AA` | Letter | DUAL | _null_ | _0_ | ᢪ Manchu Ali Gali Lha |
|`U+18AB` | _unassigned_ | | | | |
|`U+18AC` | _unassigned_ | | | | |
|`U+18AD` | _unassigned_ | | | | |
|`U+18AE` | _unassigned_ | | | | |
|`U+18AF` | _unassigned_ | | | | |
:::
## Mongolian Supplement character table ##
The Mongolian Supplement block includes variants of the _birga_ mark
used to denote the beginning of a text.
:::{table} Mongolian Supplement character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|
|`U+11660` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙠 Birga with Ornament |
|`U+11661` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙡 Rotated Birga |
|`U+11662` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙢 Double Birga with Ornament |
|`U+11663` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙣 Triple Birga with Ornament |
|`U+11664` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙤 Birga with Double Ornament |
|`U+11665` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙥 Rotated Birga with Ornament |
|`U+11666` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙦 Rotated Birga with Double Ornament |
|`U+11667` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙧 Inverted Birga |
|`U+11668` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙨 Inverted Birga with Double Ornament |
|`U+11669` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙩 Swirl Birga |
|`U+1166A` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙪 Swirl Birga with Ornament |
|`U+1166B` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙫 Swirl Birga with Double Ornament |
|`U+1166C` | Punctuation | NON_JOINING | _null_ | _0_ | 𑙬 Turned Swirl Birga with Double Ornament|
|`U+1166D` | _unassigned_ | | | | |
|`U+1166E` | _unassigned_ | | | | |
|`U+1166F` | _unassigned_ | | | | |
| | | | | |
|`U+11670` | _unassigned_ | | | | |
|`U+11671` | _unassigned_ | | | | |
|`U+11672` | _unassigned_ | | | | |
|`U+11673` | _unassigned_ | | | | |
|`U+11674` | _unassigned_ | | | | |
|`U+11675` | _unassigned_ | | | | |
|`U+11676` | _unassigned_ | | | | |
|`U+11677` | _unassigned_ | | | | |
|`U+11678` | _unassigned_ | | | | |
|`U+11679` | _unassigned_ | | | | |
|`U+1167A` | _unassigned_ | | | | |
|`U+1167B` | _unassigned_ | | | | |
|`U+1167C` | _unassigned_ | | | | |
|`U+1167D` | _unassigned_ | | | | |
|`U+1167E` | _unassigned_ | | | | |
|`U+1167F` | _unassigned_ | | | | |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Mongolian text include the dotted-circle placeholder (`U+25CC`), the
combining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)
and zero-width non-joiner (`U+200C`), the left-to-right text marker
(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break
space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
combining mark in isolation. Real-world text syllables may also use
other characters, such as hyphens or dashes, in a similar placeholder
fashion; shaping engines should cope with this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|
|`U+00A0` | Separator | NON_JOINING | _null_ | _0_ | No-break space |
|`U+200C` | Other | NON_JOINING | _null_ | _0_ | Zero-width non-joiner |
|`U+200D` | Other | JOIN_CAUSING | _null_ | _0_ | Zero-width joiner |
|`U+2010` | Punctuation | NON_JOINING | _null_ | _0_ | ‐ Hyphen |
|`U+2011` | Punctuation | NON_JOINING | _null_ | _0_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | NON_JOINING | _null_ | _0_ | ‒ Figure dash |
|`U+2013` | Punctuation | NON_JOINING | _null_ | _0_ | – En dash |
|`U+2014` | Punctuation | NON_JOINING | _null_ | _0_ | — Em dash |
|`U+202F` | Separator | NON_JOINING | _null_ | _0_ | Narrow No-Break Space |
|`U+25CC` | Symbol | NON_JOINING | _null_ | _0_ | ◌ Dotted circle |
| | | | | | |
:::
The zero-width joiner (ZWJ) is primarily used to force the usage of the
cursive connecting form of a letter even when the context of the
adjoining letters would not trigger the connecting form.
For example, to show the initial form of a letter in isolation (such
as for displaying it in a table of forms), the sequence "_Letter_,ZWJ"
would be used. To show the medial form of a letter in isolation, the
sequence "ZWJ,_Letter_,ZWJ" would be used.
The no-break space is primarily used to display those codepoints that
are defined as non-spacing (such as vowel or diacritical marks and "Hamza") in an
isolated context, as an alternative to displaying them superimposed on
the dotted-circle placeholder.
The narrow no-break space is used in Mongolian to insert a small gap
between a word and its suffix.
================================================
FILE: character-tables/character-tables-myanmar.md
================================================
# Myanmar character tables #
This document lists the per-character shaping information needed to
[shape Myanmar text](../opentype-shaping-myanmar.md).
**Contents**
- [Myanmar character table](#myanmar-character-table)
- [Myanmar Extended character tables](#myanmar-extended-character-tables)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Myanmar character table ##
Myanmar glyphs should be classified as in the following
table. Codepoints in the Myanmar block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Myanmar character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1000` | Letter | CONSONANT | _null_ | က Ka |
|`U+1001` | Letter | CONSONANT | _null_ | ခ Kha |
|`U+1002` | Letter | CONSONANT | _null_ | ဂ Ga |
|`U+1003` | Letter | CONSONANT | _null_ | ဃ Gha |
|`U+1004` | Letter | CONSONANT | _null_ | င Nga |
|`U+1005` | Letter | CONSONANT | _null_ | စ Ca |
|`U+1006` | Letter | CONSONANT | _null_ | ဆ Cha |
|`U+1007` | Letter | CONSONANT | _null_ | ဇ Ja |
|`U+1008` | Letter | CONSONANT | _null_ | ဈ Jha |
|`U+1009` | Letter | CONSONANT | _null_ | ဉ Nya |
|`U+100A` | Letter | CONSONANT | _null_ | ည Nnya |
|`U+100B` | Letter | CONSONANT | _null_ | ဋ Tta |
|`U+100C` | Letter | CONSONANT | _null_ | ဌ Ttha |
|`U+100D` | Letter | CONSONANT | _null_ | ဍ Dda |
|`U+100E` | Letter | CONSONANT | _null_ | ဎ DDha |
|`U+100F` | Letter | CONSONANT | _null_ | ဏ Nna |
| | | | |
|`U+1010` | Letter | CONSONANT | _null_ | တ Ta |
|`U+1011` | Letter | CONSONANT | _null_ | ထ Tha |
|`U+1012` | Letter | CONSONANT | _null_ | ဒ Da |
|`U+1013` | Letter | CONSONANT | _null_ | ဓ Dha |
|`U+1014` | Letter | CONSONANT | _null_ | န Na |
|`U+1015` | Letter | CONSONANT | _null_ | ပ Pa |
|`U+1016` | Letter | CONSONANT | _null_ | ဖ Pha |
|`U+1017` | Letter | CONSONANT | _null_ | ဗ Ba |
|`U+1018` | Letter | CONSONANT | _null_ | ဘ Bha |
|`U+1019` | Letter | CONSONANT | _null_ | မ Ma |
|`U+101A` | Letter | CONSONANT | _null_ | ယ Ya |
|`U+101B` | Letter | CONSONANT | _null_ | ရ Ra |
|`U+101C` | Letter | CONSONANT | _null_ | လ La |
|`U+101D` | Letter | CONSONANT | _null_ | ဝ Wa |
|`U+101E` | Letter | CONSONANT | _null_ | သ Sa |
|`U+101F` | Letter | CONSONANT | _null_ | ဟ Ha |
| | | | |
|`U+1020` | Letter | CONSONANT | _null_ | ဠ Lla |
|`U+1021` | Letter | VOWEL_INDEPENDENT | _null_ | အ A |
|`U+1022` | Letter | VOWEL_INDEPENDENT | _null_ | ဢ Shan A |
|`U+1023` | Letter | VOWEL_INDEPENDENT | _null_ | ဣ I |
|`U+1024` | Letter | VOWEL_INDEPENDENT | _null_ | ဤ Ii |
|`U+1025` | Letter | VOWEL_INDEPENDENT | _null_ | ဥ U |
|`U+1026` | Letter | VOWEL_INDEPENDENT | _null_ | ဦ Uu |
|`U+1027` | Letter | VOWEL_INDEPENDENT | _null_ | ဧ E |
|`U+1028` | Letter | VOWEL_INDEPENDENT | _null_ | ဨ Mon E |
|`U+1029` | Letter | VOWEL_INDEPENDENT | _null_ | ဩ O |
|`U+102A` | Letter | VOWEL_INDEPENDENT | _null_ | ဪ Au |
|`U+102B` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ါ Sign Tall Aa |
|`U+102C` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ာ Sign Aa |
|`U+102D` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ိ Sign I |
|`U+102E` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ီ Sign Ii |
|`U+102F` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ု Sign U |
| | | | |
|`U+1030` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ူ Sign Uu |
|`U+1031` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ေ Sign E |
|`U+1032` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ဲ Sign Ai |
|`U+1033` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ဳ Sign Mon Ii |
|`U+1034` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ဴ Sign Mon O |
|`U+1035` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ဵ Sign E Above |
|`U+1036` | Mark [Mn] | BINDU | TOP_POSITION | ံ Anusvara |
|`U+1037` | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ့ Dot Below |
|`U+1038` | Mark [Mc] | VISARGA | RIGHT_POSITION | း Visarga |
|`U+1039` | Mark [Mn] | INVISIBLE_STACKER | _null_ | ္ Virama |
|`U+103A` | Mark [Mn] | PURE_KILLER | TOP_POSITION | ် Asat |
|`U+103B` | Mark [Mc] | CONSONANT_MEDIAL | RIGHT_POSITION | ျ Sign Medial Ya |
|`U+103C` | Mark [Mc] | CONSONANT_MEDIAL | TOP_LEFT_AND_BOTTOM_POSITION | ြ Sign Medial Ra |
|`U+103D` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ွ Sign Medial Wa |
|`U+103E` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ှ Sign Medial Ha |
|`U+103F` | Letter | CONSONANT | _null_ | ဿ Great Sa |
| | | | |
|`U+1040` | Number | NUMBER | _null_ | ၀ Digit Zero |
|`U+1041` | Number | NUMBER | _null_ | ၁ Digit One |
|`U+1042` | Number | NUMBER | _null_ | ၂ Digit Two |
|`U+1043` | Number | NUMBER | _null_ | ၃ Digit Three |
|`U+1044` | Number | NUMBER | _null_ | ၄ Digit Four |
|`U+1045` | Number | NUMBER | _null_ | ၅ Digit Five |
|`U+1046` | Number | NUMBER | _null_ | ၆ Digit Six |
|`U+1047` | Number | NUMBER | _null_ | ၇ Digit Seven |
|`U+1048` | Number | NUMBER | _null_ | ၈ Digit Eight |
|`U+1049` | Number | NUMBER | _null_ | ၉ Digit Nine |
|`U+104A` | Punctuation | _null_ | _null_ | ၊ Little Section |
|`U+104B` | Punctuation | _null_ | _null_ | ။ Section |
|`U+104C` | Punctuation | _null_ | _null_ | ၌ Locative |
|`U+104D` | Punctuation | _null_ | _null_ | ၍ Completed |
|`U+104E` | Punctuation | CONSONANT_PLACEHOLDER| _null_ | ၎ Aforementioned |
|`U+104F` | Punctuation | _null_ | _null_ | ၏ Genitive |
| | | | |
|`U+1050` | Letter | CONSONANT | _null_ | ၐ Sha |
|`U+1051` | Letter | CONSONANT | _null_ | ၑ Ssa |
|`U+1052` | Letter | VOWEL_INDEPENDENT | _null_ | ၒ Vocalic R |
|`U+1053` | Letter | VOWEL_INDEPENDENT | _null_ | ၓ Vocalic Rr |
|`U+1054` | Letter | VOWEL_INDEPENDENT | _null_ | ၔ Vocalic L |
|`U+1055` | Letter | VOWEL_INDEPENDENT | _null_ | ၕ Vocalic Ll |
|`U+1056` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၖ Sign Vocalic R |
|`U+1057` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၗ Sign Vocalic Rr |
|`U+1058` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ၘ Sign Vocalic L |
|`U+1059` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ၙ Sign Vocalic Ll |
|`U+105A` | Letter | CONSONANT | _null_ | ၚ Mon Nga |
|`U+105B` | Letter | CONSONANT | _null_ | ၛ Mon Jha |
|`U+105C` | Letter | CONSONANT | _null_ | ၜ Mon Bba |
|`U+105D` | Letter | CONSONANT | _null_ | ၝ Mon Bbe |
|`U+105E` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ၞ Sign Mon Medial Na |
|`U+105F` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ၟ Sign Mon Medial Ma |
| | | | |
|`U+1060` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ၠ Sign Mon Medial La |
|`U+1061` | Letter | CONSONANT | _null_ | ၡ Sgaw Karen Sha |
|`U+1062` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၢ Sign Sgaw Karen Eu |
|`U+1063` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၣ Tone Sgaw Karen Hathi|
|`U+1064` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၤ Tone Sgaw Karen Ke Pho|
|`U+1065` | Letter | CONSONANT | _null_ | ၥ Western Pwo Karen Tha|
|`U+1066` | Letter | CONSONANT | _null_ | ၦ Western Pwo Karen Pwa|
|`U+1067` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၧ Sign Western Pwo Karen Eu|
|`U+1068` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ၨ Sign Western Pwo Karen Ue|
|`U+1069` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၩ Sign Western Pwo Karen Tone 1|
|`U+106A` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၪ Sign Western Pwo Karen Tone 2|
|`U+106B` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၫ Sign Western Pwo Karen Tone 3|
|`U+106C` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၬ Sign Western Pwo Karen Tone 4|
|`U+106D` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ၭ Sign Western Pwo Karen Tone 5|
|`U+106E` | Letter | CONSONANT | _null_ | ၮ Eastern Pwo Karen Nna|
|`U+106F` | Letter | CONSONANT | _null_ | ၯ Eastern Pwo Karen Ywa|
| | | | |
|`U+1070` | Letter | CONSONANT | _null_ | ၰ Eastern Pwo Karen Ghwa|
|`U+1071` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ၱ Sign Geba Karen I |
|`U+1072` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ၲ Sign Kayah Oe |
|`U+1073` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ၳ Sign Kayah U |
|`U+1074` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ၴ Sign Kayah Ee |
|`U+1075` | Letter | CONSONANT | _null_ | ၵ Shan Ka |
|`U+1076` | Letter | CONSONANT | _null_ | ၶ Shan Kha |
|`U+1077` | Letter | CONSONANT | _null_ | ၷ Shan Ga |
|`U+1078` | Letter | CONSONANT | _null_ | ၸ Shan Ca |
|`U+1079` | Letter | CONSONANT | _null_ | ၹ Shan Za |
|`U+107A` | Letter | CONSONANT | _null_ | ၺ Shan Nya |
|`U+107B` | Letter | CONSONANT | _null_ | ၻ Shan Da |
|`U+107C` | Letter | CONSONANT | _null_ | ၼ Shan Na |
|`U+107D` | Letter | CONSONANT | _null_ | ၽ Shan Pha |
|`U+107E` | Letter | CONSONANT | _null_ | ၾ Shan Fa |
|`U+107F` | Letter | CONSONANT | _null_ | ၿ Shan Ba |
| | | | |
|`U+1080` | Letter | CONSONANT | _null_ | ႀ Shan Tha |
|`U+1081` | Letter | CONSONANT | _null_ | ႁ Shan Ha |
|`U+1082` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ႂ Sign Shan Medial Wa |
|`U+1083` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ႃ Sign Shan Aa |
|`U+1084` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ႄ Sign Shan E |
|`U+1085` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ႅ Sign Shan E Above |
|`U+1086` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ႆ Sign Shan Final Y |
|`U+1087` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႇ Sign Shan Tone 2 |
|`U+1088` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႈ Sign Shan Tone 3 |
|`U+1089` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႉ Sign Shan Tone 5 |
|`U+108A` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႊ Sign Shan Tone 6 |
|`U+108B` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႋ Sign Shan Council Tone 2|
|`U+108C` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႌ Sign Shan Council Tone 3|
|`U+108D` | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ႍ Sign Shan Council Emphatic Tone|
|`U+108E` | Letter | CONSONANT | _null_ | ႎ Rumai Palaung Fa |
|`U+108F` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႏ Sign Rumai Palaung Tone 5|
| | | | |
|`U+1090` | Number | NUMBER | _null_ | ႐ Shan Digit Zero |
|`U+1091` | Number | NUMBER | _null_ | ႑ Shan Digit One |
|`U+1092` | Number | NUMBER | _null_ | ႒ Shan Digit Two |
|`U+1093` | Number | NUMBER | _null_ | ႓ Shan Digit Three |
|`U+1094` | Number | NUMBER | _null_ | ႔ Shan Digit Four |
|`U+1095` | Number | NUMBER | _null_ | ႕ Shan Digit Five |
|`U+1096` | Number | NUMBER | _null_ | ႖ Shan Digit Six |
|`U+1097` | Number | NUMBER | _null_ | ႗ Shan Digit Seven |
|`U+1098` | Number | NUMBER | _null_ | ႘ Shan Digit Eight |
|`U+1099` | Number | NUMBER | _null_ | ႙ Shan Digit Nine |
|`U+109A` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႚ Sign Khamti Tone 1 |
|`U+109B` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ႛ Sign Khamti Tone 3 |
|`U+109C` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ႜ Sign Aiton A |
|`U+109D` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ႝ Sign Aiton Ai |
|`U+109E` | Symbol | SYMBOL | _null_ | ႞ Shan One |
|`U+109F` | Symbol | SYMBOL | _null_ | ႟ Shan Exclamation |
:::
## Myanmar Extended character tables ##
### Myanmar Extended A character table ###
:::{table} Myanmar Extended-A character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+AA60` | Letter | CONSONANT | _null_ | ꩠ Khamti Ga |
|`U+AA61` | Letter | CONSONANT | _null_ | ꩡ Khamti Ca |
|`U+AA62` | Letter | CONSONANT | _null_ | ꩢ Khamti Cha |
|`U+AA63` | Letter | CONSONANT | _null_ | ꩣ Khamti Ja |
|`U+AA64` | Letter | CONSONANT | _null_ | ꩤ Khamti Jha |
|`U+AA65` | Letter | CONSONANT | _null_ | ꩥ Khamti Nya |
|`U+AA66` | Letter | CONSONANT | _null_ | ꩦ Khamti Tta |
|`U+AA67` | Letter | CONSONANT | _null_ | ꩧ Khamti Ttha |
|`U+AA68` | Letter | CONSONANT | _null_ | ꩨ Khamti Dda |
|`U+AA69` | Letter | CONSONANT | _null_ | ꩩ Khamti Ddha |
|`U+AA6A` | Letter | CONSONANT | _null_ | ꩪ Khamti Dha |
|`U+AA6B` | Letter | CONSONANT | _null_ | ꩫ Khamti Na |
|`U+AA6C` | Letter | CONSONANT | _null_ | ꩬ Khamti Sa |
|`U+AA6D` | Letter | CONSONANT | _null_ | ꩭ Khamti Ha |
|`U+AA6E` | Letter | CONSONANT | _null_ | ꩮ Khamti Hha |
|`U+AA6F` | Letter | CONSONANT | _null_ | ꩯ Khamti Fa |
| | | | |
|`U+AA70` | Letter | _null_ | _null_ | ꩰ Khamti Reduplication|
|`U+AA71` | Letter | CONSONANT | _null_ | ꩱ Khamti Xa |
|`U+AA72` | Letter | CONSONANT | _null_ | ꩲ Khamti Za |
|`U+AA73` | Letter | CONSONANT | _null_ | ꩳ Khamti Ra |
|`U+AA74` | Letter | CONSONANT_PLACEHOLDER| _null_ | ꩴ Khamti Oay |
|`U+AA75` | Letter | CONSONANT_PLACEHOLDER| _null_ | ꩵ Khamti Qn |
|`U+AA76` | Letter | CONSONANT_PLACEHOLDER| _null_ | ꩶ Khamti Hm |
|`U+AA77` | Symbol | SYMBOL | _null_ | ꩷ Khamti Aiton Exclamation|
|`U+AA78` | Symbol | SYMBOL | _null_ | ꩸ Khamti Aiton One |
|`U+AA79` | Symbol | SYMBOL | _null_ | ꩹ Khamti Aiton Two |
|`U+AA7A` | Letter | CONSONANT | _null_ | ꩺ Khamti Aiton Ra |
|`U+AA7B` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ꩻ Sign Pao Karen Tone |
|`U+AA7C` | Mark [Mn] | TONE_MARKER | TOP_POSITION | ꩼ Sign Tai Laing Tone 2|
|`U+AA7D` | Mark [Mc] | TONE_MARKER | RIGHT_POSITION | ꩽ Sign Tai Laing Tone 5|
|`U+AA7E` | Letter | CONSONANT | _null_ | ꩾ Shwe Palaung Cha |
|`U+AA7F` | Letter | CONSONANT | _null_ | ꩿ Shwe Palaung Sha |
:::
### Myanmar Extended B character table ###
:::{table} Myanmar Extended-B character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+A9E0` | Letter | CONSONANT | _null_ | ꧠ Shan Gha |
|`U+A9E1` | Letter | CONSONANT | _null_ | ꧡ Shan Cha |
|`U+A9E2` | Letter | CONSONANT | _null_ | ꧢ Shan Jha |
|`U+A9E3` | Letter | CONSONANT | _null_ | ꧣ Shan Nna |
|`U+A9E4` | Letter | CONSONANT | _null_ | ꧤ Shan Bha |
|`U+A9E5` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ꧥ Sign Shan Saw |
|`U+A9E6` | Letter | _null_ | _null_ | ꧦ Shan Reduplication |
|`U+A9E7` | Letter | CONSONANT | _null_ | ꧧ Tai Laing Nya |
|`U+A9E8` | Letter | CONSONANT | _null_ | ꧨ Tai Laing Fa |
|`U+A9E9` | Letter | CONSONANT | _null_ | ꧩ Tai Laing Ga |
|`U+A9EA` | Letter | CONSONANT | _null_ | ꧪ Tai Laing Gha |
|`U+A9EB` | Letter | CONSONANT | _null_ | ꧫ Tai Laing Ja |
|`U+A9EC` | Letter | CONSONANT | _null_ | ꧬ Tai Laing Jha |
|`U+A9ED` | Letter | CONSONANT | _null_ | ꧭ Tai Laing Dda |
|`U+A9EE` | Letter | CONSONANT | _null_ | ꧮ Tai Laing Ddha |
|`U+A9EF` | Letter | CONSONANT | _null_ | ꧯ Tai Laing Nna |
| | | | |
|`U+A9F0` | Number | NUMBER | _null_ | ꧰ Tai Laing Digit Zero|
|`U+A9F1` | Number | NUMBER | _null_ | ꧱ Tai Laing Digit One |
|`U+A9F2` | Number | NUMBER | _null_ | ꧲ Tai Laing Digit Two |
|`U+A9F3` | Number | NUMBER | _null_ | ꧳ Tai Laing Digit Three|
|`U+A9F4` | Number | NUMBER | _null_ | ꧴ Tai Laing Digit Four|
|`U+A9F5` | Number | NUMBER | _null_ | ꧵ Tai Laing Digit Five|
|`U+A9F6` | Number | NUMBER | _null_ | ꧶ Tai Laing Digit Six |
|`U+A9F7` | Number | NUMBER | _null_ | ꧷ Tai Laing Digit Seven|
|`U+A9F8` | Number | NUMBER | _null_ | ꧸ Tai Laing Digit Eight|
|`U+A9F9` | Number | NUMBER | _null_ | ꧹ Tai Laing Digit Nine|
|`U+A9FA` | Letter | CONSONANT | _null_ | ꧺ Tai Laing Lla |
|`U+A9FB` | Letter | CONSONANT | _null_ | ꧻ Tai Laing Da |
|`U+A9FC` | Letter | CONSONANT | _null_ | ꧼ Tai Laing Dha |
|`U+A9FD` | Letter | CONSONANT | _null_ | ꧽ Tai Laing Ba |
|`U+A9FE` | Letter | CONSONANT | _null_ | ꧾ Tai Laing Bha |
|`U+A9FF` | _unassigned_ | | | |
:::
### Myanmar Extended C character table ###
:::{table} Myanmar Extended-C character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+116D0` | Number | NUMBER | _null_ | Pao Digit Zero |
|`U+116D1` | Number | NUMBER | _null_ | Pao Digit One |
|`U+116D2` | Number | NUMBER | _null_ | Pao Digit Two |
|`U+116D3` | Number | NUMBER | _null_ | Pao Digit Three |
|`U+116D4` | Number | NUMBER | _null_ | Pao Digit Four |
|`U+116D5` | Number | NUMBER | _null_ | Pao Digit Five |
|`U+116D6` | Number | NUMBER | _null_ | Pao Digit Six |
|`U+116D7` | Number | NUMBER | _null_ | Pao Digit Seven |
|`U+116D8` | Number | NUMBER | _null_ | Pao Digit Eight |
|`U+116D9` | Number | NUMBER | _null_ | Pao Digit Nine |
|`U+116DA` | Number | NUMBER | _null_ | Pao Digit Zero |
|`U+116DB` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit One|
|`U+116DC` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit Two|
|`U+116DD` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit Three|
|`U+116DE` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit Four|
|`U+116DF` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit Five|
| | | | |
|`U+116E0` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit Six|
|`U+116E1` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit Seven|
|`U+116E2` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit Eight|
|`U+116E3` | Number | NUMBER | _null_ | Eastern Pwo Karen Digit Nine|
|`U+116E4` | _unassigned_ | | | |
|`U+116E5` | _unassigned_ | | | |
|`U+116E6` | _unassigned_ | | | |
|`U+116E7` | _unassigned_ | | | |
|`U+116E8` | _unassigned_ | | | |
|`U+116E9` | _unassigned_ | | | |
|`U+116EA` | _unassigned_ | | | |
|`U+116EB` | _unassigned_ | | | |
|`U+116EC` | _unassigned_ | | | |
|`U+116ED` | _unassigned_ | | | |
|`U+116EE` | _unassigned_ | | | |
|`U+116EF` | _unassigned_ | | | |
| | | | |
|`U+116F0` | _unassigned_ | | | |
|`U+116F1` | _unassigned_ | | | |
|`U+116F2` | _unassigned_ | | | |
|`U+116F3` | _unassigned_ | | | |
|`U+116F4` | _unassigned_ | | | |
|`U+116F5` | _unassigned_ | | | |
|`U+116F6` | _unassigned_ | | | |
|`U+116F7` | _unassigned_ | | | |
|`U+116F8` | _unassigned_ | | | |
|`U+116F9` | _unassigned_ | | | |
|`U+116FA` | _unassigned_ | | | |
|`U+116FB` | _unassigned_ | | | |
|`U+116FC` | _unassigned_ | | | |
|`U+116FD` | _unassigned_ | | | |
|`U+116FE` | _unassigned_ | | | |
|`U+116FF` | _unassigned_ | | | |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Myanmar script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Myanmar text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct
from a "_Consonant_,Halant,_Consonant_" sequence. The sequence
"_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of a
conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence
"_Consonant_,Halant,ZWNJ,_Consonant_" should produce the first
consonant in its standard form, followed by an explicit "Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display
those codepoints that are defined as non-spacing (marks, dependent
vowels (matras), below-base consonant forms, and post-base consonant
forms) in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-nko.md
================================================
# N'Ko character tables #
This document lists the per-character shaping information needed to
[shape N'Ko text](../opentype-shaping-nko.md).
**Contents**
- [NKo character table](#nko-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## NKo character table ##
N'Ko glyphs should be classified as in the following
table. Codepoints in the NKo block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
The _Joining type_ column indicates whether each codepoint is defined
as joining with adjacent characters on the left side, right side, left
and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints
designated TRANSPARENT in the _Joining type_ column do not join with
adjacent characters and, in addition, do not affect the joining
behavior of surrounding characters. Non-spacing marks are of type
TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent
characters to join.
The _Joining group_ column lists the fundamental letter that the
listed codepoint behaves like for joining purposes.
Assigned codepoints with a _null_ in the _Joining group_
column evoke no special behavior from the shaping engine during the
join-computation stage.
> Note: No codepoints in the NKo block are assigned a non-null _Joining group_.
The _Mark class_ column indicates the Canonical Combining Class
for the codepoint. Marks are assigned non-zero combining classes so
that sequences of adjacent marks can be reordered as required by the
orthography.
:::{table} N'Ko character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|
|`U+07C0` | Number | NON_JOINING | _null_ | _0_ | ߀ Digit Zero |
|`U+07C1` | Number | NON_JOINING | _null_ | _0_ | ߁ Digit One |
|`U+07C2` | Number | NON_JOINING | _null_ | _0_ | ߂ Digit Two |
|`U+07C3` | Number | NON_JOINING | _null_ | _0_ | ߃ Digit Three |
|`U+07C4` | Number | NON_JOINING | _null_ | _0_ | ߄ Digit Four |
|`U+07C5` | Number | NON_JOINING | _null_ | _0_ | ߅ Digit Five |
|`U+07C6` | Number | NON_JOINING | _null_ | _0_ | ߆ Digit Six |
|`U+07C7` | Number | NON_JOINING | _null_ | _0_ | ߇ Digit Seven |
|`U+07C8` | Number | NON_JOINING | _null_ | _0_ | ߈ Digit Eight |
|`U+07C9` | Number | NON_JOINING | _null_ | _0_ | ߉ Digit Nine |
|`U+07CA` | Letter | DUAL | _null_ | _0_ | ߊ A |
|`U+07CB` | Letter | DUAL | _null_ | _0_ | ߋ Ee |
|`U+07CC` | Letter | DUAL | _null_ | _0_ | ߌ I |
|`U+07CD` | Letter | DUAL | _null_ | _0_ | ߍ E |
|`U+07CE` | Letter | DUAL | _null_ | _0_ | ߎ U |
|`U+07CF` | Letter | DUAL | _null_ | _0_ | ߏ Oo |
| | | | | |
|`U+07D0` | Letter | DUAL | _null_ | _0_ | ߐ O |
|`U+07D1` | Letter | DUAL | _null_ | _0_ | ߑ Dagbasinna |
|`U+07D2` | Letter | DUAL | _null_ | _0_ | ߒ N |
|`U+07D3` | Letter | DUAL | _null_ | _0_ | ߓ Ba |
|`U+07D4` | Letter | DUAL | _null_ | _0_ | ߔ Pa |
|`U+07D5` | Letter | DUAL | _null_ | _0_ | ߕ Ta |
|`U+07D6` | Letter | DUAL | _null_ | _0_ | ߖ Ja |
|`U+07D7` | Letter | DUAL | _null_ | _0_ | ߗ Cha |
|`U+07D8` | Letter | DUAL | _null_ | _0_ | ߘ Da |
|`U+07D9` | Letter | DUAL | _null_ | _0_ | ߙ Ra |
|`U+07DA` | Letter | DUAL | _null_ | _0_ | ߚ Rra |
|`U+07DB` | Letter | DUAL | _null_ | _0_ | ߛ Sa |
|`U+07DC` | Letter | DUAL | _null_ | _0_ | ߜ Gba |
|`U+07DD` | Letter | DUAL | _null_ | _0_ | ߝ Fa |
|`U+07DE` | Letter | DUAL | _null_ | _0_ | ߞ Ka |
|`U+07DF` | Letter | DUAL | _null_ | _0_ | ߟ La |
| | | | | |
|`U+07E0` | Letter | DUAL | _null_ | _0_ | ߠ Na Woloso |
|`U+07E1` | Letter | DUAL | _null_ | _0_ | ߡ Ma |
|`U+07E2` | Letter | DUAL | _null_ | _0_ | ߢ Nya |
|`U+07E3` | Letter | DUAL | _null_ | _0_ | ߣ Na |
|`U+07E4` | Letter | DUAL | _null_ | _0_ | ߤ Ha |
|`U+07E5` | Letter | DUAL | _null_ | _0_ | ߥ Wa |
|`U+07E6` | Letter | DUAL | _null_ | _0_ | ߦ Ya |
|`U+07E7` | Letter | DUAL | _null_ | _0_ | ߧ Nya Woloso |
|`U+07E8` | Letter | DUAL | _null_ | _0_ | ߨ Jona Ja |
|`U+07E9` | Letter | DUAL | _null_ | _0_ | ߩ Jona Cha |
|`U+07EA` | Letter | DUAL | _null_ | _0_ | ߪ Jona Ra |
|`U+07EB` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ߫ Combining Short High Tone |
|`U+07EC` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ߬ Combining Short Low Tone |
|`U+07ED` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ߭ Combining Short Rising Tone |
|`U+07EE` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ߮ Combining Long Descending Tone |
|`U+07EF` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ߯ Combining Long High Tone |
| | | | | |
|`U+07F0` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ߰ Combining Long Low Tone |
|`U+07F1` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ߱ Combining Long Rising Tone |
|`U+07F2` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ߲ Combining Nasalization Mark |
|`U+07F3` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ߳ Combining Double Dot Above |
|`U+07F4` | Letter modifier | NON_JOINING | _null_ | _0_ | ߴ High Tone Apostrophe |
|`U+07F5` | Letter modifier | NON_JOINING | _null_ | _0_ | ߵ Low Tone Apostrophe |
|`U+07F6` | Symbol | NON_JOINING | _null_ | _0_ | ߶ Symbol Oo Dennen |
|`U+07F7` | Symbol | NON_JOINING | _null_ | _0_ | ߷ Symbol Gbakurunen |
|`U+07F8` | Punctuation | NON_JOINING | _null_ | _0_ | ߸ Comma |
|`U+07F9` | Punctuation | NON_JOINING | _null_ | _0_ | ߹ Exclamation Mark |
|`U+07FA` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ߺ Lajanyalan |
|`U+07FB` | _unassigned_ | | | | |
|`U+07FC` | _unassigned_ | | | | |
|`U+07FD` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ߽ Dantalayan |
|`U+07FE` | Symbol | NON_JOINING | _null_ | _0_ | ߾ Dorome Sign |
|`U+07FF` | Symbol | NON_JOINING | _null_ | _0_ | ߿ Taman Sign |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of N'Ko text include the dotted-circle placeholder (`U+25CC`), the
combining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)
and zero-width non-joiner (`U+200C`), the left-to-right text marker
(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break
space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
combining mark in isolation. Real-world text syllables may also use
other characters, such as hyphens or dashes, in a similar placeholder
fashion; shaping engines should cope with this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|
|`U+00A0` | Separator | NON_JOINING | _null_ | _0_ | No-break space |
|`U+034F` | Other | NON_JOINING | _null_ | _0_ | ͏ Combining grapheme joiner |
|`U+200C` | Other | NON_JOINING | _null_ | _0_ | Zero-width non-joiner |
|`U+200D` | Other | JOIN_CAUSING | _null_ | _0_ | Zero-width joiner |
|`U+200E` | Other | NON_JOINING | _null_ | _0_ | Left-to-Right marker |
|`U+200F` | Other | NON_JOINING | _null_ | _0_ | Right-to-Left marker |
|`U+2010` | Punctuation | NON_JOINING | _null_ | _0_ | ‐ Hyphen |
|`U+2011` | Punctuation | NON_JOINING | _null_ | _0_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | NON_JOINING | _null_ | _0_ | ‒ Figure dash |
|`U+2013` | Punctuation | NON_JOINING | _null_ | _0_ | – En dash |
|`U+2014` | Punctuation | NON_JOINING | _null_ | _0_ | — Em dash |
|`U+25CC` | Symbol | NON_JOINING | _null_ | _0_ | ◌ Dotted circle |
:::
The combining grapheme joiner (CGJ) is primarily used to alter the
order in which adjacent marks are positioned during the
mark-reordering stage, in order to adhere to the needs of a
non-default language orthography.
The zero-width joiner (ZWJ) is primarily used to force the usage of the
cursive connecting form of a letter even when the context of the
adjoining letters would not trigger the connecting form.
For example, to show the initial form of a letter in isolation (such
as for displaying it in a table of forms), the sequence "_Letter_,ZWJ"
would be used. To show the medial form of a letter in isolation, the
sequence "ZWJ,_Letter_,ZWJ" would be used.
The right-to-left mark (RLM) and left-to-right mark (LRM) are used by
the Unicode bidirectionality algorithm (BiDi) to indicate the points
in a text run at which the writing direction changes.
The no-break space is primarily used to display those codepoints that
are defined as non-spacing (such as vowel or diacritical marks and "Hamza") in an
isolated context, as an alternative to displaying them superimposed on
the dotted-circle placeholder.
================================================
FILE: character-tables/character-tables-oriya.md
================================================
# Oriya character tables #
This document lists the per-character shaping information needed to
[shape Oriya text](../opentype-shaping-oriya.md).
**Contents**
- [Oriya character table](#oriya-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Oriya character table ##
Oriya glyphs should be classified as in the following
table. Codepoints in the Oriya block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Oriya character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0B00` | _unassigned_ | | | |
|`U+0B01` | Mark [Mn] | BINDU | TOP_POSITION | ଁ Candrabindu |
|`U+0B02` | Mark [Mc] | BINDU | RIGHT_POSITION | ଂ Anusvara |
|`U+0B03` | Mark [Mc] | VISARGA | RIGHT_POSITION | ଃ Visarga |
|`U+0B04` | _unassigned_ | | | |
|`U+0B05` | Letter | VOWEL_INDEPENDENT | _null_ | ଅ A |
|`U+0B06` | Letter | VOWEL_INDEPENDENT | _null_ | ଆ Aa |
|`U+0B07` | Letter | VOWEL_INDEPENDENT | _null_ | ଇ I |
|`U+0B08` | Letter | VOWEL_INDEPENDENT | _null_ | ଈ Ii |
|`U+0B09` | Letter | VOWEL_INDEPENDENT | _null_ | ଉ U |
|`U+0B0A` | Letter | VOWEL_INDEPENDENT | _null_ | ଊ Uu |
|`U+0B0B` | Letter | VOWEL_INDEPENDENT | _null_ | ଋ Vocalic R |
|`U+0B0C` | Letter | VOWEL_INDEPENDENT | _null_ | ଌ Vocalic L |
|`U+0B0D` | _unassigned_ | | | |
|`U+0B0E` | _unassigned_ | | | |
|`U+0B0F` | Letter | VOWEL_INDEPENDENT | _null_ | ଏ E |
| | | | |
|`U+0B10` | Letter | VOWEL_INDEPENDENT | _null_ | ଐ Ai |
|`U+0B11` | _unassigned_ | | | |
|`U+0B12` | _unassigned_ | | | |
|`U+0B13` | Letter | VOWEL_INDEPENDENT | _null_ | ଓ O |
|`U+0B14` | Letter | VOWEL_INDEPENDENT | _null_ | ଔ Au |
|`U+0B15` | Letter | CONSONANT | _null_ | କ Ka |
|`U+0B16` | Letter | CONSONANT | _null_ | ଖ Kha |
|`U+0B17` | Letter | CONSONANT | _null_ | ଗ Ga |
|`U+0B18` | Letter | CONSONANT | _null_ | ଘ Gha |
|`U+0B19` | Letter | CONSONANT | _null_ | ଙ Nga |
|`U+0B1A` | Letter | CONSONANT | _null_ | ଚ Ca |
|`U+0B1B` | Letter | CONSONANT | _null_ | ଛ Cha |
|`U+0B1C` | Letter | CONSONANT | _null_ | ଜ Ja |
|`U+0B1D` | Letter | CONSONANT | _null_ | ଝ Jha |
|`U+0B1E` | Letter | CONSONANT | _null_ | ଞ Nya |
|`U+0B1F` | Letter | CONSONANT | _null_ | ଟ Tta |
| | | | |
|`U+0B20` | Letter | CONSONANT | _null_ | ଠ Ttha |
|`U+0B21` | Letter | CONSONANT | _null_ | ଡ Dda |
|`U+0B22` | Letter | CONSONANT | _null_ | ଢ Ddha |
|`U+0B23` | Letter | CONSONANT | _null_ | ଣ Nna |
|`U+0B24` | Letter | CONSONANT | _null_ | ତ Ta |
|`U+0B25` | Letter | CONSONANT | _null_ | ଥ Tha |
|`U+0B26` | Letter | CONSONANT | _null_ | ଦ Da |
|`U+0B27` | Letter | CONSONANT | _null_ | ଧ Dha |
|`U+0B28` | Letter | CONSONANT | _null_ | ନ Na |
|`U+0B29` | _unassigned_ | | | |
|`U+0B2A` | Letter | CONSONANT | _null_ | ପ Pa |
|`U+0B2B` | Letter | CONSONANT | _null_ | ଫ Pha |
|`U+0B2C` | Letter | CONSONANT | _null_ | ବ Ba |
|`U+0B2D` | Letter | CONSONANT | _null_ | ଭ Bha |
|`U+0B2E` | Letter | CONSONANT | _null_ | ମ Ma |
|`U+0B2F` | Letter | CONSONANT | _null_ | ଯ Ya |
| | | | |
|`U+0B30` | Letter | CONSONANT | _null_ | ର Ra |
|`U+0B31` | _unassigned_ | | | |
|`U+0B32` | Letter | CONSONANT | _null_ | ଲ La |
|`U+0B33` | Letter | CONSONANT | _null_ | ଳ Lla |
|`U+0B34` | _unassigned_ | | | |
|`U+0B35` | Letter | CONSONANT | _null_ | ଵ Va |
|`U+0B36` | Letter | CONSONANT | _null_ | ଶ Sha |
|`U+0B37` | Letter | CONSONANT | _null_ | ଷ Ssa |
|`U+0B38` | Letter | CONSONANT | _null_ | ସ Sa |
|`U+0B39` | Letter | CONSONANT | _null_ | ହ Ha |
|`U+0B3A` | _unassigned_ | | | |
|`U+0B3B` | _unassigned_ | | | |
|`U+0B3C` | Mark [Mn] | NUKTA | BOTTOM_POSITION | ଼ Nukta |
|`U+0B3D` | Letter | AVAGRAHA | _null_ | ଽ Avagraha |
|`U+0B3E` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ା Sign Aa |
|`U+0B3F` | Mark [Mc] | VOWEL_DEPENDENT | TOP_POSITION | ି Sign I |
| | | | |
|`U+0B40` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ୀ Sign Ii |
|`U+0B41` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ୁ Sign U |
|`U+0B42` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ୂ Sign Uu |
|`U+0B43` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ୃ Sign Vocalic R |
|`U+0B44` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ୄ Sign Vocalic Rr |
|`U+0B45` | _unassigned_ | | | |
|`U+0B46` | _unassigned_ | | | |
|`U+0B47` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | େ Sign E |
|`U+0B48` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_LEFT_POSITION | ୈ Sign Ai |
|`U+0B49` | _unassigned_ | | | |
|`U+0B4A` | _unassigned_ | | | |
|`U+0B4B` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ୋ Sign O |
|`U+0B4C` | Mark [Mc] | VOWEL_DEPENDENT | TOP_LEFT_AND_RIGHT_POSITION| ୌ Sign Au |
|`U+0B4D` | Mark [Mn] | VIRAMA | BOTTOM_POSITION | ୍ Virama |
|`U+0B4E` | _unassigned_ | | | |
|`U+0B4F` | _unassigned_ | | | |
| | | | |
|`U+0B50` | _unassigned_ | | | |
|`U+0B51` | _unassigned_ | | | |
|`U+0B52` | _unassigned_ | | | |
|`U+0B53` | _unassigned_ | | | |
|`U+0B54` | _unassigned_ | | | |
|`U+0B55` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ୕ Sign Overline |
|`U+0B56` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ୖ Ai Length Mark |
|`U+0B57` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_RIGHT_POSITION | ୗ Au Length Mark |
|`U+0B58` | _unassigned_ | | | |
|`U+0B59` | _unassigned_ | | | |
|`U+0B5A` | _unassigned_ | | | |
|`U+0B5B` | _unassigned_ | | | |
|`U+0B5C` | Letter | CONSONANT | _null_ | ଡ଼ Rra |
|`U+0B5D` | Letter | CONSONANT | _null_ | ଢ଼ Rha |
|`U+0B5E` | _unassigned_ | | | |
|`U+0B5F` | Letter | CONSONANT | _null_ | ୟ Yya |
| | | | |
|`U+0B60` | Letter | VOWEL_INDEPENDENT | _null_ | ୠ Vocalic Rr |
|`U+0B61` | Letter | VOWEL_INDEPENDENT | _null_ | ୡ Vocalic Ll |
|`U+0B62` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ୢ Sign Vocalic L |
|`U+0B63` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ୣ Sign Vocalic Ll |
|`U+0B64` | _unassigned_ | | | |
|`U+0B65` | _unassigned_ | | | |
|`U+0B66` | Number | NUMBER | _null_ | ୦ Digit Zero |
|`U+0B67` | Number | NUMBER | _null_ | ୧ Digit One |
|`U+0B68` | Number | NUMBER | _null_ | ୨ Digit Two |
|`U+0B69` | Number | NUMBER | _null_ | ୩ Digit Three |
|`U+0B6A` | Number | NUMBER | _null_ | ୪ Digit Four |
|`U+0B6B` | Number | NUMBER | _null_ | ୫ Digit Five |
|`U+0B6C` | Number | NUMBER | _null_ | ୬ Digit Six |
|`U+0B6D` | Number | NUMBER | _null_ | ୭ Digit Seven |
|`U+0B6E` | Number | NUMBER | _null_ | ୮ Digit Eight |
|`U+0B6F` | Number | NUMBER | _null_ | ୯ Digit Nine |
| | | | |
|`U+0B70` | Symbol | SYMBOL | _null_ | ୰ Isshar |
|`U+0B71` | Letter | CONSONANT | _null_ | ୱ Wa |
|`U+0B72` | Number | NUMBER | _null_ | ୲ Fraction 1/4 |
|`U+0B73` | Number | NUMBER | _null_ | ୳ Fraction 1/2 |
|`U+0B74` | Number | NUMBER | _null_ | ୴ Fraction 3/4 |
|`U+0B75` | Number | NUMBER | _null_ | ୵ Fraction 1/16 |
|`U+0B76` | Number | NUMBER | _null_ | ୶ Fraction 1/8 |
|`U+0B77` | Number | NUMBER | _null_ | ୷ Fraction 3/16 |
|`U+0B78` | _unassigned_ | | | |
|`U+0B79` | _unassigned_ | | | |
|`U+0B7A` | _unassigned_ | | | |
|`U+0B7B` | _unassigned_ | | | |
|`U+0B7C` | _unassigned_ | | | |
|`U+0B7D` | _unassigned_ | | | |
|`U+0B7E` | _unassigned_ | | | |
|`U+0B7F` | _unassigned_ | | | |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Oriya script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Oriya text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Oriya text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Oriya text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-sinhala.md
================================================
# Sinhala character tables #
This document lists the per-character shaping information needed to
[shape Sinhala text](../opentype-shaping-sinhala.md).
**Contents**
- [Sinhala character table](#sinhala-character-table)
- [Sinhala Archaic Numbers character table](#sinhala-archaic-numbers-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Sinhala character table ##
Sinhala glyphs should be classified as in the following
table. Codepoints in the Sinhala block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Sinhala character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0D80` | _unassigned_ | | | |
|`U+0D81` | Mark [Mn] _ | BINDU | TOP_POSITION | ඁ Candrabindu |
|`U+0D82` | Mark [Mc] | BINDU | RIGHT_POSITION | ං Anusvara |
|`U+0D83` | Mark [Mc] | VISARGA | RIGHT_POSITION | ඃ Visarga |
|`U+0D84` | _unassigned_ | | | |
|`U+0D85` | Letter | VOWEL_INDEPENDENT | _null_ | අ A |
|`U+0D86` | Letter | VOWEL_INDEPENDENT | _null_ | ආ Aa |
|`U+0D87` | Letter | VOWEL_INDEPENDENT | _null_ | ඇ Ae |
|`U+0D88` | Letter | VOWEL_INDEPENDENT | _null_ | ඈ Aae |
|`U+0D89` | Letter | VOWEL_INDEPENDENT | _null_ | ඉ I |
|`U+0D8A` | Letter | VOWEL_INDEPENDENT | _null_ | ඊ Ii |
|`U+0D8B` | Letter | VOWEL_INDEPENDENT | _null_ | උ U |
|`U+0D8C` | Letter | VOWEL_INDEPENDENT | _null_ | ඌ Uu |
|`U+0D8D` | Letter | VOWEL_INDEPENDENT | _null_ | ඍ Vocalic R |
|`U+0D8E` | Letter | VOWEL_INDEPENDENT | _null_ | ඎ Vocalic Rr |
|`U+0D8F` | Letter | VOWEL_INDEPENDENT | _null_ | ඏ Vocalic L |
| | | | |
|`U+0D90` | Letter | VOWEL_INDEPENDENT | _null_ | ඐ Vocalic Ll |
|`U+0D91` | Letter | VOWEL_INDEPENDENT | _null_ | එ E |
|`U+0D92` | Letter | VOWEL_INDEPENDENT | _null_ | ඒ Ee |
|`U+0D93` | Letter | VOWEL_INDEPENDENT | _null_ | ඓ Ai |
|`U+0D94` | Letter | VOWEL_INDEPENDENT | _null_ | ඔ O |
|`U+0D95` | Letter | VOWEL_INDEPENDENT | _null_ | ඕ Oo |
|`U+0D96` | Letter | VOWEL_INDEPENDENT | _null_ | ඖ Au |
|`U+0D97` | _unassigned_ | | | |
|`U+0D98` | _unassigned_ | | | |
|`U+0D99` | _unassigned_ | | | |
|`U+0D9A` | Letter | CONSONANT | _null_ | ක Ka |
|`U+0D9B` | Letter | CONSONANT | _null_ | ඛ Kha |
|`U+0D9C` | Letter | CONSONANT | _null_ | ග Ga |
|`U+0D9D` | Letter | CONSONANT | _null_ | ඝ Gha |
|`U+0D9E` | Letter | CONSONANT | _null_ | ඞ Nga |
|`U+0D9F` | Letter | CONSONANT | _null_ | ඟ Nnga |
| | | | |
|`U+0DA0` | Letter | CONSONANT | _null_ | ච Ca |
|`U+0DA1` | Letter | CONSONANT | _null_ | ඡ Cha |
|`U+0DA2` | Letter | CONSONANT | _null_ | ජ Ja |
|`U+0DA3` | Letter | CONSONANT | _null_ | ඣ Jha |
|`U+0DA4` | Letter | CONSONANT | _null_ | ඤ Nya |
|`U+0DA5` | Letter | CONSONANT | _null_ | ඥ Jnya |
|`U+0DA6` | Letter | CONSONANT | _null_ | ඦ Nyja |
|`U+0DA7` | Letter | CONSONANT | _null_ | ට Tta |
|`U+0DA8` | Letter | CONSONANT | _null_ | ඨ Ttha |
|`U+0DA9` | Letter | CONSONANT | _null_ | ඩ Dda |
|`U+0DAA` | Letter | CONSONANT | _null_ | ඪ Ddha |
|`U+0DAB` | Letter | CONSONANT | _null_ | ණ Nna |
|`U+0DAC` | Letter | CONSONANT | _null_ | ඬ Nndda |
|`U+0DAD` | Letter | CONSONANT | _null_ | ත Ta |
|`U+0DAE` | Letter | CONSONANT | _null_ | ථ Tha |
|`U+0DAF` | Letter | CONSONANT | _null_ | ද Da |
| | | | |
|`U+0DB0` | Letter | CONSONANT | _null_ | ධ Dha |
|`U+0DB1` | Letter | CONSONANT | _null_ | න Na |
|`U+0DB2` | _unassigned_ | | | |
|`U+0DB3` | Letter | CONSONANT | _null_ | ඳ Nda |
|`U+0DB4` | Letter | CONSONANT | _null_ | ප Pa |
|`U+0DB5` | Letter | CONSONANT | _null_ | ඵ Pha |
|`U+0DB6` | Letter | CONSONANT | _null_ | බ Ba |
|`U+0DB7` | Letter | CONSONANT | _null_ | භ Bha |
|`U+0DB8` | Letter | CONSONANT | _null_ | ම Ma |
|`U+0DB9` | Letter | CONSONANT | _null_ | ඹ Mba |
|`U+0DBA` | Letter | CONSONANT | _null_ | ය Ya |
|`U+0DBB` | Letter | CONSONANT | _null_ | ර Ra |
|`U+0DBC` | _unassigned_ | | | |
|`U+0DBD` | Letter | CONSONANT | _null_ | ල La |
|`U+0DBE` | _unassigned_ | | | |
|`U+0DBF` | _unassigned_ | | | |
| | | | |
|`U+0DC0` | Letter | CONSONANT | _null_ | ව Va |
|`U+0DC1` | Letter | CONSONANT | _null_ | ශ Sha |
|`U+0DC2` | Letter | CONSONANT | _null_ | ෂ Ssa |
|`U+0DC3` | Letter | CONSONANT | _null_ | ස Sa |
|`U+0DC4` | Letter | CONSONANT | _null_ | හ Ha |
|`U+0DC5` | Letter | CONSONANT | _null_ | ළ Lla |
|`U+0DC6` | Letter | CONSONANT | _null_ | ෆ Fa |
|`U+0DC7` | _unassigned_ | | | |
|`U+0DC8` | _unassigned_ | | | |
|`U+0DC9` | _unassigned_ | | | |
|`U+0DCA` | Mark [MN] | VIRAMA | TOP_POSITION | ් Virama |
|`U+0DCB` | _unassigned_ | | | |
|`U+0DCC` | _unassigned_ | | | |
|`U+0DCD` | _unassigned_ | | | |
|`U+0DCE` | _unassigned_ | | | |
|`U+0DCF` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ා Sign Aa |
| | | | |
|`U+0DD0` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ැ Sign Ae |
|`U+0DD1` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෑ Sign Aae |
|`U+0DD2` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ි Sign I |
|`U+0DD3` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ී Sign Ii |
|`U+0DD4` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ු Sign U |
|`U+0DD5` | _unassigned_ | | | |
|`U+0DD6` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ූ Sign Uu |
|`U+0DD7` | _unassigned_ | | | |
|`U+0DD8` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෘ Sign Vocalic R |
|`U+0DD9` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ෙ Sign E |
|`U+0DDA` | Mark [Mc] | VOWEL_DEPENDENT | TOP_AND_LEFT_POSITION | ේ Sign Ee |
|`U+0DDB` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ෛ Sign Ai |
|`U+0DDC` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ො Sign O |
|`U+0DDD` | Mark [Mc] | VOWEL_DEPENDENT | TOP_LEFT_AND_RIGHT_POSITION| ෝ Sign Oo |
|`U+0DDE` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ෞ Sign Au |
|`U+0DDF` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෟ Sign Vocalic L |
| | | | |
|`U+0DE0` | _unassigned_ | | | |
|`U+0DE1` | _unassigned_ | | | |
|`U+0DE2` | _unassigned_ | | | |
|`U+0DE3` | _unassigned_ | | | |
|`U+0DE4` | _unassigned_ | | | |
|`U+0DE5` | _unassigned_ | | | |
|`U+0DE6` | Number | NUMBER | _null_ | ෦ Digit Zero |
|`U+0DE7` | Number | NUMBER | _null_ | ෧ Digit One |
|`U+0DE8` | Number | NUMBER | _null_ | ෨ Digit Two |
|`U+0DE9` | Number | NUMBER | _null_ | ෩ Digit Three |
|`U+0DEA` | Number | NUMBER | _null_ | ෪ Digit Four |
|`U+0DEB` | Number | NUMBER | _null_ | ෫ Digit Five |
|`U+0DEC` | Number | NUMBER | _null_ | ෬ Digit Six |
|`U+0DED` | Number | NUMBER | _null_ | ෭ Digit Seven |
|`U+0DEE` | Number | NUMBER | _null_ | ෮ Digit Eight |
|`U+0DEF` | Number | NUMBER | _null_ | ෯ Digit Nine |
| | | | |
|`U+0DF0` | _unassigned_ | | | |
|`U+0DF1` | _unassigned_ | | | |
|`U+0DF2` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෲ Sign Vocalic Rr |
|`U+0DF3` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ෳ Sign Vocalic Ll |
|`U+0DF4` | Punctuation | _null_ | _null_ | ෴ Kunddaliya |
|`U+0DF5` | _unassigned_ | | | |
|`U+0DF6` | _unassigned_ | | | |
|`U+0DF7` | _unassigned_ | | | |
|`U+0DF8` | _unassigned_ | | | |
|`U+0DF9` | _unassigned_ | | | |
|`U+0DFA` | _unassigned_ | | | |
|`U+0DFB` | _unassigned_ | | | |
|`U+0DFC` | _unassigned_ | | | |
|`U+0DFD` | _unassigned_ | | | |
|`U+0DFE` | _unassigned_ | | | |
|`U+0DFF` | _unassigned_ | | | |
:::
## Sinhala Archaic Numbers character table ##
Sinhala text runs may also include glyphs from the Sinhala Archaic
Numbers block. These characters should be classified as follows.
:::{table} Sinhala Archaic Numbers character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+111E0` | _unassigned_ | | | |
|`U+111E1` | Number | NUMBER | _null_ | 𑇡 Archaic Digit One |
|`U+111E2` | Number | NUMBER | _null_ | 𑇢 Archaic Digit Two |
|`U+111E3` | Number | NUMBER | _null_ | 𑇣 Archaic Digit Three|
|`U+111E4` | Number | NUMBER | _null_ | 𑇤 Archaic Digit Four |
|`U+111E5` | Number | NUMBER | _null_ | 𑇥 Archaic Digit Five |
|`U+111E6` | Number | NUMBER | _null_ | 𑇦 Archaic Digit Six |
|`U+111E7` | Number | NUMBER | _null_ | 𑇧 Archaic Digit Seven|
|`U+111E8` | Number | NUMBER | _null_ | 𑇨 Archaic Digit Eight|
|`U+111E9` | Number | NUMBER | _null_ | 𑇩 Archaic Digit Nine |
|`U+111EA` | Number | NUMBER | _null_ | 𑇪 Archaic Number Ten |
|`U+111EB` | Number | NUMBER | _null_ | 𑇫 Archaic Number 20 |
|`U+111EC` | Number | NUMBER | _null_ | 𑇬 Archaic Number 30 |
|`U+111ED` | Number | NUMBER | _null_ | 𑇭 Archaic Number 40 |
|`U+111EE` | Number | NUMBER | _null_ | 𑇮 Archaic Number 50 |
|`U+111EF` | Number | NUMBER | _null_ | 𑇯 Archaic Number 60 |
| | | | |
|`U+111F0` | Number | NUMBER | _null_ | 𑇰 Archaic Number 70 |
|`U+111F1` | Number | NUMBER | _null_ | 𑇱 Archaic Number 80 |
|`U+111F2` | Number | NUMBER | _null_ | 𑇲 Archaic Number 90 |
|`U+111F3` | Number | NUMBER | _null_ | 𑇳 Archaic Number 100 |
|`U+111F4` | Number | NUMBER | _null_ | 𑇴 Archaic Number 1000|
|`U+111F5` | _unassigned_ | | | |
|`U+111F6` | _unassigned_ | | | |
|`U+111F7` | _unassigned_ | | | |
|`U+111F8` | _unassigned_ | | | |
|`U+111F9` | _unassigned_ | | | |
|`U+111FA` | _unassigned_ | | | |
|`U+111FB` | _unassigned_ | | | |
|`U+111FC` | _unassigned_ | | | |
|`U+111FD` | _unassigned_ | | | |
|`U+111FE` | _unassigned_ | | | |
|`U+111FF` | _unassigned_ | | | |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Sinhala script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Sinhala text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is used to request the subjoined form
of a consonant. The sequence "Consonant_1,Halant,ZWJ,Consonant_2" is
used to specify the subjoined form of "Consonant_2".
A secondary usage of the zero-width joiner is to explicitly request
the formation of "Reph". An initial "Ra,Halant,ZWJ" sequence should
produce a "Reph".
The zero-width non-joiner (ZWNJ) is not used in shaping runs of
Sinhala text. The ZWNJ is referenced below in various regular
expressions and shaping rules, however, because it is used by other
Indic scripts.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-syriac.md
================================================
# Syriac character tables #
This document lists the per-character shaping information needed to
[shape Syriac text](../opentype-shaping-syriac.md).
**Contents**
- [Syriac character table](#syriac-character-table)
- [Syriac Supplement character table](#syriac-supplement-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Syriac character table ##
Syriac glyphs should be classified as in the following
table. Codepoints in the Syriac block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
The _Joining type_ column indicates whether each codepoint is defined
as joining with adjacent characters on the left side, right side, left
and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints
designated TRANSPARENT in the _Joining type_ column do not join with
adjacent characters and, in addition, do not affect the joining
behavior of surrounding characters. Non-spacing marks are of type
TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent
characters to join.
The _Joining group_ column lists the fundamental letter that the
listed codepoint behaves like for joining purposes.
Assigned codepoints with a _null_ in the _Joining group_
column evoke no special behavior from the shaping engine during the
join-computation stage.
The _Mark class_ column indicates the Canonical Combining Class
for the codepoint. Marks are assigned non-zero combining classes so
that sequences of adjacent marks can be reordered as required by the
orthography.
For Syriac, a subset of marks in the 220 and 230 classes are also
designated _Modifier Combining Marks_ (MCM). These are denoted with
_220_MCM_ and _230_MCM_ in the _Mark class_ column. The MCM marks are
treated differently during the mark-reordering stage.
:::{table} Syriac character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|
|`U+0700` | Punctuation | NON_JOINING | _null_ | _0_ | ܀ End of Paragraph |
|`U+0701` | Punctuation | NON_JOINING | _null_ | _0_ | ܁ Supralinear Full Stop |
|`U+0702` | Punctuation | NON_JOINING | _null_ | _0_ | ܂ Sublinear Full Stop |
|`U+0703` | Punctuation | NON_JOINING | _null_ | _0_ | ܃ Supralinear Colon |
|`U+0704` | Punctuation | NON_JOINING | _null_ | _0_ | ܄ Sublinear Colon |
|`U+0705` | Punctuation | NON_JOINING | _null_ | _0_ | ܅ Horizontal Colon |
|`U+0706` | Punctuation | NON_JOINING | _null_ | _0_ | ܆ Colon Skewed Left |
|`U+0707` | Punctuation | NON_JOINING | _null_ | _0_ | ܇ Colon Skewed Right |
|`U+0708` | Punctuation | NON_JOINING | _null_ | _0_ | ܈ Supralinear Colon Skewed Left |
|`U+0709` | Punctuation | NON_JOINING | _null_ | _0_ | ܉ Sublinear Colon Skewed Right |
|`U+070A` | Punctuation | NON_JOINING | _null_ | _0_ | ܊ Contraction |
|`U+070B` | Punctuation | NON_JOINING | _null_ | _0_ | ܋ Harklean Obelus |
|`U+070C` | Punctuation | NON_JOINING | _null_ | _0_ | ܌ Harklean Metobelus |
|`U+070D` | Punctuation | NON_JOINING | _null_ | _0_ | ܍ Harklean Asteriscus |
|`U+070E` | _unassigned_ | | | | |
|`U+070F` | Other | TRANSPARENT | _null_ | _0_ | Syriac Abbreviation Mark |
| | | | | |
|`U+0710` | Letter | RIGHT | ALAPH | _0_ | ܐ Alaph |
|`U+0711` | Mark [Mn] | TRANSPARENT | _null_ | 36 | ܑ Superscript Alaph |
|`U+0712` | Letter | DUAL | BETH | _0_ | ܒ Beth |
|`U+0713` | Letter | DUAL | GAMAL | _0_ | ܓ Gamal |
|`U+0714` | Letter | DUAL | GAMAL | _0_ | ܔ Gamal Garshuni |
|`U+0715` | Letter | RIGHT | DALATH_RISH | _0_ | ܕ Dalath |
|`U+0716` | Letter | RIGHT | DALATH_RISH | _0_ | ܖ Dotless Dalath Rish |
|`U+0717` | Letter | RIGHT | HE | _0_ | ܗ He |
|`U+0718` | Letter | RIGHT | SYRIAC_WAW | _0_ | ܘ Waw |
|`U+0719` | Letter | RIGHT | ZAIN | _0_ | ܙ Zain |
|`U+071A` | Letter | DUAL | HETH | _0_ | ܚ Heth |
|`U+071B` | Letter | DUAL | TETH | _0_ | ܛ Teth |
|`U+071C` | Letter | DUAL | TETH | _0_ | ܜ Teth Garshuni |
|`U+071D` | Letter | DUAL | YUDH | _0_ | ܝ Yudh |
|`U+071E` | Letter | RIGHT | YUDH_HE | _0_ | ܞ Yudh He |
|`U+071F` | Letter | DUAL | KAPH | _0_ | ܟ Kaph |
| | | | | |
|`U+0720` | Letter | DUAL | LAMADH | _0_ | ܠ Lamadh |
|`U+0721` | Letter | DUAL | MIM | _0_ | ܡ Mim |
|`U+0722` | Letter | DUAL | NUN | _0_ | ܢ Nun |
|`U+0723` | Letter | DUAL | SEMKATH | _0_ | ܣ Semkath |
|`U+0724` | Letter | DUAL | FINAL_SEMKATH | _0_ | ܤ Final Semkath |
|`U+0725` | Letter | DUAL | E | _0_ | ܥ E |
|`U+0727` | Letter | DUAL | PE | _0_ | ܧ Pe |
|`U+0727` | Letter | DUAL | REVERSED_PE | _0_ | ܧ Reversed Pe |
|`U+0728` | Letter | RIGHT | SADHE | _0_ | ܨ Sadhe |
|`U+0729` | Letter | DUAL | QAPH | _0_ | ܩ Qaph |
|`U+072A` | Letter | RIGHT | DALATH_RISH | _0_ | ܪ Rish |
|`U+072B` | Letter | DUAL | SHIN | _0_ | ܫ Shin |
|`U+072C` | Letter | RIGHT | TAW | _0_ | ܬ Taw |
|`U+072D` | Letter | DUAL | BETH | _0_ | ܭ Persian Bheth |
|`U+072E` | Letter | DUAL | GAMAL | _0_ | ܮ Persian Ghamal |
|`U+072F` | Letter | RIGHT | DALATH_RISH | _0_ | ܯ Persian Dhalath |
| | | | | |
|`U+0730` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ܰ Pthaha Above |
|`U+0731` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ܱ Pthaha Below |
|`U+0732` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ܲ Pthaha Dotted |
|`U+0733` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ܳ Zqapha Above |
|`U+0734` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ܴ Zqapha Below |
|`U+0735` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ܵ Zqapha Dotted |
|`U+0736` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ܶ Rbasa Above |
|`U+0737` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ܷ Rbasa Below |
|`U+0738` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ܸ Dotted Zlama Horizontal |
|`U+0739` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ܹ Dotted Zlama Angular |
|`U+073A` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ܺ Hbasa Above |
|`U+073B` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ܻ Hbasa Below |
|`U+073C` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ܼ Hbasa-Esasa Dotted |
|`U+073D` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ܽ Esasa Above |
|`U+073E` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ܾ Esasa Below |
|`U+073F` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ܿ Rwaha |
| | | | | |
|`U+0740` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ݀ Feminine Dot |
|`U+0741` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ݁ Qushshaya |
|`U+0742` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ݂ Rukkakha |
|`U+0743` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ݃ Two Vertical Dots Above |
|`U+0744` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ݄ Two Vertical Dots Below |
|`U+0745` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ݅ Three Dots Above |
|`U+0746` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ݆ Three Dots Below |
|`U+0747` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ݇ Oblique Line Above |
|`U+0748` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ݈ Oblique Line Below |
|`U+0749` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ݉ Music |
|`U+074A` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ݊ Barrekh |
|`U+074B` | _unassigned_ | | | | |
|`U+074C` | _unassigned_ | | | | |
|`U+074D` | Letter | RIGHT | ZHAIN | _0_ | ݍ Sogdian Zhain |
|`U+074E` | Letter | DUAL | KHAPH | _0_ | ݎ Sogdian Khaph |
|`U+074F` | Letter | DUAL | FE | _0_ | ݏ Sogdian Fe |
:::
## Syriac Supplement character table ##
The Syriac Supplement block includes letters needed to write Suriyani
Malayalam, also known as Garshuni or Syriac Malayalam.
:::{table} Syriac Supplement character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|-----------------------------------------------|
|`U+0860` | Letter | DUAL | MALAYALAM_NGA | _0_ | ࡠ Malayalam Nga |
|`U+0861` | Letter | NON_JOINING | MALAYALAM_JA | _0_ | ࡡ Malayalam Ja |
|`U+0862` | Letter | DUAL | MALAYALAM_NYA | _0_ | ࡢ Malayalam Nya |
|`U+0863` | Letter | DUAL | MALAYALAM_TTA | _0_ | ࡣ Malayalam Tta |
|`U+0864` | Letter | DUAL | MALAYALAM_NNA | _0_ | ࡤ Malayalam Nna |
|`U+0865` | Letter | DUAL | MALAYALAM_NNNA | _0_ | ࡥ Malayalam Nnna |
|`U+0866` | Letter | NON_JOINING | MALAYALAM_BHA | _0_ | ࡦ Malayalam Bha |
|`U+0867` | Letter | RIGHT | MALAYALAM_RA | _0_ | ࡧ Malayalam Ra |
|`U+0868` | Letter | DUAL | MALAYALAM_LLA | _0_ | ࡨ Malayalam Lla |
|`U+0869` | Letter | RIGHT | MALAYALAM_LLLA | _0_ | ࡩ Malayalam Llla |
|`U+086A` | Letter | RIGHT | MALAYALAM_SSA | _0_ | ࡪ Malayalam Ssa |
|`U+086B` | _unassigned_ | | | | |
|`U+086C` | _unassigned_ | | | | |
|`U+086D` | _unassigned_ | | | | |
|`U+086E` | _unassigned_ | | | | |
|`U+086F` | _unassigned_ | | | | |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Syriac text include the dotted-circle placeholder (`U+25CC`), the
combining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)
and zero-width non-joiner (`U+200C`), the left-to-right text marker
(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break
space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
combining mark in isolation. Real-world text syllables may also use
other characters, such as hyphens or dashes, in a similar placeholder
fashion; shaping engines should cope with this situation gracefully.
In addition, Syriac text runs may include the "Tatweel" or kashida
codepoint (`U+0640`) from the Arabic block, because the Syriac block
does not encode a separate kashida character.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:---------------------|:-----------|--------------------------------|
|`U+00A0` | Separator | NON_JOINING | _null_ | _0_ | No-break space |
|`U+034F` | Other | NON_JOINING | _null_ | _0_ | ͏ Combining grapheme joiner |
|`U+0640` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ـ Arabic Tatweel |
|`U+200C` | Other | NON_JOINING | _null_ | _0_ | Zero-width non-joiner |
|`U+200D` | Other | JOIN_CAUSING | _null_ | _0_ | Zero-width joiner |
|`U+200E` | Other | NON_JOINING | _null_ | _0_ | Left-to-Right marker |
|`U+200F` | Other | NON_JOINING | _null_ | _0_ | Right-to-Left marker |
|`U+2010` | Punctuation | NON_JOINING | _null_ | _0_ | ‐ Hyphen |
|`U+2011` | Punctuation | NON_JOINING | _null_ | _0_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | NON_JOINING | _null_ | _0_ | ‒ Figure dash |
|`U+2013` | Punctuation | NON_JOINING | _null_ | _0_ | – En dash |
|`U+2014` | Punctuation | NON_JOINING | _null_ | _0_ | — Em dash |
|`U+25CC` | Symbol | NON_JOINING | _null_ | _0_ | ◌ Dotted circle |
| | | | | | |
:::
The combining grapheme joiner (CGJ) is primarily used to alter the
order in which adjacent marks are positioned during the
mark-reordering stage, in order to adhere to the needs of a
non-default language orthography.
The zero-width joiner (ZWJ) is primarily used to force the usage of the
cursive connecting form of a letter even when the context of the
adjoining letters would not trigger the connecting form.
For example, to show the initial form of a letter in isolation (such
as for displaying it in a table of forms), the sequence "_Letter_,ZWJ"
would be used. To show the medial form of a letter in isolation, the
sequence "ZWJ,_Letter_,ZWJ" would be used.
The right-to-left mark (RLM) and left-to-right mark (LRM) are used by
the Unicode bidirectionality algorithm (BiDi) to indicate the points
in a text run at which the writing direction changes.
The no-break space is primarily used to display those codepoints that
are defined as non-spacing (such as vowel or diacritical marks and "Hamza") in an
isolated context, as an alternative to displaying them superimposed on
the dotted-circle placeholder.
================================================
FILE: character-tables/character-tables-tamil.md
================================================
# Tamil character tables #
This document lists the per-character shaping information needed to
[shape Tamil text](../opentype-shaping-tamil.md).
**Contents**
- [Tamil character table](#tamil-character-table)
- [Tamil Supplement character table](#tamil-supplement-character-table)
- [Grantha marks character table](#grantha-marks-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Tamil character table ##
Tamil glyphs should be classified as in the following
table. Codepoints in the Tamil block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Tamil character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0B80` | _unassigned_ | | | |
|`U+0B81` | _unassigned_ | | | |
|`U+0B82` | Mark [Mn] | BINDU | TOP_POSITION | ஂ Anusvara |
|`U+0B83` | Letter | MODIFYING_LETTER | _null_ | ஃ Visarga |
|`U+0B84` | _unassigned_ | | | |
|`U+0B85` | Letter | VOWEL_INDEPENDENT | _null_ | அ A |
|`U+0B86` | Letter | VOWEL_INDEPENDENT | _null_ | ஆ Aa |
|`U+0B87` | Letter | VOWEL_INDEPENDENT | _null_ | இ I |
|`U+0B88` | Letter | VOWEL_INDEPENDENT | _null_ | ஈ Ii |
|`U+0B89` | Letter | VOWEL_INDEPENDENT | _null_ | உ U |
|`U+0B8A` | Letter | VOWEL_INDEPENDENT | _null_ | ஊ Uu |
|`U+0B8B` | _unassigned_ | | | |
|`U+0B8C` | _unassigned_ | | | |
|`U+0B8D` | _unassigned_ | | | |
|`U+0B8E` | Letter | VOWEL_INDEPENDENT | _null_ | எ E |
|`U+0B8F` | Letter | VOWEL_INDEPENDENT | _null_ | ஏ Ee |
| | | | |
|`U+0B90` | Letter | VOWEL_INDEPENDENT | _null_ | ஐ Ai |
|`U+0B91` | _unassigned_ | | | |
|`U+0B92` | Letter | VOWEL_INDEPENDENT | _null_ | ஒ O |
|`U+0B93` | Letter | VOWEL_INDEPENDENT | _null_ | ஓ Oo |
|`U+0B94` | Letter | VOWEL_INDEPENDENT | _null_ | ஔ Au |
|`U+0B95` | Letter | CONSONANT | _null_ | க Ka |
|`U+0B96` | _unassigned_ | | | |
|`U+0B97` | _unassigned_ | | | |
|`U+0B98` | _unassigned_ | | | |
|`U+0B99` | Letter | CONSONANT | _null_ | ங Nga |
|`U+0B9A` | Letter | CONSONANT | _null_ | ச Ca |
|`U+0B9B` | _unassigned_ | | | |
|`U+0B9C` | Letter | CONSONANT | _null_ | ஜ Ja |
|`U+0B9D` | _unassigned_ | | | |
|`U+0B9E` | Letter | CONSONANT | _null_ | ஞ Nya |
|`U+0B9F` | Letter | CONSONANT | _null_ | ட Tta |
| | | | |
|`U+0BA0` | _unassigned_ | | | |
|`U+0BA1` | _unassigned_ | | | |
|`U+0BA2` | _unassigned_ | | | |
|`U+0BA3` | Letter | CONSONANT | _null_ | ண Nna |
|`U+0BA4` | Letter | CONSONANT | _null_ | த Ta |
|`U+0BA5` | _unassigned_ | | | |
|`U+0BA6` | _unassigned_ | | | |
|`U+0BA7` | _unassigned_ | | | |
|`U+0BA8` | Letter | CONSONANT | _null_ | ந Na |
|`U+0BA9` | Letter | CONSONANT | _null_ | ன Nnna |
|`U+0BAA` | Letter | CONSONANT | _null_ | ப Pa |
|`U+0BAB` | _unassigned_ | | | |
|`U+0BAC` | _unassigned_ | | | |
|`U+0BAD` | _unassigned_ | | | |
|`U+0BAE` | Letter | CONSONANT | _null_ | ம Ma |
|`U+0BAF` | Letter | CONSONANT | _null_ | ய Ya |
| | | | |
|`U+0BB0` | Letter | CONSONANT | _null_ | ர Ra |
|`U+0BB1` | Letter | CONSONANT | _null_ | ற Rra |
|`U+0BB2` | Letter | CONSONANT | _null_ | ல La |
|`U+0BB3` | Letter | CONSONANT | _null_ | ள Lla |
|`U+0BB4` | Letter | CONSONANT | _null_ | ழ Llla |
|`U+0BB5` | Letter | CONSONANT | _null_ | வ Va |
|`U+0BB6` | Letter | CONSONANT | _null_ | ஶ Sha |
|`U+0BB7` | Letter | CONSONANT | _null_ | ஷ Ssa |
|`U+0BB8` | Letter | CONSONANT | _null_ | ஸ Sa |
|`U+0BB9` | Letter | CONSONANT | _null_ | ஹ Ha |
|`U+0BBA` | _unassigned_ | | | |
|`U+0BBB` | _unassigned_ | | | |
|`U+0BBC` | _unassigned_ | | | |
|`U+0BBD` | _unassigned_ | | | |
|`U+0BBE` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ா Sign Aa |
|`U+0BBF` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ி Sign I |
| | | | |
|`U+0BC0` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ீ Sign Ii |
|`U+0BC1` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ு Sign U |
|`U+0BC2` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ூ Sign Uu |
|`U+0BC3` | _unassigned_ | | | |
|`U+0BC4` | _unassigned_ | | | |
|`U+0BC5` | _unassigned_ | | | |
|`U+0BC6` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ெ Sign E |
|`U+0BC7` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ே Sign Ee |
|`U+0BC8` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ை Sign Ai |
|`U+0BC9` | _unassigned_ | | | |
|`U+0BCA` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ொ Sign O |
|`U+0BCB` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ோ Sign Oo |
|`U+0BCC` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ௌ Sign Au |
|`U+0BCD` | Mark [Mn] | VIRAMA | TOP_POSITION | ் Virama |
|`U+0BCE` | _unassigned_ | | | |
|`U+0BCF` | _unassigned_ | | | |
| | | | |
|`U+0BD0` | Letter | _null_ | _null_ | ௐ Om |
|`U+0BD1` | _unassigned_ | | | |
|`U+0BD2` | _unassigned_ | | | |
|`U+0BD3` | _unassigned_ | | | |
|`U+0BD4` | _unassigned_ | | | |
|`U+0BD5` | _unassigned_ | | | |
|`U+0BD6` | _unassigned_ | | | |
|`U+0BD7` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ௗ Au Length Mark |
|`U+0BD8` | _unassigned_ | | | |
|`U+0BD9` | _unassigned_ | | | |
|`U+0BDA` | _unassigned_ | | | |
|`U+0BDB` | _unassigned_ | | | |
|`U+0BDC` | _unassigned_ | | | |
|`U+0BDD` | _unassigned_ | | | |
|`U+0BDE` | _unassigned_ | | | |
|`U+0BDF` | _unassigned_ | | | |
| | | | |
|`U+0BE0` | _unassigned_ | | | |
|`U+0BE1` | _unassigned_ | | | |
|`U+0BE2` | _unassigned_ | | | |
|`U+0BE3` | _unassigned_ | | | |
|`U+0BE4` | _unassigned_ | | | |
|`U+0BE5` | _unassigned_ | | | |
|`U+0BE6` | Number | NUMBER | _null_ | ௦ Digit Zero |
|`U+0BE7` | Number | NUMBER | _null_ | ௧ Digit One |
|`U+0BE8` | Number | NUMBER | _null_ | ௨ Digit Two |
|`U+0BE9` | Number | NUMBER | _null_ | ௩ Digit Three |
|`U+0BEA` | Number | NUMBER | _null_ | ௪ Digit Four |
|`U+0BEB` | Number | NUMBER | _null_ | ௫ Digit Five |
|`U+0BEC` | Number | NUMBER | _null_ | ௬ Digit Six |
|`U+0BED` | Number | NUMBER | _null_ | ௭ Digit Seven |
|`U+0BEE` | Number | NUMBER | _null_ | ௮ Digit Eight |
|`U+0BEF` | Number | NUMBER | _null_ | ௯ Digit Nine |
| | | | |
|`U+0BF0` | Number | NUMBER | _null_ | ௰ Number Ten |
|`U+0BF1` | Number | NUMBER | _null_ | ௱ Number One Hundred |
|`U+0BF2` | Number | NUMBER | _null_ | ௲ Number One Thousand |
|`U+0BF3` | Symbol | SYMBOL | _null_ | ௳ Day Sign |
|`U+0BF4` | Symbol | SYMBOL | _null_ | ௴ Month Sign |
|`U+0BF5` | Symbol | SYMBOL | _null_ | ௵ Year Sign |
|`U+0BF6` | Symbol | SYMBOL | _null_ | ௶ Debit Sign |
|`U+0BF7` | Symbol | SYMBOL | _null_ | ௷ Credit Sign |
|`U+0BF8` | Symbol | SYMBOL | _null_ | ௸ As Above Sign |
|`U+0BF9` | Symbol | SYMBOL | _null_ | ௹ Tamil Rupee Sign |
|`U+0BFA` | Symbol | SYMBOL | _null_ | ௺ Number Sign |
|`U+0BFB` | _unassigned_ | | | |
|`U+0BFC` | _unassigned_ | | | |
|`U+0BFD` | _unassigned_ | | | |
|`U+0BFE` | _unassigned_ | | | |
|`U+0BFF` | _unassigned_ | | | |
:::
## Tamil Supplement character table ##
Tamil text runs may also include historical symbols and fractions from
the Tamil Supplement block. These characters should be classified as
follows.
:::{table} Tamil Supplement character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:--------------|:------------------------|:------------------------------|
| `U+11FC0` | Number | NUMBER | _null_ | 𑿀 Fraction One Three-Hundred-And-Twentieth |
| `U+11FC1` | Number | NUMBER | _null_ | 𑿁 Fraction One One-Hundred-And-Sixtieth |
| `U+11FC2` | Number | NUMBER | _null_ | 𑿂 Fraction One Eightieth |
| `U+11FC3` | Number | NUMBER | _null_ | 𑿃 Fraction One Sixty-Fourth |
| `U+11FC4` | Number | NUMBER | _null_ | 𑿄 Fraction One Fortieth |
| `U+11FC5` | Number | NUMBER | _null_ | 𑿅 Fraction One Thirty-Second |
| `U+11FC6` | Number | NUMBER | _null_ | 𑿆 Fraction Three Eightieths |
| `U+11FC7` | Number | NUMBER | _null_ | 𑿇 Fraction Three Sixty-Fourths |
| `U+11FC8` | Number | NUMBER | _null_ | 𑿈 Fraction One Twentieth |
| `U+11FC9` | Number | NUMBER | _null_ | 𑿉 Fraction One Sixteenth-1 |
| `U+11FCA` | Number | NUMBER | _null_ | 𑿊 Fraction One Sixteenth-2 |
| `U+11FCB` | Number | NUMBER | _null_ | 𑿋 Fraction One Tenth |
| `U+11FCC` | Number | NUMBER | _null_ | 𑿌 Fraction One Eighth |
| `U+11FCD` | Number | NUMBER | _null_ | 𑿍 Fraction Three Twentieths |
| `U+11FCE` | Number | NUMBER | _null_ | 𑿎 Fraction Three Sixteenths |
| `U+11FCF` | Number | NUMBER | _null_ | 𑿏 Fraction One Fifth |
| | | | |
| `U+11FD0` | Number | NUMBER | _null_ | 𑿐 Fraction One Quarter |
| `U+11FD1` | Number | NUMBER | _null_ | 𑿑 Fraction One Half-1 |
| `U+11FD2` | Number | NUMBER | _null_ | 𑿒 Fraction One Half-2 |
| `U+11FD3` | Number | NUMBER | _null_ | 𑿓 Fraction Three Quarters |
| `U+11FD4` | Number | NUMBER | _null_ | 𑿔 Fraction Downscaling Factor Kiizh |
| `U+11FD5` | Symbol | SYMBOL | _null_ | 𑿕 Sign Nel |
| `U+11FD6` | Symbol | SYMBOL | _null_ | 𑿖 Sign Cevitu |
| `U+11FD7` | Symbol | SYMBOL | _null_ | 𑿗 Sign Aazhaakku |
| `U+11FD8` | Symbol | SYMBOL | _null_ | 𑿘 Sign Uzhakku |
| `U+11FD9` | Symbol | SYMBOL | _null_ | 𑿙 Sign Muuvuzhakku |
| `U+11FDA` | Symbol | SYMBOL | _null_ | 𑿚 Sign Kuruni |
| `U+11FDB` | Symbol | SYMBOL | _null_ | 𑿛 Sign Pathakku |
| `U+11FDC` | Symbol | SYMBOL | _null_ | 𑿜 Sign Mukkuruni |
| `U+11FDD` | Symbol | SYMBOL | _null_ | 𑿝 Sign Kaacu |
| `U+11FDE` | Symbol | SYMBOL | _null_ | 𑿞 Sign Panam |
| `U+11FDF` | Symbol | SYMBOL | _null_ | 𑿟 Sign Pon |
| | | | |
| `U+11FE0` | Symbol | SYMBOL | _null_ | 𑿠 Sign Varaakan |
| `U+11FE1` | Symbol | SYMBOL | _null_ | 𑿡 Sign Paaram |
| `U+11FE2` | Symbol | SYMBOL | _null_ | 𑿢 Sign Kuzhi |
| `U+11FE3` | Symbol | SYMBOL | _null_ | 𑿣 Sign Veli |
| `U+11FE4` | Symbol | SYMBOL | _null_ | 𑿤 Wet Cultivation Sign |
| `U+11FE5` | Symbol | SYMBOL | _null_ | 𑿥 Dry Cultivation Sign |
| `U+11FE6` | Symbol | SYMBOL | _null_ | 𑿦 Land Sign |
| `U+11FE7` | Symbol | SYMBOL | _null_ | 𑿧 Salt Pan Sign |
| `U+11FE8` | Symbol | SYMBOL | _null_ | 𑿨 Traditional Credit Sign |
| `U+11FE9` | Symbol | SYMBOL | _null_ | 𑿩 Traditional Number Sign |
| `U+11FEA` | Symbol | SYMBOL | _null_ | 𑿪 Current Sign |
| `U+11FEB` | Symbol | SYMBOL | _null_ | 𑿫 And Odd Sign |
| `U+11FEC` | Symbol | SYMBOL | _null_ | 𑿬 Spent Sign |
| `U+11FED` | Symbol | SYMBOL | _null_ | 𑿭 Total Sign |
| `U+11FEE` | Symbol | SYMBOL | _null_ | 𑿮 In Possession Sign |
| `U+11FEF` | Symbol | SYMBOL | _null_ | 𑿯 Starting From Sign |
| | | | |
| `U+11FF0` | Symbol | SYMBOL | _null_ | 𑿰 Sign Muthaliya |
| `U+11FF1` | Symbol | SYMBOL | _null_ | 𑿱 Sign Vakaiyaraa |
| `U+11FF2` | _unassigned_ | | | |
| `U+11FF3` | _unassigned_ | | | |
| `U+11FF4` | _unassigned_ | | | |
| `U+11FF5` | _unassigned_ | | | |
| `U+11FF6` | _unassigned_ | | | |
| `U+11FF7` | _unassigned_ | | | |
| `U+11FF8` | _unassigned_ | | | |
| `U+11FF9` | _unassigned_ | | | |
| `U+11FFA` | _unassigned_ | | | |
| `U+11FFB` | _unassigned_ | | | |
| `U+11FFC` | _unassigned_ | | | |
| `U+11FFD` | _unassigned_ | | | |
| `U+11FFE` | _unassigned_ | | | |
| `U+11FFF` | Punctuation | _null_ | _null_ | 𑿿 End Of Text |
:::
## Grantha marks character table ##
Tamil text runs may also include diacritical and syllable-modifier
marks from the Grantha block. These characters should be classified as
follows.
:::{table} Grantha marks character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+11301` | Mark [Mn] | BINDU | TOP_POSITION | 𑌁 Grantha Candrabindu|
|`U+11303` | Mark [Mc] | VISARGA | RIGHT_POSITION | 𑌃 Grantha Visarga |
|`U+1133B` | Mark [Mn] | NUKTA | BOTTOM_POSITION | 𑌻 Combining Bindu Below |
|`U+1133C` | Mark [Mn] | NUKTA | BOTTOM_POSITION | 𑌼 Grantha Nukta |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Tamil script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF3` | Mark [Mc] | VISARGA | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Tamil text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Tamil text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Tamil text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+00B2` | Number | SYLLABLE_MODIFIER | TOP | ² Superscript Two |
|`U+00B3` | Number | SYLLABLE_MODIFIER | TOP | ³ Superscript Three |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+2074` | Number | SYLLABLE_MODIFIER | TOP | ⁴ Superscript Four |
|`U+2082` | Number | SYLLABLE_MODIFIER | TOP | ₂ Subscript Two |
|`U+2083` | Number | SYLLABLE_MODIFIER | TOP | ₃ Subscript Three |
|`U+2084` | Number | SYLLABLE_MODIFIER | TOP | ₄ Subscript Four |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
Tamil text sometimes uses the Latin numerals 2, 3, and 4 in
superscript or subscript positions to annotate Sanskrit. When used in
this fashion, the superscripts and subscripts are treated as
`SYLLABLE_MODIFIER` signs for shaping purposes.
================================================
FILE: character-tables/character-tables-telugu.md
================================================
# Telugu character tables #
This document lists the per-character shaping information needed to
[shape Telugu text](../opentype-shaping-telugu.md).
**Contents**
- [Telugu character table](#telugu-character-table)
- [Vedic Extensions character table](#vedic-extensions-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Telugu character table ##
Telugu glyphs should be classified as in the following
table. Codepoints in the Telugu block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Telugu character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0C00` | Mark [Mn] | BINDU | TOP_POSITION | ఀ Combining Candrabindu Above |
|`U+0C01` | Mark [Mc] | BINDU | RIGHT_POSITION | ఁ Candrabindu |
|`U+0C02` | Mark [Mc] | BINDU | RIGHT_POSITION | ం Anusvara |
|`U+0C03` | Mark [Mc] | VISARGA | RIGHT_POSITION | ః Visarga |
|`U+0C04` | Mark [Mn] | BINDU | TOP_POSITION | ఄ Combining Anusvara Above |
|`U+0C05` | Letter | VOWEL_INDEPENDENT | _null_ | అ A |
|`U+0C06` | Letter | VOWEL_INDEPENDENT | _null_ | ఆ Aa |
|`U+0C07` | Letter | VOWEL_INDEPENDENT | _null_ | ఇ I |
|`U+0C08` | Letter | VOWEL_INDEPENDENT | _null_ | ఈ Ii |
|`U+0C09` | Letter | VOWEL_INDEPENDENT | _null_ | ఉ U |
|`U+0C0A` | Letter | VOWEL_INDEPENDENT | _null_ | ఊ Uu |
|`U+0C0B` | Letter | VOWEL_INDEPENDENT | _null_ | ఋ Vocalic R |
|`U+0C0C` | Letter | VOWEL_INDEPENDENT | _null_ | ఌ Vocalic L |
|`U+0C0D` | _unassigned_ | | | |
|`U+0C0E` | Letter | VOWEL_INDEPENDENT | _null_ | ఎ E |
|`U+0C0F` | Letter | VOWEL_INDEPENDENT | _null_ | ఏ Ee |
| | | | |
|`U+0C10` | Letter | VOWEL_INDEPENDENT | _null_ | ఐ Ai |
|`U+0C11` | _unassigned_ | | | |
|`U+0C12` | Letter | VOWEL_INDEPENDENT | _null_ | ఒ O |
|`U+0C13` | Letter | VOWEL_INDEPENDENT | _null_ | ఓ Oo |
|`U+0C14` | Letter | VOWEL_INDEPENDENT | _null_ | ఔ Au |
|`U+0C15` | Letter | CONSONANT | _null_ | క Ka |
|`U+0C16` | Letter | CONSONANT | _null_ | ఖ Kha |
|`U+0C17` | Letter | CONSONANT | _null_ | గ Ga |
|`U+0C18` | Letter | CONSONANT | _null_ | ఘ Gha |
|`U+0C19` | Letter | CONSONANT | _null_ | ఙ Nga |
|`U+0C1A` | Letter | CONSONANT | _null_ | చ Ca |
|`U+0C1B` | Letter | CONSONANT | _null_ | ఛ Cha |
|`U+0C1C` | Letter | CONSONANT | _null_ | జ Ja |
|`U+0C1D` | Letter | CONSONANT | _null_ | ఝ Jha |
|`U+0C1E` | Letter | CONSONANT | _null_ | ఞ Nya |
|`U+0C1F` | Letter | CONSONANT | _null_ | ట Tta |
| | | | |
|`U+0C20` | Letter | CONSONANT | _null_ | ఠ Ttha |
|`U+0C21` | Letter | CONSONANT | _null_ | డ Dda |
|`U+0C22` | Letter | CONSONANT | _null_ | ఢ Ddha |
|`U+0C23` | Letter | CONSONANT | _null_ | ణ Nna |
|`U+0C24` | Letter | CONSONANT | _null_ | త Ta |
|`U+0C25` | Letter | CONSONANT | _null_ | థ Tha |
|`U+0C26` | Letter | CONSONANT | _null_ | ద Da |
|`U+0C27` | Letter | CONSONANT | _null_ | ధ Dha |
|`U+0C28` | Letter | CONSONANT | _null_ | న Na |
|`U+0C29` | _unassigned_ | | | |
|`U+0C2A` | Letter | CONSONANT | _null_ | ప Pa |
|`U+0C2B` | Letter | CONSONANT | _null_ | ఫ Pha |
|`U+0C2C` | Letter | CONSONANT | _null_ | బ Ba |
|`U+0C2D` | Letter | CONSONANT | _null_ | భ Bha |
|`U+0C2E` | Letter | CONSONANT | _null_ | మ Ma |
|`U+0C2F` | Letter | CONSONANT | _null_ | య Ya |
| | | | |
|`U+0C30` | Letter | CONSONANT | _null_ | ర Ra |
|`U+0C31` | Letter | CONSONANT | _null_ | ఱ Rra |
|`U+0C32` | Letter | CONSONANT | _null_ | ల La |
|`U+0C33` | Letter | CONSONANT | _null_ | ళ Lla |
|`U+0C34` | Letter | CONSONANT | _null_ | ఴ Llla |
|`U+0C35` | Letter | CONSONANT | _null_ | వ Va |
|`U+0C36` | Letter | CONSONANT | _null_ | శ Sha |
|`U+0C37` | Letter | CONSONANT | _null_ | ష Ssa |
|`U+0C38` | Letter | CONSONANT | _null_ | స Sa |
|`U+0C39` | Letter | CONSONANT | _null_ | హ Ha |
|`U+0C3A` | _unassigned_ | | | |
|`U+0C3B` | _unassigned_ | | | |
|`U+0C3C` | Mark [Mn] | NUKTA | BOTTOM_POSITION | ఼ Nukta |
|`U+0C3D` | Letter | AVAGRAHA | _null_ | ఽ Avagraha |
|`U+0C3E` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ా Sign Aa |
|`U+0C3F` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ి Sign I |
| | | | |
|`U+0C40` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ీ Sign Ii |
|`U+0C41` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ు Sign U |
|`U+0C42` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ూ Sign Uu |
|`U+0C43` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ృ Sign Vocalic R |
|`U+0C44` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ౄ Sign Vocalic Rr |
|`U+0C45` | _unassigned_ | | | |
|`U+0C46` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ె Sign E |
|`U+0C47` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ే Sign Ee |
|`U+0C48` | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ై Sign Ai |
|`U+0C49` | _unassigned_ | | | |
|`U+0C4A` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ొ Sign O |
|`U+0C4B` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ో Sign Oo |
|`U+0C4C` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ౌ Sign Au |
|`U+0C4D` | Mark [Mn] | VIRAMA | TOP_POSITION | ్ Virama |
|`U+0C4E` | _unassigned_ | | | |
|`U+0C4F` | _unassigned_ | | | |
| | | | |
|`U+0C50` | _unassigned_ | | | |
|`U+0C51` | _unassigned_ | | | |
|`U+0C52` | _unassigned_ | | | |
|`U+0C53` | _unassigned_ | | | |
|`U+0C54` | _unassigned_ | | | |
|`U+0C55` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ౕ Length Mark |
|`U+0C56` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ౖ Ai Length Mark |
|`U+0C57` | _unassigned_ | | | |
|`U+0C58` | Letter | CONSONANT | _null_ | ౘ Tsa |
|`U+0C59` | Letter | CONSONANT | _null_ | ౙ Dza |
|`U+0C5A` | Letter | CONSONANT | _null_ | ౚ Rrra |
|`U+0C5B` | _unassigned_ | | | |
|`U+0C5C` | _unassigned_ | | | |
|`U+0C5D` | Letter | CONSONANT_DEAD | _null_ | ౝ Nakaara Pollu |
|`U+0C5E` | _unassigned_ | | | |
|`U+0C5F` | _unassigned_ | | | |
| | | | |
|`U+0C60` | Letter | VOWEL_INDEPENDENT | _null_ | ౠ Vocalic Rr |
|`U+0C61` | Letter | VOWEL_INDEPENDENT | _null_ | ౡ Vocalic Ll |
|`U+0C62` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ౢ Sign Vocalic L |
|`U+0C63` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ౣ Sign Vocalic Ll |
|`U+0C64` | _unassigned_ | | | |
|`U+0C65` | _unassigned_ | | | |
|`U+0C66` | Number | NUMBER | _null_ | ౦ Digit Zero |
|`U+0C67` | Number | NUMBER | _null_ | ౧ Digit One |
|`U+0C68` | Number | NUMBER | _null_ | ౨ Digit Two |
|`U+0C69` | Number | NUMBER | _null_ | ౩ Digit Three |
|`U+0C6A` | Number | NUMBER | _null_ | ౪ Digit Four |
|`U+0C6B` | Number | NUMBER | _null_ | ౫ Digit Five |
|`U+0C6C` | Number | NUMBER | _null_ | ౬ Digit Six |
|`U+0C6D` | Number | NUMBER | _null_ | ౭ Digit Seven |
|`U+0C6E` | Number | NUMBER | _null_ | ౮ Digit Eight |
|`U+0C6F` | Number | NUMBER | _null_ | ౯ Digit Nine |
| | | | |
|`U+0C70` | _unassigned_ | | | |
|`U+0C71` | _unassigned_ | | | |
|`U+0C72` | _unassigned_ | | | |
|`U+0C73` | _unassigned_ | | | |
|`U+0C74` | _unassigned_ | | | |
|`U+0C75` | _unassigned_ | | | |
|`U+0C76` | _unassigned_ | | | |
|`U+0C77` | Punctuation | _null_ | _null_ | ౷ Sign Siddham |
|`U+0C78` | Number | NUMBER | _null_ | ౸ Fraction Zero Odd P |
|`U+0C79` | Number | NUMBER | _null_ | ౹ Fraction One Odd P |
|`U+0C7A` | Number | NUMBER | _null_ | ౺ Fraction Two Odd P |
|`U+0C7B` | Number | NUMBER | _null_ | ౻ Fraction Three Odd P|
|`U+0C7C` | Number | NUMBER | _null_ | ౼ Fraction One Even P |
|`U+0C7D` | Number | NUMBER | _null_ | ౽ Fraction Two Even P |
|`U+0C7E` | Number | NUMBER | _null_ | ౾ Fraction Three Even P|
|`U+0C7F` | Symbol | SYMBOL | _null_ | ౿ Tuumu |
:::
## Vedic Extensions character table ##
Sanskrit runs written in the Telugu script may also include
characters from the Vedic Extensions block. These characters should be
classified as follows.
> Note: See the [Vedic Extensions](../opentype-shaping-vedic-extensions.md)
> document for additional information.
:::{table} Vedic Extensions character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+1CD0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
|`U+1CD1` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
|`U+1CD2` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
|`U+1CD3` | Punctuation | _null_ | _null_ | ᳓ Sign Nihshvasa |
|`U+1CD4` | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
|`U+1CD5` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
|`U+1CD6` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
|`U+1CD7` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
|`U+1CD8` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
|`U+1CD9` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
|`U+1CDA` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
|`U+1CDB` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
|`U+1CDC` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
|`U+1CDD` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
|`U+1CDE` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
|`U+1CDF` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| | | | |
|`U+1CE0` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
|`U+1CE1` | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharavedic Independent Svarita |
|`U+1CE2` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
|`U+1CE3` | Mark [Mn] | _null_ | OVERSTRUCK | ᳣ Sign Visarga Udatta |
|`U+1CE4` | Mark [Mn] | _null_ | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
|`U+1CE5` | Mark [Mn] | _null_ | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
|`U+1CE6` | Mark [Mn] | _null_ | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
|`U+1CE7` | Mark [Mn] | _null_ | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
|`U+1CE8` | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
|`U+1CE9` | Letter | SYMBOL | _null_ | ᳩ Sign Anusvara Antargomukha |
|`U+1CEA` | Letter | _null_ | _null_ | ᳪ Sign Anusvara Bahirgomukha |
|`U+1CEB` | Letter | _null_ | _null_ | ᳫ Sign Anusvara Vamagomukha |
|`U+1CEC` | Letter | SYMBOL | _null_ | ᳬ Sign Anusvara Vamagomukha With Tail |
|`U+1CED` | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
|`U+1CEE` | Letter | SYMBOL | _null_ | ᳮ Sign Hexiform Long Anusvara |
|`U+1CEF` | Letter | _null_ | _null_ | ᳯ Sign Long Anusvara |
| | | | |
|`U+1CF0` | Letter | _null_ | _null_ | ᳰ Sign Rthang Long Anusvara |
|`U+1CF1` | Letter | SYMBOL | _null_ | ᳱ Sign Anusvara Ubhayato Mukha |
|`U+1CF2` | Letter | CONSONANT_DEAD | _null_ | ᳲ Sign Ardhavisarga |
|`U+1CF3` | Letter | CONSONANT_DEAD | _null_ | ᳳ Sign Rotated Ardhavisarga |
|`U+1CF4` | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
|`U+1CF5` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳵ Sign Jihvamuliya |
|`U+1CF6` | Letter | CONSONANT_WITH_STACKER | _null_ | ᳶ Sign Upadhmaniya |
|`U+1CF7` | Mark [Mc] | _null_ | _null_ | ᳷ Sign Atikrama |
|`U+1CF8` | Mark [Mn] | CANTILLATION | _null_ | ᳸ Tone Ring Above |
|`U+1CF9` | Mark [Mn] | CANTILLATION | _null_ | ᳹ Tone Double Ring Above |
|`U+1CFA` | Letter | PLACEHOLDER | _null_ | ᳺ Sign Double Anusvara Antargomukha |
|`U+1CFB` | _unassigned_ | | | |
|`U+1CFC` | _unassigned_ | | | |
|`U+1CFD` | _unassigned_ | | | |
|`U+1CFE` | _unassigned_ | | | |
|`U+1CFF` | _unassigned_ | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Telugu text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block. Telugu text can also incorporate the udatta
(`U+0951`) and anudatta (`U+0952`) signs from the Devanagari block.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+0951` | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
|`U+0952` | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
|`U+0964` | Punctuation | _null_ | _null_ | । Danda |
|`U+0965` | Punctuation | _null_ | _null_ | ॥ Double Danda |
:::
Other important characters that may be encountered when shaping runs
of Telugu text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence. The
sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the formation of
a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The
sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce the
first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/character-tables-thai.md
================================================
# Thai character tables #
This document lists the per-character shaping information needed to
[shape Thai text](../opentype-shaping-thai-lao.md#the-thailao-shaping-model).
**Contents**
- [Thai character table](#thai-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Thai character table ##
Thai glyphs should be classified as in the following
table. Codepoints in the Thai block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Thai character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Combining class | PUA | Glyph |
|:----------|:-----------------|:------------------|:------------------------|:----------------|:-------|:------------------------------|
|`U+0E00` | _unassigned_ | | | | | |
|`U+0E01` | Letter | CONSONANT | _null_ | _0_ | NC | ก Ko Kai |
|`U+0E02` | Letter | CONSONANT | _null_ | _0_ | NC | ข Kho Khai |
|`U+0E03` | Letter | CONSONANT | _null_ | _0_ | NC | ฃ Kho Khuat |
|`U+0E04` | Letter | CONSONANT | _null_ | _0_ | NC | ค Kho Khwai |
|`U+0E05` | Letter | CONSONANT | _null_ | _0_ | NC | ฅ Kho Khon |
|`U+0E06` | Letter | CONSONANT | _null_ | _0_ | NC | ฆ Kho Rakhang |
|`U+0E07` | Letter | CONSONANT | _null_ | _0_ | NC | ง Ngo Ngu |
|`U+0E08` | Letter | CONSONANT | _null_ | _0_ | NC | จ Cho Chan |
|`U+0E09` | Letter | CONSONANT | _null_ | _0_ | NC | ฉ Cho Ching |
|`U+0E0A` | Letter | CONSONANT | _null_ | _0_ | NC | ช Cho Chang |
|`U+0E0B` | Letter | CONSONANT | _null_ | _0_ | NC | ซ So So |
|`U+0E0C` | Letter | CONSONANT | _null_ | _0_ | NC | ฌ Cho Choe |
|`U+0E0D` | Letter | CONSONANT | _null_ | _0_ | RC | ญ Yo Ying |
|`U+0E0E` | Letter | CONSONANT | _null_ | _0_ | DC | ฎ Do Chada |
|`U+0E0F` | Letter | CONSONANT | _null_ | _0_ | DC | ฏ To Patak |
| | | | | | | |
|`U+0E10` | Letter | CONSONANT | _null_ | _0_ | RC | ฐ Tho Than |
|`U+0E11` | Letter | CONSONANT | _null_ | _0_ | NC | ฑ Tho Nangmontho |
|`U+0E12` | Letter | CONSONANT | _null_ | _0_ | NC | ฒ Tho Phuthao |
|`U+0E13` | Letter | CONSONANT | _null_ | _0_ | NC | ณ No Nen |
|`U+0E14` | Letter | CONSONANT | _null_ | _0_ | NC | ด Do Dek |
|`U+0E15` | Letter | CONSONANT | _null_ | _0_ | NC | ต To Tao |
|`U+0E16` | Letter | CONSONANT | _null_ | _0_ | NC | ถ Tho Thung |
|`U+0E17` | Letter | CONSONANT | _null_ | _0_ | NC | ท Tho Thahan |
|`U+0E18` | Letter | CONSONANT | _null_ | _0_ | NC | ธ Tho Thong |
|`U+0E19` | Letter | CONSONANT | _null_ | _0_ | NC | น No Nu |
|`U+0E1A` | Letter | CONSONANT | _null_ | _0_ | NC | บ Bo Baimai |
|`U+0E1B` | Letter | CONSONANT | _null_ | _0_ | AC | ป Po Pla |
|`U+0E1C` | Letter | CONSONANT | _null_ | _0_ | NC | ผ Pho Phung |
|`U+0E1D` | Letter | CONSONANT | _null_ | _0_ | AC | ฝ Fo Fa |
|`U+0E1E` | Letter | CONSONANT | _null_ | _0_ | NC | พ Pho Phan |
|`U+0E1F` | Letter | CONSONANT | _null_ | _0_ | AC | ฟ Fo Fan |
| | | | | | | |
|`U+0E20` | Letter | CONSONANT | _null_ | _0_ | NC | ภ Pho Samphao |
|`U+0E21` | Letter | CONSONANT | _null_ | _0_ | NC | ม Mo Ma |
|`U+0E22` | Letter | CONSONANT | _null_ | _0_ | NC | ย Yo Yak |
|`U+0E23` | Letter | CONSONANT | _null_ | _0_ | NC | ร Ro Rua |
|`U+0E24` | Letter | CONSONANT | _null_ | _0_ | NC | ฤ Ru |
|`U+0E25` | Letter | CONSONANT | _null_ | _0_ | NC | ล Lo Ling |
|`U+0E26` | Letter | CONSONANT | _null_ | _0_ | NC | ฦ Lu |
|`U+0E27` | Letter | CONSONANT | _null_ | _0_ | NC | ว Wo Waen |
|`U+0E28` | Letter | CONSONANT | _null_ | _0_ | NC | ศ So Sala |
|`U+0E29` | Letter | CONSONANT | _null_ | _0_ | NC | ษ So Rusi |
|`U+0E2A` | Letter | CONSONANT | _null_ | _0_ | NC | ส So Sua |
|`U+0E2B` | Letter | CONSONANT | _null_ | _0_ | NC | ห Ho Hip |
|`U+0E2C` | Letter | CONSONANT | _null_ | _0_ | NC | ฬ Lo Chula |
|`U+0E2D` | Letter | CONSONANT | _null_ | _0_ | NC | อ O Ang |
|`U+0E2E` | Letter | CONSONANT | _null_ | _0_ | NC | ฮ Ho Nokhuk |
|`U+0E2F` | Letter | CONSONANT | _null_ | _0_ | _null_ | ฯ Paiyannoi |
| | | | | | | |
|`U+0E30` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | CV | ะ Sara A |
|`U+0E31` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ั Mai Han-akat |
|`U+0E32` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | CV | า Sara Aa |
|`U+0E33` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | _null_ | ำ Sara Am |
|`U+0E34` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ิ Sara I |
|`U+0E35` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ี Sara Ii |
|`U+0E36` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ึ Sara Ue |
|`U+0E37` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ื Sara Uee |
|`U+0E38` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 3 | BV | ุ Sara U |
|`U+0E39` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 3 | BV | ู Sara Uu |
|`U+0E3A` | Mark [Mn] | PURE_KILLER | BOTTOM_POSITION | 9 | BV | ฺ Phinthu |
|`U+0E3B` | _unassigned_ | | | | | |
|`U+0E3C` | _unassigned_ | | | | | |
|`U+0E3D` | _unassigned_ | | | | | |
|`U+0E3E` | _unassigned_ | | | | | |
|`U+0E3F` | Symbol | SYMBOL | _null_ | _0_ | _null_ | ฿ Currency symbol Baht |
| | | | | | | |
|`U+0E40` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | เ Sara E |
|`U+0E41` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | แ Sara Ae |
|`U+0E42` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | โ Sara O |
|`U+0E43` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | ใ Sara Ai Maimuan |
|`U+0E44` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | _0_ | CV | ไ Sara Ai Maimalai |
|`U+0E45` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | _0_ | CV | ๅ Lakkhangyao |
|`U+0E46` | Letter Modifier | _null_ | _null_ | _0_ | _null_ | ๆ Maiyamok |
|`U+0E47` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | _0_ | AV | ็ Maitaikhu |
|`U+0E48` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ่ Mai Ek |
|`U+0E49` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ้ Mai Tho |
|`U+0E4A` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๊ Mai Tri |
|`U+0E4B` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๋ Mai Chattawa |
|`U+0E4C` | Mark [Mn] | CONSONANT_KILLER | TOP_POSITION | _0_ | TV | ์ Thanthakhat |
|`U+0E4D` | Mark [Mn] | BINDU | TOP_POSITION | _0_ | AV | ํ Nikhahit |
|`U+0E4E` | Mark [Mn] | PURE_KILLER | TOP_POSITION | _0_ | AV | ๎ Yamakkan |
|`U+0E4F` | Punctuation | _null_ | _null_ | _0_ | _null_ | ๏ Fongman |
| | | | | | | |
|`U+0E50` | Number | NUMBER | _null_ | _0_ | _null_ | ๐ Digit zero |
|`U+0E51` | Number | NUMBER | _null_ | _0_ | _null_ | ๑ Digit one |
|`U+0E52` | Number | NUMBER | _null_ | _0_ | _null_ | ๒ Digit two |
|`U+0E53` | Number | NUMBER | _null_ | _0_ | _null_ | ๓ Digit three |
|`U+0E54` | Number | NUMBER | _null_ | _0_ | _null_ | ๔ Digit four |
|`U+0E55` | Number | NUMBER | _null_ | _0_ | _null_ | ๕ Digit five |
|`U+0E56` | Number | NUMBER | _null_ | _0_ | _null_ | ๖ Digit six |
|`U+0E57` | Number | NUMBER | _null_ | _0_ | _null_ | ๗ Digit seven |
|`U+0E58` | Number | NUMBER | _null_ | _0_ | _null_ | ๘ Digit eight |
|`U+0E59` | Number | NUMBER | _null_ | _0_ | _null_ | ๙ Digit nine |
|`U+0E5A` | Punctuation | _null_ | _null_ | _0_ | _null_ | ๚ Angkhankhu |
|`U+0E5B` | Punctuation | _null_ | _null_ | _0_ | _null_ | ๛ Khomut |
|`U+0E5C` | _unassigned_ | | | | | |
|`U+0E5D` | _unassigned_ | | | | | |
|`U+0E5E` | _unassigned_ | | | | | |
|`U+0E5F` | _unassigned_ | | | | | |
| | | | | | | |
|`U+0E60` | _unassigned_ | | | | | |
|`U+0E61` | _unassigned_ | | | | | |
|`U+0E62` | _unassigned_ | | | | | |
|`U+0E63` | _unassigned_ | | | | | |
|`U+0E64` | _unassigned_ | | | | | |
|`U+0E65` | _unassigned_ | | | | | |
|`U+0E66` | _unassigned_ | | | | | |
|`U+0E67` | _unassigned_ | | | | | |
|`U+0E68` | _unassigned_ | | | | | |
|`U+0E69` | _unassigned_ | | | | | |
|`U+0E6A` | _unassigned_ | | | | | |
|`U+0E6B` | _unassigned_ | | | | | |
|`U+0E6C` | _unassigned_ | | | | | |
|`U+0E6D` | _unassigned_ | | | | | |
|`U+0E6E` | _unassigned_ | | | | | |
|`U+0E6F` | _unassigned_ | | | | | |
| | | | | | | |
|`U+0E70` | _unassigned_ | | | | | |
|`U+0E71` | _unassigned_ | | | | | |
|`U+0E72` | _unassigned_ | | | | | |
|`U+0E73` | _unassigned_ | | | | | |
|`U+0E74` | _unassigned_ | | | | | |
|`U+0E75` | _unassigned_ | | | | | |
|`U+0E76` | _unassigned_ | | | | | |
|`U+0E77` | _unassigned_ | | | | | |
|`U+0E78` | _unassigned_ | | | | | |
|`U+0E79` | _unassigned_ | | | | | |
|`U+0E7A` | _unassigned_ | | | | | |
|`U+0E7B` | _unassigned_ | | | | | |
|`U+0E7C` | _unassigned_ | | | | | |
|`U+0E7D` | _unassigned_ | | | | | |
|`U+0E7E` | _unassigned_ | | | | | |
|`U+0E7F` | _unassigned_ | | | | | |
:::
## Miscellaneous character table ##
In addition to general punctuation, runs of Thai text often use the
combining macron below (`U+0331 `), combining tilde (`U+0303`), modifier letter
apostrophe (`U+02BC`), and modifier letter minus sign (`U+02D7`), from the
Combining Diacritical Marks block, particularly when used to write minority
languages.
In addition, Thai text typically does not insert spaces between words.
Consequently, the Zero-Width Space (`U+200B`) character is often used to insert
invisible break points that may be converted to line breaks.
:::{table} Additional punctuation character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+02BC` | Mark [Mn] | TONE_MARKER | TOP_POSITION | ʼ Modifier apostrophe |
|`U+02D7` | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ˗ Modifier minus sign |
|`U+0303` | Mark [Mn] | TONE_MARKER | TOP_POSITION | ̃ Combining tilde |
|`U+0331` | Mark [Mn] | TONE_MARKER | TOP_POSITION | ̱ Combining macron below|
|`U+200B` | Separator | PLACEHOLDER | _null_ | Zero-width space |
:::
Other important characters that may be encountered when shaping runs
of Thai text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+2010` | Punctuation | PLACEHOLDER | _null_ | ‐ Hyphen |
|`U+2011` | Punctuation | PLACEHOLDER | _null_ | ‑ No-break hyphen |
|`U+2012` | Punctuation | PLACEHOLDER | _null_ | ‒ Figure dash |
|`U+2013` | Punctuation | PLACEHOLDER | _null_ | – En dash |
|`U+2014` | Punctuation | PLACEHOLDER | _null_ | — Em dash |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
:::
================================================
FILE: character-tables/character-tables-tibetan.md
================================================
# Tibetan character tables #
This document lists the per-character shaping information needed to
[shape Tibetan text](../opentype-shaping-tibetan.md).
**Contents**
- [Tibetan character table](#tibetan-character-table)
- [Miscellaneous character table](#miscellaneous-character-table)
## Tibetan character table ##
Tibetan glyphs should be classified as in the following
table. Codepoints in the Tibetan block with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine. Note that
this does include some valid codepoints, such as currency marks,
punctuation, and other symbols.
> Note: the `NUMBER` and `SYMBOL` _Shaping classes_ are important
> during syllable identification, but generally evoke no further
> special behavior during the rest of the shaping process.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the following table use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
:::{table} Tibetan character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------------------------|
| `U+0F00` | Letter | _null_ | _null_ | ༀ Syllable Om |
| `U+0F01` | Symbol | SYMBOL | _null_ | ༁ Gter Yig Mgo Truncated A |
| `U+0F02` | Symbol | SYMBOL | _null_ | ༂ Gter Yig Mgo -Um Rnam Bcad Ma |
| `U+0F03` | Symbol | SYMBOL | _null_ | ༃ Gter Yig Mgo -Um Gter Tsheg Ma |
| `U+0F04` | Punctuation | _null_ | _null_ | ༄ Initial Yig Mgo Mdun Ma |
| `U+0F05` | Punctuation | _null_ | _null_ | ༅ Closing Yig Mgo Sgab Ma |
| `U+0F06` | Punctuation | _null_ | _null_ | ༆ Caret Yig Mgo Phur Shad Ma |
| `U+0F07` | Punctuation | _null_ | _null_ | ༇ Yig Mgo Tsheg Shad Ma |
| `U+0F08` | Punctuation | _null_ | _null_ | ༈ Sbrul Shad |
| `U+0F09` | Punctuation | _null_ | _null_ | ༉ Bskur Yig Mgo |
| `U+0F0A` | Punctuation | _null_ | _null_ | ༊ Bka- Shog Yig Mgo |
| `U+0F0B` | Punctuation | _null_ | _null_ | ་ Intersyllabic Tsheg |
| `U+0F0C` | Punctuation | _null_ | _null_ | ༌ Delimiter Tsheg Bstar |
| `U+0F0D` | Punctuation | _null_ | _null_ | ། Shad |
| `U+0F0E` | Punctuation | _null_ | _null_ | ༎ Nyis Shad |
| `U+0F0F` | Punctuation | _null_ | _null_ | ༏ Tsheg Shad |
| | | | | |
| `U+0F10` | Punctuation | _null_ | _null_ | ༐ Nyis Tsheg Shad |
| `U+0F11` | Punctuation | _null_ | _null_ | ༑ Rin Chen Spungs Shad |
| `U+0F12` | Punctuation | _null_ | _null_ | ༒ Rgya Gram Shad |
| `U+0F13` | Symbol | SYMBOL | _null_ | ༓ Caret -Dzud Rtags Me Long Can |
| `U+0F14` | Punctuation | _null_ | _null_ | ༔ Gter Tsheg |
| `U+0F15` | Symbol | SYMBOL | _null_ | ༕ Logotype Sign Chad Rtags |
| `U+0F16` | Symbol | SYMBOL | _null_ | ༖ Logotype Sign Lhag Rtags |
| `U+0F17` | Symbol | SYMBOL | _null_ | ༗ Astrological Sign Sgra Gcan -Char Rtags |
| `U+0F18` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ༘ Astrological Sign -Khyud Pa |
| `U+0F19` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ༙ Astrological Sign Sdong Tshugs |
| `U+0F1A` | Symbol | SYMBOL | _null_ | ༚ Sign Rdel Dkar Gcig |
| `U+0F1B` | Symbol | SYMBOL | _null_ | ༛ Sign Rdel Dkar Gnyis |
| `U+0F1C` | Symbol | SYMBOL | _null_ | ༜ Sign Rdel Dkar Gsum |
| `U+0F1D` | Symbol | SYMBOL | _null_ | ༝ Sign Rdel Nag Gcig |
| `U+0F1E` | Symbol | SYMBOL | _null_ | ༞ Sign Rdel Nag Gnyis |
| `U+0F1F` | Symbol | SYMBOL | _null_ | ༟ Sign Rdel Dkar Rdel Nag |
| | | | | |
| `U+0F20` | Number | NUMBER | _null_ | ༠ Digit Zero |
| `U+0F21` | Number | NUMBER | _null_ | ༡ Digit One |
| `U+0F22` | Number | NUMBER | _null_ | ༢ Digit Two |
| `U+0F23` | Number | NUMBER | _null_ | ༣ Digit Three |
| `U+0F24` | Number | NUMBER | _null_ | ༤ Digit Four |
| `U+0F25` | Number | NUMBER | _null_ | ༥ Digit Five |
| `U+0F26` | Number | NUMBER | _null_ | ༦ Digit Six |
| `U+0F27` | Number | NUMBER | _null_ | ༧ Digit Seven |
| `U+0F28` | Number | NUMBER | _null_ | ༨ Digit Eight |
| `U+0F29` | Number | NUMBER | _null_ | ༩ Digit Nine |
| `U+0F2A` | Number | NUMBER | _null_ | ༪ Digit Half One |
| `U+0F2B` | Number | NUMBER | _null_ | ༫ Digit Half Two |
| `U+0F2C` | Number | NUMBER | _null_ | ༬ Digit Half Three |
| `U+0F2D` | Number | NUMBER | _null_ | ༭ Digit Half Four |
| `U+0F2E` | Number | NUMBER | _null_ | ༮ Digit Half Five |
| `U+0F2F` | Number | NUMBER | _null_ | ༯ Digit Half Six |
| | | | | |
| `U+0F30` | Number | NUMBER | _null_ | ༰ Digit Half Seven |
| `U+0F31` | Number | NUMBER | _null_ | ༱ Digit Half Eight |
| `U+0F32` | Number | NUMBER | _null_ | ༲ Digit Half Nine |
| `U+0F33` | Number | NUMBER | _null_ | ༳ Digit Half Zero |
| `U+0F34` | Symbol | SYMBOL | _null_ | ༴ Bsdus Rtags |
| `U+0F35` | Mark [Mn] | SYLLABLE_MODIFIER | BOTTOM_POSITION | ༵ Ngas Bzung Nyi Zla |
| `U+0F36` | Symbol | SYMBOL | _null_ | ༶ Caret -Dzud Rtags Bzhi Mig Can |
| `U+0F37` | Mark [Mn] | SYLLABLE_MODIFIER | BOTTOM_POSITION | ༷ Ngas Bzung Sgor Rtags |
| `U+0F38` | Symbol | SYMBOL | _null_ | ༸ Che Mgo |
| `U+0F39` | Mark [Mn] | NUKTA | TOP_POSITION | ༹ Tsa -Phru |
| `U+0F3A` | Punctuation [Ps] | _null_ | _null_ | ༺ Gug Rtags Gyon |
| `U+0F3B` | Punctuation [Pe] | _null_ | _null_ | ༻ Gug Rtags Gyas |
| `U+0F3C` | Punctuation [Ps] | _null_ | _null_ | ༼ Ang Khang Gyon |
| `U+0F3D` | Punctuation [Pe] | _null_ | _null_ | ༽ Ang Khang Gyas |
| `U+0F3E` | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ༾ Sign Yar Tshes |
| `U+0F3F` | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ༿ Sign Mar Tshes |
| | | | | |
| `U+0F40` | Letter | CONSONANT | _null_ | ཀ Ka |
| `U+0F41` | Letter | CONSONANT | _null_ | ཁ Kha |
| `U+0F42` | Letter | CONSONANT | _null_ | ག Ga |
| `U+0F43` | Letter | CONSONANT | _null_ | གྷ Gha |
| `U+0F44` | Letter | CONSONANT | _null_ | ང Nga |
| `U+0F45` | Letter | CONSONANT | _null_ | ཅ Ca |
| `U+0F46` | Letter | CONSONANT | _null_ | ཆ Cha |
| `U+0F47` | Letter | CONSONANT | _null_ | ཇ Ja |
| `U+0F48` | _unassigned_ | | | |
| `U+0F49` | Letter | CONSONANT | _null_ | ཉ Nya |
| `U+0F4A` | Letter | CONSONANT | _null_ | ཊ Tta |
| `U+0F4B` | Letter | CONSONANT | _null_ | ཋ Ttha |
| `U+0F4C` | Letter | CONSONANT | _null_ | ཌ Dda |
| `U+0F4D` | Letter | CONSONANT | _null_ | ཌྷ Ddha |
| `U+0F4E` | Letter | CONSONANT | _null_ | ཎ Nna |
| `U+0F4F` | Letter | CONSONANT | _null_ | ཏ Ta |
| | | | | |
| `U+0F50` | Letter | CONSONANT | _null_ | ཐ Tha |
| `U+0F51` | Letter | CONSONANT | _null_ | ད Da |
| `U+0F52` | Letter | CONSONANT | _null_ | དྷ Dha |
| `U+0F53` | Letter | CONSONANT | _null_ | ན Na |
| `U+0F54` | Letter | CONSONANT | _null_ | པ Pa |
| `U+0F55` | Letter | CONSONANT | _null_ | ཕ Pha |
| `U+0F56` | Letter | CONSONANT | _null_ | བ Ba |
| `U+0F57` | Letter | CONSONANT | _null_ | བྷ Bha |
| `U+0F58` | Letter | CONSONANT | _null_ | མ Ma |
| `U+0F59` | Letter | CONSONANT | _null_ | ཙ Tsa |
| `U+0F5A` | Letter | CONSONANT | _null_ | ཚ Tsha |
| `U+0F5B` | Letter | CONSONANT | _null_ | ཛ Dza |
| `U+0F5C` | Letter | CONSONANT | _null_ | ཛྷ Dzha |
| `U+0F5D` | Letter | CONSONANT | _null_ | ཝ Wa |
| `U+0F5E` | Letter | CONSONANT | _null_ | ཞ Zha |
| `U+0F5F` | Letter | CONSONANT | _null_ | ཟ Za |
| | | | | |
| `U+0F60` | Letter | CONSONANT | _null_ | འ -A |
| `U+0F61` | Letter | CONSONANT | _null_ | ཡ Ya |
| `U+0F62` | Letter | CONSONANT | _null_ | ར Ra |
| `U+0F63` | Letter | CONSONANT | _null_ | ལ La |
| `U+0F64` | Letter | CONSONANT | _null_ | ཤ Sha |
| `U+0F65` | Letter | CONSONANT | _null_ | ཥ Ssa |
| `U+0F66` | Letter | CONSONANT | _null_ | ས Sa |
| `U+0F67` | Letter | CONSONANT | _null_ | ཧ Ha |
| `U+0F68` | Letter | CONSONANT | _null_ | ཨ A |
| `U+0F69` | Letter | CONSONANT | _null_ | ཀྵ Kssa |
| `U+0F6A` | Letter | CONSONANT | _null_ | ཪ Fixed-Form Ra |
| `U+0F6B` | Letter | CONSONANT | _null_ | ཫ Kka |
| `U+0F6C` | Letter | CONSONANT | _null_ | ཬ Rra |
| `U+0F6D` | _unassigned_ | | | |
| `U+0F6E` | _unassigned_ | | | |
| `U+0F6F` | _unassigned_ | | | |
| | | | | |
| `U+0F70` | _unassigned_ | | | |
| `U+0F71` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ཱ Sign Aa |
| `U+0F72` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ི Sign I |
| `U+0F73` | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ཱི Sign Ii |
| `U+0F74` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ུ Sign U |
| `U+0F75` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ཱུ Sign Uu |
| `U+0F76` | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ྲྀ Sign Vocalic R |
| `U+0F77` | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ཷ Sign Vocalic Rr |
| `U+0F78` | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ླྀ Sign Vocalic L |
| `U+0F79` | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ཹ Sign Vocalic Ll |
| `U+0F7A` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ེ Sign E |
| `U+0F7B` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ཻ Sign Ee |
| `U+0F7C` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ོ Sign O |
| `U+0F7D` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ཽ Sign Oo |
| `U+0F7E` | Mark [Mn] | BINDU | TOP_POSITION | ཾ Sign Rjes Su Nga Ro |
| `U+0F7F` | Mark [Mc] | VISARGA | RIGHT_POSITION | ཿ Sign Rnam Bcad |
| | | | | |
| `U+0F80` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ྀ Sign Reversed I |
| `U+0F81` | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ཱྀ Sign Reversed Ii |
| `U+0F82` | Mark [Mn] | BINDU | TOP_POSITION | ྂ Sign Nyi Zla Naa Da |
| `U+0F83` | Mark [Mn] | BINDU | TOP_POSITION | ྃ Sign Sna Ldan |
| `U+0F84` | Mark [Mn] | VIRAMA | BOTTOM_POSITION | ྄ Halanta |
| `U+0F85` | Punctuation | AVAGRAHA | _null_ | ྅ Paluta |
| `U+0F86` | Mark [Mn] | TONE_MARKER | TOP_POSITION | ྆ Sign Lci Rtags |
| `U+0F87` | Mark [Mn] | TONE_MARKER | TOP_POSITION | ྇ Sign Yang Rtags |
| `U+0F88` | Letter | CONSONANT_HEAD | _null_ | ྈ Sign Lce Tsa Can |
| `U+0F89` | Letter | CONSONANT_HEAD | _null_ | ྉ Sign Mchu Can |
| `U+0F8A` | Letter | CONSONANT_HEAD | _null_ | ྊ Sign Gru Can Rgyings |
| `U+0F8B` | Letter | CONSONANT_HEAD | _null_ | ྋ Sign Gru Med Rgyings |
| `U+0F8C` | Letter | CONSONANT_HEAD | _null_ | ྌ Sign Inverted Mchu Can |
| `U+0F8D` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྍ Subjoined Sign Lce Tsa Can |
| `U+0F8E` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྎ Subjoined Sign Mchu Can |
| `U+0F8F` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྏ Subjoined Sign Inverted Mchu Can |
| | | | | |
| `U+0F90` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྐ Subjoined Ka |
| `U+0F91` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྑ Subjoined Kha |
| `U+0F92` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྒ Subjoined Ga |
| `U+0F93` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྒྷ Subjoined Gha |
| `U+0F94` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྔ Subjoined Nga |
| `U+0F95` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྕ Subjoined Ca |
| `U+0F96` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྖ Subjoined Cha |
| `U+0F97` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྗ Subjoined Ja |
| `U+0F98` | _unassigned_ | | | |
| `U+0F99` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྙ Subjoined Nya |
| `U+0F9A` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྚ Subjoined Tta |
| `U+0F9B` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྛ Subjoined Ttha |
| `U+0F9C` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྜ Subjoined Dda |
| `U+0F9D` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྜྷ Subjoined Ddha |
| `U+0F9E` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྞ Subjoined Nna |
| `U+0F9F` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྟ Subjoined Ta |
| | | | | |
| `U+0FA0` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྠ Subjoined Tha |
| `U+0FA1` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྡ Subjoined Da |
| `U+0FA2` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྡྷ Subjoined Dha |
| `U+0FA3` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྣ Subjoined Na |
| `U+0FA4` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྤ Subjoined Pa |
| `U+0FA5` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྥ Subjoined Pha |
| `U+0FA6` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྦ Subjoined Ba |
| `U+0FA7` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྦྷ Subjoined Bha |
| `U+0FA8` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྨ Subjoined Ma |
| `U+0FA9` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྩ Subjoined Tsa |
| `U+0FAA` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྪ Subjoined Tsha |
| `U+0FAB` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྫ Subjoined Dza |
| `U+0FAC` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྫྷ Subjoined Dzha |
| `U+0FAD` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྭ Subjoined Wa |
| `U+0FAE` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྮ Subjoined Zha |
| `U+0FAF` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྯ Subjoined Za |
| | | | | |
| `U+0FB0` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྰ Subjoined -A |
| `U+0FB1` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྱ Subjoined Ya |
| `U+0FB2` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྲ Subjoined Ra |
| `U+0FB3` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ླ Subjoined La |
| `U+0FB4` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྴ Subjoined Sha |
| `U+0FB5` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྵ Subjoined Ssa |
| `U+0FB6` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྶ Subjoined Sa |
| `U+0FB7` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྷ Subjoined Ha |
| `U+0FB8` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྸ Subjoined A |
| `U+0FB9` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྐྵ Subjoined Kssa |
| `U+0FBA` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྺ Subjoined Fixed-Form Wa |
| `U+0FBB` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྻ Subjoined Fixed-Form Ya |
| `U+0FBC` | Mark [Mn] |CONSONANT_SUBJOINED| BOTTOM_POSITION | ྼ Subjoined Fixed-Form Ra |
| `U+0FBD` | _unassigned_ | | | |
| `U+0FBE` | Symbol | SYMBOL | _null_ | ྾ Ku Ru Kha |
| `U+0FBF` | Symbol | SYMBOL | _null_ | ྿ Ku Ru Kha Bzhi Mig Can |
| | | | | |
| `U+0FC0` | Symbol | SYMBOL | _null_ | ࿀ Cantillation Sign Heavy Beat |
| `U+0FC1` | Symbol | SYMBOL | _null_ | ࿁ Cantillation Sign Light Beat |
| `U+0FC2` | Symbol | SYMBOL | _null_ | ࿂ Cantillation Sign Cang Te-U |
| `U+0FC3` | Symbol | SYMBOL | _null_ | ࿃ Cantillation Sign Sbub -Chal |
| `U+0FC4` | Symbol | SYMBOL | _null_ | ࿄ Symbol Dril Bu |
| `U+0FC5` | Symbol | SYMBOL | _null_ | ࿅ Symbol Rdo Rje |
| `U+0FC6` | Mark [Mn] | SYLLABLE_MODIFIER | BOTTOM_POSITION | ࿆ Symbol Padma Gdan |
| `U+0FC7` | Symbol | SYMBOL | _null_ | ࿇ Symbol Rdo Rje Rgya Gram |
| `U+0FC8` | Symbol | SYMBOL | _null_ | ࿈ Symbol Phur Pa |
| `U+0FC9` | Symbol | SYMBOL | _null_ | ࿉ Symbol Nor Bu |
| `U+0FCA` | Symbol | SYMBOL | _null_ | ࿊ Symbol Nor Bu Nyis -Khyil |
| `U+0FCB` | Symbol | SYMBOL | _null_ | ࿋ Symbol Nor Bu Gsum -Khyil |
| `U+0FCC` | Symbol | SYMBOL | _null_ | ࿌ Symbol Nor Bu Bzhi -Khyil |
| `U+0FCD` | _unassigned_ | | | |
| `U+0FCE` | Symbol | SYMBOL | _null_ | ࿎ Sign Rdel Nag Rdel Dkar |
| `U+0FCF` | Symbol | SYMBOL | _null_ | ࿏ Sign Rdel Nag Gsum |
| | | | | |
| `U+0FD0` | Punctuation | _null_ | _null_ | ࿐ Bska- Shog Gi Mgo Rgyan |
| `U+0FD1` | Punctuation | _null_ | _null_ | ࿑ Mnyam Yig Gi Mgo Rgyan |
| `U+0FD2` | Punctuation | _null_ | _null_ | ࿒ Nyis Tsheg |
| `U+0FD3` | Punctuation | _null_ | _null_ | ࿓ Initial Brda Rnying Yig Mgo Mdun |
| `U+0FD4` | Punctuation | _null_ | _null_ | ࿔ Closing Brda Rnying Yig Mgo Sgab |
| `U+0FD5` | Symbol | SYMBOL | _null_ | ࿕ Right-Facing Svasti Sign |
| `U+0FD6` | Symbol | SYMBOL | _null_ | ࿖ Left-Facing Svasti Sign |
| `U+0FD7` | Symbol | SYMBOL | _null_ | ࿗ Right-Facing Svasti Sign With Dots |
| `U+0FD8` | Symbol | SYMBOL | _null_ | ࿘ Left-Facing Svasti Sign With Dots |
| `U+0FD9` | Punctuation | _null_ | _null_ | ࿙ Leading Mchan Rtags |
| `U+0FDA` | Punctuation | _null_ | _null_ | ࿚ Trailing Mchan Rtags |
| `U+0FDB` | _unassigned_ | | | |
| `U+0FDC` | _unassigned_ | | | |
| `U+0FDD` | _unassigned_ | | | |
| `U+0FDE` | _unassigned_ | | | |
| `U+0FDF` | _unassigned_ | | | |
| | | | | |
:::
## Miscellaneous character table ##
Other important characters that may be encountered when shaping runs
of Tibetan text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
:::{table} Miscellaneous character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-------------------------------|
|`U+00A0` | Separator | PLACEHOLDER | _null_ | No-break space |
|`U+200C` | Other | NON_JOINER | _null_ | Zero-width non-joiner |
|`U+200D` | Other | JOINER | _null_ | Zero-width joiner |
|`U+25CC` | Symbol | DOTTED_CIRCLE | _null_ | ◌ Dotted circle |
|`U+2638` | Symbol | SYMBOL | _null_ | ☸ Wheel of Dharma |
:::
The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct
from a "_consonant_,Halant,_consonant_" sequence. The sequence
"_consonant_,Halant,ZWJ,_consonant_" blocks the formation of a
conjunct between the two consonants.
Note, however, that the "_consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence
"_consonant_,Halant,ZWNJ,_consonant_" should produce the first
consonant in its standard form, followed by an explicit "Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
where an initial "Ra,Halant" sequence without the zero-width joiner
otherwise would.
The no-break space (NBSP) is primarily used to display
those codepoints that are defined as non-spacing (marks, dependent
vowels (matras), below-base consonant forms, and post-base consonant
forms) in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_consonant_", "NBSP,_mark_", or "NBSP,_matra_".
================================================
FILE: character-tables/index.md
================================================
# Character tables #
The section includes per-srcipt reference tables showing the
shaping-related properties of the codepoints used for each script,
as well as auxiliary information about the codepoints and notes on
control characters and other special-purpose codepoints that could
prove relevant to shapinh-engine implementers.
- Indic
- [Devanagari](character-tables-devanagari.md)
- [Bengali](character-tables-bengali.md)
- [Gujarati](character-tables-gujarati.md)
- [Gurmukhi](character-tables-gurmukhi.md)
- [Kannada](character-tables-kannada.md)
- [Malayalam](character-tables-malayalam.md)
- [Oriya](character-tables-oriya.md)
- [Tamil](character-tables-tamil.md)
- [Telugu](character-tables-telugu.md)
- [Sinhala](character-tables-sinhala.md)
- _Vedic Extensions tables are included in each Indic script_
- Arabic
- [Arabic](character-tables-arabic.md)
- [Syriac](character-tables-syriac.md)
- [N'Ko](character-tables-nko.md)
- [Mongolian](character-tables-mongolian.md)
- Hangul
- [Hangul Jamo](character-tables-hangul.md)
- Hebrew
- [Hebrew](character-tables-hebrew.md)
- Khmer
- [Khmer](character-tables-khmer.md)
- Lao
- [Lao](character-tables-lao.md)
- Myanmar
- [Myanmar](character-tables-myanmar.md)
- Thai
- [Thai](character-tables-thai.md)
- Tibetan
- [Tibetan](character-tables-tibetan.md)
:::{note}
Tables are not provided for the default or Universal Shaping Engine
(USE) shaping documents, each of which covers a
multitude of individual scripts, nor for the emoji shaping document,
because emoji usage is not specific to any individual script.
:::
================================================
FILE: conf.py
================================================
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import sys
from pathlib import Path
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
project = 'OpenType Shaping Documents'
copyright = '2022, Sponsored by YesLogic'
author = 'Sponsored by YesLogic'
version = "0.9"
release = "0.9alpha1"
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
sys.path.append(str(Path('_ext').resolve()))
extensions = ['myst_parser', 'sphinx_external_toc', 'sphinx_inline_svg', 'shapingdocs_svg_color_toggles']
source_suffix = {'.md': 'markdown'}
templates_path = ['_templates']
exclude_patterns = ['_build', '_ext', 'test', 'Thumbs.db', '.DS_Store', 'BUILD.md', 'README.md', '**-image-generation-log.md', 'character-tables/README.md', 'images/images-index.md', 'images/README.md', 'notes/README.md'] # Eventually need to remove the links to image-generation-logs from the root README.md
root_doc = 'index' # Renamed to split GitHub README from production index
numfig = True
numfig_secnum_depth = 2
myst_heading_anchors = 6
# attrs_inline to specify HTML element attributes like img 'title' that are getting lost on build.
myst_enable_extensions = ['substitution', 'smartquotes', 'colon_fence', 'attrs_inline']
myst_substitutions = {
'opentogglebutton': '
',
'khmer_midsyllable_mark_table_workaround': 'Mid-syllable marks that must be tagged for sorting with above-base consonants',
}
external_toc_path = "_toc.yml"
# Starting with sphinx_external_toc 1.1.0, "multitoc numbering" is activated
# by this configuration key and the standalone sphinx_multitoc_numbering
# extension is not required
use_multitoc_numbering = True
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
html_theme = 'alabaster'
html_static_path = ['_static']
html_js_files = ['toggleSvgColors.js']
html_sidebars = {
'**': [
'about.html',
'static_nav.html',
# 'navigation.html', # Replacing default navigation with static version above
'searchbox.html',
'sourcelink.html',
]
}
html_theme_options = {
'page_width': '1200px',
'sidebar_width': '300px',
'github_user': 'n8willis',
'github_repo': 'opentype-shaping-documents',
'font_family': 'Source\ Serif\ 4',
'head_font_family': 'Source\ Serif\ 4',
'caption_font_family': 'Source\ Serif\ 4',
'code_font_family': 'Source\ Code\ Pro',
'github_button': True,
'github_type': 'watch',
'github_count': True,
'extra_nav_links': {
'GitHub issues': 'https://github.com/n8willis/opentype-shaping-documents/issues',
'Build process': 'https://github.com/n8willis/opentype-shaping-documents/blob/master/BUILD.md', # Fix the directory path after PR merge; Add contributor-guide link
}
}
================================================
FILE: errata.md
================================================
# OpenType shaping errata #
This document details errata that shaping engines may encounter, such
as ambiguities or omissions in the existing OpenType or Unicode
specification documents.
**Contents**
- [Unicode](#unicode)
- [ZWJ and ZWNJ](#zwj-and-zwnj)
- [Scope of ZWJ and ZWNJ](#scope-of-zwj-and-zwnj)
- [ZWJ in redundant ligature lookups](#zwj-in-redundant-ligature-lookups)
- [Emoji](#emoji)
- [Skin-tone permutations](#skin-tone-permutations)
- [Gender permutations](#gender-permutations)
- [OpenType](#opentype)
- [Null offsets in GSUB and GPOS](#null-offsets-in-gsub-and-gpos)
- [Sorting of GSUB and GPOS lookups](#sorting-of-gsub-and-gpos-lookups)
- [Per-script applicability of feature tags](#per-script-applicability-of-feature-tags)
- [Ordering of post-base and below-base consonants in Indic2 base-consonant determination](#ordering-of-post-base-and-below-base-consonants-in-indic2-base-consonant-determination)
- [Lookup behavior](#lookup-behavior)
- [Using MultipleSub for glyph deletion](#using-multiplesub-for-glyph-deletion)
- [Processing nested contextual lookups](#processing-nested-contextual-lookups)
- [Adjacent-mark reordering ambiguities](#adjacent-mark-reordering-ambiguities)
- [Merging of glyph properties](#merging-of-glyph-properties)
- [See also](#see-also)
## Unicode ##
This section lists errata pertaining to the Unicode Standard.
### ZWJ and ZWNJ ###
#### Scope of ZWJ and ZWNJ ####
Unicode provides the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner
(ZWNJ) control characters so that a text sequence can "request a
rendering system to have more or less of a connection between
characters than they would otherwise have."
The generic examples used in the standard show how ZWJ and ZWNJ
characters can affect the cursive-joining behavior between two
characters or the ligature-forming behavior between two
characters. However, the standard does not explicitly say whether or
not the presence of a ZWJ or ZWNJ should influence the shaping
behavior of characters for characters not adjacent to the ZWJ or ZWNJ.
For example, in the sequence "a,b,ZWNJ,c,d" the ZWNJ should prevent
the application of a ligature between "b" and "c" (if such a ligature
lookup exists in the active font).
However, if the active font contains a contextual ligature lookup for
"c,d" when preceded by "b", it is not clear whether or not the ZWNJ
in the same "a,b,ZWNJ,c,d" sequence should inhibit the application of
the ligature between "c" and "d".
#### ZWJ in redundant ligature lookups ####
An "Implementation Notes" section in chapter 23.2 of the Unicode
Standard says that font vendors should add ZWJ sequences to ligature
lookups. For example, if the sequence "f,i" triggers the "fi"
ligature, then the font should also include a lookup that triggers the
"fi" ligature for "f,ZWJ,i".
However, the text of chapter 23.2 prior to the "Implementation Notes"
says that ZWJ and ZWNJ "are not to be used in all cases where
ligatures or cursive connections are desired; instead, they are meant
only for over-riding the normal behavior of the text." That logic
makes the suggested "f,ZWJ,i" ligature lookup superfluous, because it
duplicates the effects of the existing "f,i" ligature lookup.
Using ZWJ within lookup patterns in the manner suggested by the
"Implementation Notes" is not common practice.
### Emoji ###
#### Skin-tone permutations ####
It is unclear whether ZWJ multi-person group emoji sequences are
expected to include combinations where some emoji in the sequence are
followed by a Fitzpatrick skin-tone modifier but other emoji in the
sequence are not followed by a Fitzpatrick skin-tone modifier.
For example, it is unclear whether the sequence
"Man,ZWJ,Handshake,Man,SkinTone-2" constitues a valid
ZWJ "Couple holding hands" sequence.
#### Gender permutations ####
It is unclear whether ZWJ multi-person group emoji sequences are
expected to include combinations where some emoji in the sequence are
are an explicit gender but other emoji in the sequence are not
explicit gender.
For example, it is unclear whether the sequence
"Man,ZWJ,Handshake,Person" constitues a valid
ZWJ "Couple holding hands" sequence.
It is also unclear whether the ZWJ multi-person family sequence must
have explicit gender-ordering for the adult humans depicted.
For example, it is unclear whether the sequence
"Man,ZWJ,Woman,ZWJ,Girl" should be rendered identically to the
sequence "Woman,ZWJ,Man,ZWJ,Girl".
## OpenType ##
This section lists errata pertaining to the OpenType specification.
### Null offsets in GSUB and GPOS ###
The headers of the GSUB and GPOS tables include fields that contain
the offsets at which other structures within the font binary are
found. For example, the value of the `featureVariationsOffset` field
indicates the byte value at which the featureVariations structure is
located.
The OpenType specification notes that `featureVariationsOffset` can be
`NULL`, but the specification does not indicate whether or any other
offset values can also be `NULL` (nor, conversely, does it indicate
whether `NULL` should be considered invalid).
In practice, other fields -- such as `scriptListOffset`,
`featureListOffset`, and `lookupListOffset` -- may have `NULL` values.
In such situations, `NULL` is usually intrepreted as meaning that the
structure nominally pointed to by the offset is empty.
Furthermore, font-validation functions may overwrite a `NULL` into an
offset field if the original value encountered was invalid.
### Sorting of GSUB and GPOS lookups ###
The OpenType specification requires that lookups in the GSUB table
must be sorted into numeric order before they are applied.
Lookups in the GPOS table, however, are not expected to be sorted
first, because GPOS lookups are applied in a specified order.
### Per-script applicability of feature tags ###
Some OpenType feature tags are defined only to apply to text runs in
specific scripts. Other feature tags are defined to apply to text in
any script.
However, the definitions of some feature tags list a limited number of
example scripts to which the feature should apply, but do not specify
every supported script.
For example, the `pstf` (post-base forms) tag is
[described](https://docs.microsoft.com/en-us/typography/opentype/spec/features_pt#tag-pstf)
as required for "scripts of south and southeast Asia that have
post-base forms for consonants eg: Gurmukhi, Malayalam, Khmer."
### Ordering of post-base and below-base consonants in Indic2 base-consonant determination ###
The Microsoft script-development specification for all Indic2-model
scripts
[states](https://docs.microsoft.com/en-us/typography/script-development/bengali#reorder-characters)
parenthetically that "post-base forms have to follow below-base forms".
If this statement is taken to be a rule, it would affect the
base-consonant search algorithm.
For example, in the Bengali sequence "Ka,Halant,Ba,Halant,Ya"
(`U+0995`,`U+09CD`,`U+09AC`,`U+09CD`,`U+09AF`), "Ka" would be
identified as the syllable base, with "Ba" designated a below-base
form and "Ya" designated a post-base form. However, in the similar
sequence "Ka,Halant,Ya,Halant,Ba"
(`U+0995`,`U+09CD`,`U+09AF`,`U+09CD`,`U+09AC`), "Ya" would be
identified as the base consonant.
Real-world Bengali texts provide counterexamples that contradict the
assumption that "post-base forms follow below-base forms" is a
requirement.
In other scripts, such as Telugu, the "post-base forms have to follow
below-base forms" statement is, perhaps, statistically likely, but is
certainly not an orthographic rule.
Consequently, it is unclear if the statement should be enforced as a
rule or if it should be regarded as a suggestion, and it is unclear to
what degree that answer varies between the Indic2-model scripts.
### Lookup behavior ###
#### Using MultipleSub for glyph deletion ####
The GSUB specification says that a `MultipleSubst` substitution cannot
be used to delete a glyph: it always substitutes at least one
replacement glyph. However, some implementations allow the
replacement-glyph array to be zero-length.
#### Processing nested contextual lookups ####
The GSUB specification allows contextual substitutions to invoke other
contextual substitutions. It is unclear how implementations ought to
handle certain cases of these nested lookups.
For example:
```
context: 'a'
subst index 0:
context: 'ab'
subst index 1: 'b' → 'ab'
```
This nested set of substitutions could cause an infinite loop on
certain input strings, if it is interpreted in a naive manner:
```
'[]ab' // begin at start of glyph sequence
'[a]b' // context matches
'[ab]' // nested context matches at index 0
'[aab]' // subst applies at index 1
'[a]ab' // return to parent context, uh oh!
'a[]ab' // move on to next glyph
'a[a]b' // context matches, infinite loop!
```
In short, if a nested contextual substitution can insert glyphs ahead
of its parent contextual substitution's context, then it creates a
"stack" that allows Turing-complete computation.
### Adjacent-mark reordering ambiguities ###
The Microsoft script-development specifications
[say](https://docs.microsoft.com/en-us/typography/script-development/devanagari#reorder-characters)
that marks should be reordered "to canonical order" (step 3 in the
linked Devanagari document) in the reordering phase. However, the same
step also describes this step as "Adjacent nukta and halant or nukta
and Vedic sign are always repositioned if necessary, so that the nukta
is first."
Together, it is somewhat ambiguous as to whether only "Halant,Nukta"
and "_Vedic_sign_,Nukta" sequences should be reordered by moving the
"Nukta" to the beginning, or all sequences of marks require reordering
into Unicode canonical combining class order, with "Nukta" moving to
the initial position as a special case.
### Merging of glyph properties ###
When the application of a shaping operation merges two or more
adjacent glyphs (for example, when two adjacent glyphs are substituted
with a single ligature glyph), the OpenType specification does not
dictate how shaping engines should combine (for example, merge,
replace, or drop) the properties of the input glyphs to determine the
properties of the output glyph.
This may result in ambiguities when a sequence of glyphs has several
substitutions applied in series.
For example, when shaping Indic scripts, glyphs may be tagged for the
possible application of multiple features, such as `half` and `rkrf`,
which are applied serially.
HarfBuzz and Uniscribe both take the approach of retaining the
properties of the first input glyph in a sequence, propagating those
properties to the merged output glyph.
## See also ##
Shaping engines may also want to offer explicit compatibility with
Microsoft Uniscribe, for the purpose of ensuring that users' existing
documents do not break. Therefore, implementors may wish to consult
the [Uniscribe compatibility notes](notes/uniscribe-bug-compatibility.md).
These compatibilty notes record test-driven observations about
Uniscribe's behavior, and they include any behavior that is a known
bug or a known deviation from specifications. Consequently, the issues
raised by offering Uniscribe compatiblity cannot be considered errata
in the sense that it is described above.
================================================
FILE: images/arabic/arabic-png-image-generation-log.md
================================================
# Commands used to generate the PNG images in [opentype-shaping-arabic.md](/opentype-shaping-arabic.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 4.1 `locl`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-locl-before.png --features=-locl --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=06f4
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-locl-after.png --features=+locl --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=06f4
montage arabic-locl-before.png right-arrow.png arabic-locl-after.png -geometry +0+0 -background transparent arabic-locl.png
## 4.2 `isol`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-isol-before.png --features=-isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0647
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-isol-after.png --features=+isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0647
montage arabic-isol-before.png right-arrow.png arabic-isol-after.png -geometry +0+0 -background transparent arabic-isol.png
## 4.3 `fina`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-fina-before.png --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=25cc,0628
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-fina-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=25cc,0628
montage arabic-fina-before.png right-arrow.png arabic-fina-after.png -geometry +0+0 -background transparent arabic-fina.png
## 4.6 `medi`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-medi-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=25cc,062e,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-medi-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=25cc,062e,25cc
montage arabic-medi-before.png right-arrow.png arabic-medi-after.png -geometry +0+0 -background transparent arabic-medi.png
## 4.8 `init`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=063a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=063a,25cc
montage arabic-init-before.png right-arrow.png arabic-init-after.png -geometry +0+0 -background transparent arabic-init.png
## 4.9 `rlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-rlig-before.png --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0623
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-rlig-after.png --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0623
montage arabic-rlig-before.png right-arrow.png arabic-rlig-after.png -geometry +0+0 -background transparent arabic-rlig.png
## 4.10 `rclt`
> None found.
## 4.11 `calt`
> Note: Noto Nastaliq Urdu implements this as a `rlig` lookup for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-calt-before.png --features=-liga,-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=062d,0645
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-calt-after.png --features=+liga,+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=062d,0645
montage arabic-calt-before.png right-arrow.png arabic-calt-after.png -geometry +0+0 -background transparent arabic-calt.png
## 5.1 `liga`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-liga-before.png --features=-liga,-fina,-medi,-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoKufiArabic-Regular.ttf --unicodes=0631,064a,0627,0644
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-liga-after.png --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoKufiArabic-Regular.ttf --unicodes=0631,064a,0627,0644
montage arabic-liga-before.png right-arrow.png arabic-liga-after.png -geometry +0+0 -background transparent arabic-liga.png
## 5.3 `cswh`
> None found.
## 5.4 `mset`
> None found. Could be emulated with `mark`, however.
## 7.1 `curs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-curs-before.png --features=-curs --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0642,0633,0645
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-curs-after.png --features=+curs --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0642,0633,0645
montage arabic-curs-before.png right-arrow.png arabic-curs-after.png -geometry +0+0 -background transparent arabic-curs.png
## 7.3 `mark`
hb-view --font-size=110 --margin=2,32,2,32 --output-file=arabic-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0643,0653
hb-view --font-size=110 --margin=2,32,2,16 --output-file=arabic-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0643,0653
montage arabic-mark-before.png right-arrow.png arabic-mark-after.png -geometry +0+0 -background transparent arabic-mark.png
================================================
FILE: images/arabic/arabic-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-arabic.md](../../opentype-shaping-arabic.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-Regular.ttf --unicodes=2192
## 2 `ccmp`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-ccmp-before.svg --features=-ccmp --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=067e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-ccmp-after.svg --features=+ccmp --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=067e
svg_stack.py --direction=h arabic-ccmp-before.svg right-arrow.svg arabic-ccmp-after.svg > arabic-ccmp.svg
## 4.1 `locl`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-locl-before.svg --features=-locl --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=06f4
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-locl-after.svg --features=+locl --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=06f4
svg_stack.py --direction=h arabic-locl-before.svg right-arrow.svg arabic-locl-after.svg > arabic-locl.svg
## 4.2 `isol`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-isol-before.svg --features=-isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0647
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-isol-after.svg --features=+isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0647
svg_stack.py --direction=h arabic-isol-before.svg right-arrow.svg arabic-isol-after.svg > arabic-isol.svg
## 4.3 `fina`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-fina-before.svg --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0626,200d,0628
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-fina-after.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0626,0628
svg_stack.py --direction=h arabic-fina-before.svg right-arrow.svg arabic-fina-after.svg > arabic-fina.svg
## 4.6 `medi`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-medi-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0626,200d,062e,200d,0637
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-medi-after.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0626,062e,0637
svg_stack.py --direction=h arabic-medi-before.svg right-arrow.svg arabic-medi-after.svg > arabic-medi.svg
## 4.8 `init`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=063a,200d,0626
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-init-after.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=063a,0626
svg_stack.py --direction=h arabic-init-before.svg right-arrow.svg arabic-init-after.svg > arabic-init.svg
## 4.9 `rlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-rlig-before.svg --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0623
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-rlig-after.svg --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0623
svg_stack.py --direction=h arabic-rlig-before.svg right-arrow.svg arabic-rlig-after.svg > arabic-rlig.svg
## 4.10 `rclt`
> None found.
## 4.11 `calt`
> Note: Noto Nastaliq Urdu implements this as a `rlig` lookup for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-calt-before.svg --features=-liga,-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=062d,0645
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-calt-after.svg --features=+liga,+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=062d,0645
svg_stack.py --direction=h arabic-calt-before.svg right-arrow.svg arabic-calt-after.svg > arabic-calt.svg
## 5.1 `liga`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-liga-before.svg --features=-liga,-fina,-medi,-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0627,0644,0644,0651,0670,06c1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-liga-after.svg --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0627,0644,0644,0651,0670,06c1
svg_stack.py --direction=h arabic-liga-before.svg right-arrow.svg arabic-liga-after.svg > arabic-liga.svg
## 5.2 `dlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-dlig-before.svg --features=-dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0644,0647
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-dlig-after.svg --features=+dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNaskhArabic-Regular.ttf --unicodes=0644,0644,0647
svg_stack.py --direction=h arabic-dlig-before.svg right-arrow.svg arabic-dlig-after.svg > arabic-dlig.svg
## 5.3 `cswh`
> None found.
## 5.4 `mset`
> None found. Could be emulated with `mark`, however.
## 7.1 `curs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-curs-before.svg --features=-curs --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0642,0633,0645
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-curs-after.svg --features=+curs --language=urd --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0642,0633,0645
svg_stack.py --direction=h arabic-curs-before.svg right-arrow.svg arabic-curs-after.svg > arabic-curs.svg
## 7.2 `dist` (not yet added to document)
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-dist-before.svg --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0628,062f,066e,062d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0628,062f,066e,062d
svg_stack.py --direction=h arabic-dist-before.svg right-arrow.svg arabic-dist-after.svg > arabic-dist.svg
## 7.2 `kern`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-kern-before.svg --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0635,062f,0627
hb-view --font-size=110 --margin=2,16,2,16 --output-file=arabic-kern-after.svg --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0635,062f,0627
svg_stack.py --direction=h arabic-kern-before.svg right-arrow.svg arabic-kern-after.svg > arabic-kern.svg
## 7.3 `mark`
hb-view --font-size=110 --margin=2,32,2,32 --output-file=arabic-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0643,0653
hb-view --font-size=110 --margin=2,32,2,16 --output-file=arabic-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoNastaliqUrdu-Regular.ttf --unicodes=0643,0653
svg_stack.py --direction=h arabic-mark-before.svg right-arrow.svg arabic-mark-after.svg > arabic-mark.svg
================================================
FILE: images/bengali/bengali-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-bengali.md](../../opentype-shaping-bengali.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-init` in examples where the `init`
> feature itself is not being explained.
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-decompose-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09cc
hb-view --font-size=110 --margin=2,16,2,16
--output-file=bengali-matra-decompose-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09c7,200c,25cc,09d7
montage bengali-matra-decompose-before.png right-arrow.png bengali-matra-decompose-after.png -geometry +0+0 -background transparent bengali-matra-decompose.png
## 2.7 Post-base consonants
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-yaphala-before.png --features=-init,-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09af
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-yaphala-after.png --features=-init,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09af
montage bengali-yaphala-before.png right-arrow.png bengali-yaphala-after.png -geometry +0+0 -background transparent bengali-yaphala.png
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-nukt-before.png --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a1,25cc,09bc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-nukt-after.png --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a1,09bc
montage bengali-nukt-before.png right-arrow.png bengali-nukt-after.png -geometry +0+0 -background transparent bengali-nukt.png
## 3.3 `akhn`
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-kssa-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,09b7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-kssa-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,09b7
montage bengali-akhn-kssa-before.png right-arrow.png bengali-akhn-kssa-after.png -geometry +0+0 -background transparent bengali-akhn-kssa.png
### JNya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-jnya-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099c,25cc,09cd,099e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-jnya-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099c,09cd,099e
montage bengali-akhn-jnya-before.png right-arrow.png bengali-akhn-jnya-after.png -geometry +0+0 -background transparent bengali-akhn-jnya.png
## 3.4 `rphf`
### Bengali
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-before.png --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,25cc,09cd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc
montage bengali-rphf-before.png right-arrow.png bengali-rphf-after.png -geometry +0+0 -background transparent bengali-rphf.png
### Assamese
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-as-before.png --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,25cc,09cd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-as-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,25cc
montage bengali-rphf-as-before.png right-arrow.png bengali-rphf-as-after.png -geometry +0+0 -background transparent bengali-rphf-as.png
## 3.7 `blwf`
### Raphala
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-raphala-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09b0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-raphala-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09b0
montage bengali-raphala-before.png right-arrow.png bengali-raphala-after.png -geometry +0+0 -background transparent bengali-raphala.png
### Baphala
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-baphala-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09ac
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-baphala-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09ac
montage bengali-baphala-before.png right-arrow.png bengali-baphala-after.png -geometry +0+0 -background transparent bengali-baphala.png
## 3.9 `half`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-half-ka-before.png --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,0998
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-half-ka-after.png --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,0998
montage bengali-half-ka-before.png right-arrow.png bengali-half-ka-after.png -geometry +0+0 -background transparent bengali-half-ka.png
## 3.10 `pstf`
> Same as 2.7
## 3.11 `vatu`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-vatu-before.png --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,09b0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-vatu-after.png --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,09b0
montage bengali-vatu-before.png right-arrow.png bengali-vatu-after.png -geometry +0+0 -background transparent bengali-vatu.png
## 3.12 `cjct`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-cjct-before.png --features=-init,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09aa,25cc,09cd,09a4,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-cjct-after.png --features=-init,+cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09aa,09cd,09a4,25cc
montage bengali-cjct-before.png right-arrow.png bengali-cjct-after.png -geometry +0+0 -background transparent bengali-cjct.png
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09c8,09b8,09cd,09ae,099a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b8,09cd,09ae,099a,09c8
montage bengali-matra-position-before.png right-arrow.png bengali-matra-position-after.png -geometry +0+0 -background transparent bengali-matra-position.png
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-reph-position-before.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,25cc,09a1,09cd,09a1,09cd,0996,09c1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-reph-position-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,09a1,09cd,09a1,09cd,0996,09c1
montage bengali-reph-position-before.png right-arrow.png bengali-reph-position-after.png -geometry +0+0 -background transparent bengali-reph-position.png
## 5 `init`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0999,09c7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0999,09c7
montage bengali-init-before.png right-arrow.png bengali-init-after.png -geometry +0+0 -background transparent bengali-init.png
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-pres-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099f,09bf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-pres-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099f,09bf
montage bengali-pres-before.png right-arrow.png bengali-pres-after.png -geometry +0+0 -background transparent bengali-pres.png
## 5 `abvs`
> Note that Noto Bengali implements this feature in a pres lookup for
unknown reasons.
hb-view --font-size=110 --margin=2,25,2,16 --output-file=bengali-abvs-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc,09c0,0981
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvs-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc,09c0,0981
montage bengali-abvs-before.png right-arrow.png bengali-abvs-after.png -geometry +0+0 -background transparent bengali-abvs.png
# 5 `blws`
> Note that Noto Bengali implements this feature in a pres lookup for
unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blws-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099d,09cd,09ac
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blws-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099d,09cd,09ac
montage bengali-blws-before.png right-arrow.png bengali-blws-after.png -geometry +0+0 -background transparent bengali-blws.png
## 5 `psts`
> Note that Noto Bengali implements this feature in a pres lookup for
unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-psts-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a0,09c0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-psts-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a0,09c0
montage bengali-psts-before.png right-arrow.png bengali-psts-after.png -geometry +0+0 -background transparent bengali-psts.png
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-haln-before.png --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099b,09bc,09cd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-haln-after.png --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099b,09bc,09cd
montage bengali-haln-before.png right-arrow.png bengali-haln-after.png -geometry +0+0 -background transparent bengali-haln.png
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvm-before.png --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0994,0981
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvm-after.png --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0994,0981
montage bengali-abvm-before.png right-arrow.png bengali-abvm-after.png -geometry +0+0 -background transparent bengali-abvm.png
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blwm-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09ad,09c2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blwm-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09ad,09c2
montage bengali-blwm-before.png right-arrow.png bengali-blwm-after.png -geometry +0+0 -background transparent bengali-blwm.png
================================================
FILE: images/bengali/bengali-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-bengali.md](../../opentype-shaping-bengali.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-Regular.ttf --unicodes=2192
cluster_styles = None
> Note: always use `--features=-init` in examples where the `init`
> feature itself is not being explained.
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-decompose-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-decompose-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09c7,200c,25cc,09d7
svg_stack.py --direction=h bengali-matra-decompose-before.svg right-arrow.svg bengali-matra-decompose-after.svg > bengali-matra-decompose.svg
cluster_styles = [c0,dc,c0,arrow,c0,dc,dc,c1]
## 2.7 Post-base consonants
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-yaphala-before.svg --features=-init,-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09af
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-yaphala-after.svg --features=-init,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09af
svg_stack.py --direction=h bengali-yaphala-before.svg right-arrow.svg bengali-yaphala-after.svg > bengali-yaphala.svg
cluster_styles = [dc,c0,c1,arrow,dc,c1]
#### Duplicates for other subsections
cp bengali-yaphala.svg bengali-yaphala-1.svg
cluster_styles = [dc,c0,c1,arrow,dc,c1]
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-nukt-before.svg --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a1,25cc,09bc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-nukt-after.svg --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a1,09bc
svg_stack.py --direction=h bengali-nukt-before.svg right-arrow.svg bengali-nukt-after.svg > bengali-nukt.svg
cluster_styles = [c0,dc,c1,arrow,c0]
## 3.3 `akhn`
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-kssa-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,09b7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-kssa-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,09b7
svg_stack.py --direction=h bengali-akhn-kssa-before.svg right-arrow.svg bengali-akhn-kssa-after.svg > bengali-akhn-kssa.svg
cluster_styles = [c0,dc,c1,c2,arrow,c0]
### JNya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-jnya-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099c,25cc,09cd,099e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-akhn-jnya-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099c,09cd,099e
svg_stack.py --direction=h bengali-akhn-jnya-before.svg right-arrow.svg bengali-akhn-jnya-after.svg > bengali-akhn-jnya.svg
cluster_styles = [c0,dc,c1,c2,arrow,c0]
## 3.4 `rphf`
### Bengali
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-before.svg --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,25cc,09cd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc
svg_stack.py --direction=h bengali-rphf-before.svg right-arrow.svg bengali-rphf-after.svg > bengali-rphf.svg
cluster_styles = [c0,dc,c1,arrow,dc,c0]
### Assamese
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-as-before.svg --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,25cc,09cd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-rphf-as-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,25cc
svg_stack.py --direction=h bengali-rphf-as-before.svg right-arrow.svg bengali-rphf-as-after.svg > bengali-rphf-as.svg
cluster_styles = [c0,dc,c1,arrow,dc,c0]
## 3.7 `blwf`
### Raphala
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-raphala-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09b0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-raphala-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09b0
svg_stack.py --direction=h bengali-raphala-before.svg right-arrow.svg bengali-raphala-after.svg > bengali-raphala.svg
cluster_styles = [dc,c0,c1,arrow,dc,c0]
#### Duplicates for other subsections
cp bengali-raphala.svg bengali-raphala-1.svg
cluster_styles = [dc,c0,c1,arrow,dc,c0]
cp bengali-raphala.svg bengali-raphala-2.svg
cluster_styles = [dc,c0,c1,arrow,dc,c0]
### Baphala
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-baphala-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09ac
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-baphala-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=25cc,09cd,09ac
svg_stack.py --direction=h bengali-baphala-before.svg right-arrow.svg bengali-baphala-after.svg > bengali-baphala.svg
cluster_styles = [dc,c0,c1,arrow,dc,c0]
## 3.9 `half`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-half-ka-before.svg --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,0998
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-half-ka-after.svg --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,0998
svg_stack.py --direction=h bengali-half-ka-before.svg right-arrow.svg bengali-half-ka-after.svg > bengali-half-ka.svg
cluster_styles = [c0,dc,c1,c2,arrow,c0,c2]
## 3.10 `pstf`
> Same as 2.7
## 3.11 `vatu`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-vatu-before.svg --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,25cc,09cd,09b0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-vatu-after.svg --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0995,09cd,09b0
svg_stack.py --direction=h bengali-vatu-before.svg right-arrow.svg bengali-vatu-after.svg > bengali-vatu.svg
cluster_styles = [c0,dc,c1,c2,arrow,c0]
## 3.12 `cjct`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-cjct-before.svg --features=-init,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09aa,25cc,09cd,09a4,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-cjct-after.svg --features=-init,+cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09aa,09cd,09a4,25cc
svg_stack.py --direction=h bengali-cjct-before.svg right-arrow.svg bengali-cjct-after.svg > bengali-cjct.svg
cluster_styles = [c0,dc,c1,c2,dc,arrow,c0,dc]
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09c8,09b8,09cd,09ae,099a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-matra-position-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b8,09cd,09ae,099a,09c8
svg_stack.py --direction=h bengali-matra-position-before.svg right-arrow.svg bengali-matra-position-after.svg > bengali-matra-position.svg
cluster_styles = [c0,dc,c1,c2,arrow,c1,c0,c2]
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-reph-position-before.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,25cc,09a1,09cd,09a1,09cd,0996,09c1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-reph-position-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09f0,09cd,09a1,09cd,09a1,09cd,0996,09c1
svg_stack.py --direction=h bengali-reph-position-before.svg right-arrow.svg bengali-reph-position-after.svg > bengali-reph-position.svg
cluster_styles = [c0,c1,dc,c2,c3,c4,arrow,c0,c1,c2,c3]
## 5 `init`
> ?? Maybe there's a headline-using second character that would be
> better here....
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0999,09c7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-init-after.svg --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0999,09c7
svg_stack.py --direction=h bengali-init-before.svg right-arrow.svg bengali-init-after.svg > bengali-init.svg
cluster_styles = [c0,c1,arrow,c0,c1]
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-pres-before.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099f,09bf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-pres-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099f,09bf
svg_stack.py --direction=h bengali-pres-before.svg right-arrow.svg bengali-pres-after.svg > bengali-pres.svg
cluster_styles = [c0,c1,arrow,c0]
## 5 `abvs`
> Note that Noto Bengali implements this feature in a pres lookup for
> unknown reasons.
> No more!
hb-view --font-size=110 --margin=2,25,2,16 --output-file=bengali-abvs-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc,09c0,0981
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvs-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09b0,09cd,25cc,09c0,0981
svg_stack.py --direction=h bengali-abvs-before.svg right-arrow.svg bengali-abvs-after.svg > bengali-abvs.svg
cluster_styles = [c0,c1,dc,c2,c3,arrow,c0,c1,dc,c2,c3]
# 5 `blws`
> Note that Noto Bengali implements this feature in a pres lookup for
> unknown reasons.
> This now seems to require disablng -cjct and -blws, but -pres is no
> longer involved.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blws-before.svg --features=-init,-blws,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099d,09cd,09ac
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blws-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099d,09cd,09ac
svg_stack.py --direction=h bengali-blws-before.svg right-arrow.svg bengali-blws-after.svg > bengali-blws.svg
cluster_styles = [c0,c1,arrow,c0]
## 5 `psts`
> Note that Noto Bengali implements this feature in a pres lookup for
> unknown reasons.
> No more!
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-psts-before.svg --features=-init,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a0,09c0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-psts-after.svg --features=-init,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09a0,09c0
svg_stack.py --direction=h bengali-psts-before.svg right-arrow.svg bengali-psts-after.svg > bengali-psts.svg
cluster_styles = [c0,c1,arrow,c0,c1]
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-haln-before.svg --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099b,09bc,09cd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-haln-after.svg --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=099b,09bc,09cd
svg_stack.py --direction=h bengali-haln-before.svg right-arrow.svg bengali-haln-after.svg > bengali-haln.svg
cluster_styles = [c0,c1,c2,arrow,c0,c1,c2]
## 6 `abvm`
> ????
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvm-before.svg --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0994,0981
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-abvm-after.svg --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=0994,0981
svg_stack.py --direction=h bengali-abvm-before.svg right-arrow.svg bengali-abvm-after.svg > bengali-abvm.svg
cluster_styles = [c0,c1,arrow,c0,c1]
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blwm-after.svg --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09ad,09c2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=bengali-blwm-before.svg --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifBengali-Regular.ttf --unicodes=09ad,09c2
svg_stack.py --direction=h bengali-blwm-before.svg right-arrow.svg bengali-blwm-after.svg > bengali-blwm.svg
cluster_styles = [c0,c1,arrow,c0,c1]
================================================
FILE: images/devanagari/devanagari-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-devanagari.md](../../opentype-shaping-devanagari.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-init` in examples where the `init`
> feature itself is not being explained.
## 3.1 `locl`
> Note: Noto Devanagari has a 'MAR' locl feature.
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-nukt-before.png --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,25cc,093c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-nukt-after.png --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c
montage devanagari-nukt-before.png right-arrow.png devanagari-nukt-after.png -geometry +0+0 -background transparent devanagari-nukt.png
## 3.3 `akhn`
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-kssa-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,25cc,094d,0937
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-kssa-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,094d,0937
montage devanagari-akhn-kssa-before.png right-arrow.png devanagari-akhn-kssa-after.png -geometry +0+0 -background transparent devanagari-akhn-kssa.png
### JNya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-jnya-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091c,25cc,094d,091e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-jnya-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091c,094d,091e
montage devanagari-akhn-jnya-before.png right-arrow.png devanagari-akhn-jnya-after.png -geometry +0+0 -background transparent devanagari-akhn-jnya.png
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rphf-before.png --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,25cc,094d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rphf-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc
montage devanagari-rphf-before.png right-arrow.png devanagari-rphf-after.png -geometry +0+0 -background transparent devanagari-rphf.png
## 3.5 `rkrf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rkrf-before.png --features=-init,-rkrf,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091d,25cc,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rkrf-after.png --features=-init,+rkrf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091d,094d,0930
montage devanagari-rkrf-before.png right-arrow.png devanagari-rkrf-after.png -geometry +0+0 -background transparent devanagari-rkrf.png
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwf-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwf-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,094d,0930
montage devanagari-blwf-before.png right-arrow.png devanagari-blwf-after.png -geometry +0+0 -background transparent devanagari-blwf.png
## 3.9 `half`
### Half form
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-half-before.png --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0932,094d,0930,25cc,094d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-half-after.png --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0932,094d,0930,094d
montage devanagari-half-before.png right-arrow.png devanagari-half-after.png -geometry +0+0 -background transparent devanagari-half.png
### Eyelash Ra
> Note that Noto Devanagari eyelash-Ra substitution does not appear to
> work when using `U+25cc` dotted circle as the "base consonant"
> substitute. Hence, a real consonant glyph is used instead. But it is
> important that "Ra" _not_ be used as the "base consonant", as this
> triggers "Rakaar".
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-eyelash-ra-before.png --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0931,094d,0932
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-eyelash-ra-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0931,094d,0932
montage devanagari-eyelash-ra-before.png right-arrow.png devanagari-eyelash-ra-after.png -geometry +0+0 -background transparent devanagari-eyelash-ra.png
## 3.11 `vatu`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-vatu-before.png --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0936,25cc,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-vatu-after.png --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0936,094d,0930
montage devanagari-vatu-before.png right-arrow.png devanagari-vatu-after.png -geometry +0+0 -background transparent devanagari-vatu.png
## 3.12 `cjct`
> Note: Noto Serif Devanagari implements this as `pres` for unknown
> reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-cjct-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0922,25cc,094d,0922
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-cjct-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0922,094d,0922
montage devanagari-cjct-before.png right-arrow.png devanagari-cjct-after.png -geometry +0+0 -background transparent devanagari-cjct.png
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-matra-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=093f,091e,094d,200c,091e,094d,0939,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-matra-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091e,094d,200c,091e,094d,0939,094d,0930,093f
montage devanagari-matra-position-before.png right-arrow.png devanagari-matra-position-after.png -geometry +0+0 -background transparent devanagari-matra-position.png
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-reph-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,092f,094d,0932,094d,092e,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-reph-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,092f,094d,0932,094d,092e,094d,0930
montage devanagari-reph-position-before.png right-arrow.png devanagari-reph-position-after.png -geometry +0+0 -background transparent devanagari-reph-position.png
## 5 `init`
> Note: Noto Devanagari and Murty don't implement `init`.
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-pres-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0916,093f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-pres-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0916,093f
montage devanagari-pres-before.png right-arrow.png devanagari-pres-after.png -geometry +0+0 -background transparent devanagari-pres.png
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvs-before.png --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,0949
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvs-after.png --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,0949
montage devanagari-abvs-before.png right-arrow.png devanagari-abvs-after.png -geometry +0+0 -background transparent devanagari-abvs.png
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blws-before.png --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0939,0944
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blws-after.png --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0939,0944
montage devanagari-blws-before.png right-arrow.png devanagari-blws-after.png -geometry +0+0 -background transparent devanagari-blws.png
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-psts-before.png --features=-init,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c,0940
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-psts-after.png --features=-init,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c,0940
montage devanagari-psts-before.png right-arrow.png devanagari-psts-after.png -geometry +0+0 -background transparent devanagari-psts.png
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-haln-before.png --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,095c,094d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-haln-after.png --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,095c,094d
montage devanagari-haln-before.png right-arrow.png devanagari-haln-after.png -geometry +0+0 -background transparent devanagari-haln.png
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvm-before.png --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,0948
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvm-after.png --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,0948
montage devanagari-abvm-before.png right-arrow.png devanagari-abvm-after.png -geometry +0+0 -background transparent devanagari-abvm.png
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwm-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,0943
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwm-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,0943
montage devanagari-blwm-before.png right-arrow.png devanagari-blwm-after.png -geometry +0+0 -background transparent devanagari-blwm.png
================================================
FILE: images/devanagari/devanagari-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-devanagari.md](../../opentype-shaping-devanagari.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-init` in examples where the `init`
> feature itself is not being explained.
## 3.1 `locl`
> Note: Noto Devanagari has 'NEP' and 'MAR' locl features.
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-nukt-before.svg --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,25cc,093c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-nukt-after.svg --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c
svg_stack --direction=h devanagari-nukt-before.svg right-arrow.svg devanagari-nukt-after.svg > devanagari-nukt.svg
## 3.3 `akhn`
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-kssa-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,25cc,094d,0937
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-kssa-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,094d,0937
svg_stack --direction=h devanagari-akhn-kssa-before.svg right-arrow.svg devanagari-akhn-kssa-after.svg > devanagari-akhn-kssa.svg
### JNya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-jnya-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091c,25cc,094d,091e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-akhn-jnya-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091c,094d,091e
svg_stack --direction=h devanagari-akhn-jnya-before.svg right-arrow.svg devanagari-akhn-jnya-after.svg > devanagari-akhn-jnya.svg
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rphf-before.svg --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,25cc,094d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rphf-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc
svg_stack --direction=h devanagari-rphf-before.svg right-arrow.svg devanagari-rphf-after.svg > devanagari-rphf.svg
## 3.5 `rkrf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rkrf-before.svg --features=-init,-rkrf,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091d,25cc,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-rkrf-after.svg --features=-init,+rkrf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091d,094d,0930
svg_stack --direction=h devanagari-rkrf-before.svg right-arrow.svg devanagari-rkrf-after.svg > devanagari-rkrf.svg
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwf-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwf-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=25cc,094d,0930
svg_stack --direction=h devanagari-blwf-before.svg right-arrow.svg devanagari-blwf-after.svg > devanagari-blwf.svg
## 3.9 `half`
### Half form
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-half-before.svg --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0932,094d,0930,25cc,094d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-half-after.svg --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0932,094d,0930,094d
svg_stack --direction=h devanagari-half-before.svg right-arrow.svg devanagari-half-after.svg > devanagari-half.svg
### Eyelash Ra
> Note that Noto Devanagari eyelash-Ra substitution does not appear to
> work when using `U+25cc` dotted circle as the "base consonant"
> substitute. Hence, a real consonant glyph is used instead. But it is
> important that "Ra" _not_ be used as the "base consonant", as this
> triggers "Rakaar".
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-eyelash-ra-before.svg --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0931,094d,0932
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-eyelash-ra-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0931,094d,0932
svg_stack --direction=h devanagari-eyelash-ra-before.svg right-arrow.svg devanagari-eyelash-ra-after.svg > devanagari-eyelash-ra.svg
## 3.11 `vatu`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-vatu-before.svg --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0936,25cc,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-vatu-after.svg --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0936,094d,0930
svg_stack --direction=h devanagari-vatu-before.svg right-arrow.svg devanagari-vatu-after.svg > devanagari-vatu.svg
## 3.12 `cjct`
> Note: Noto Serif Devanagari implements this as `pres` for unknown
> reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-cjct-before.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0922,25cc,094d,0922
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-cjct-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0922,094d,0922
svg_stack --direction=h devanagari-cjct-before.svg right-arrow.svg devanagari-cjct-after.svg > devanagari-cjct.svg
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-matra-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=093f,091e,094d,200c,091e,094d,0939,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-matra-position-after.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=091e,094d,200c,091e,094d,0939,094d,0930,093f
svg_stack --direction=h devanagari-matra-position-before.svg right-arrow.svg devanagari-matra-position-after.svg > devanagari-matra-position.svg
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-reph-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,092f,094d,0932,094d,092e,094d,0930
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-reph-position-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,092f,094d,0932,094d,092e,094d,0930
svg_stack --direction=h devanagari-reph-position-before.svg right-arrow.svg devanagari-reph-position-after.svg > devanagari-reph-position.svg
## 5 `init`
> Note: Noto Devanagari and Murty don't implement `init`.
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-pres-before.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0916,093f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-pres-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0916,093f
svg_stack --direction=h devanagari-pres-before.svg right-arrow.svg devanagari-pres-after.svg > devanagari-pres.svg
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvs-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,0949
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvs-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0930,094d,25cc,0949
svg_stack --direction=h devanagari-abvs-before.svg right-arrow.svg devanagari-abvs-after.svg > devanagari-abvs.svg
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blws-before.svg --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0939,0944
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blws-after.svg --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0939,0944
svg_stack --direction=h devanagari-blws-before.svg right-arrow.svg devanagari-blws-after.svg > devanagari-blws.svg
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-psts-before.svg --features=-init,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c,0940
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-psts-after.svg --features=-init,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,093c,0940
svg_stack --direction=h devanagari-psts-before.svg right-arrow.svg devanagari-psts-after.svg > devanagari-psts.svg
## 5 `haln`
# look at 0926,093c,094d in serif???
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-haln-before.svg --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansDevanagari-Regular.ttf --unicodes=25cc,095d,094d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-haln-after.svg --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansDevanagari-Regular.ttf --unicodes=25cc,095d,094d
svg_stack --direction=h devanagari-haln-before.svg right-arrow.svg devanagari-haln-after.svg > devanagari-haln.svg
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvm-before.svg --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,0948
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-abvm-after.svg --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=092b,0948
svg_stack --direction=h devanagari-abvm-before.svg right-arrow.svg devanagari-abvm-after.svg > devanagari-abvm.svg
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwm-before.svg --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,0943
hb-view --font-size=110 --margin=2,16,2,16 --output-file=devanagari-blwm-after.svg --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifDevanagari-Regular.ttf --unicodes=0915,0943
svg_stack --direction=h devanagari-blwm-before.svg right-arrow.svg devanagari-blwm-after.svg > devanagari-blwm.svg
================================================
FILE: images/emoji/emoji-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-emoji.md](../../opentype-shaping-emoji.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Invisibles general
> Two options for VS15 and VS16 are included.
>
> Adobe Source Emoji includes non-color glyphs for both codepoints that
> offer a degree of visual communication to prevent confusion in the
> sequence illustrations, but they are not immediately associated with
> the codepoint itself, like Gentium Plus's are.
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=text-pres-selector.png --font-funcs=ot --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=fe0e
hb-view --font-size=110 --margin=2,16,2,16 --features="ss06" --shapers=ot --preserve-default-ignorables --output-file=vs15.png --background=FFFFFF00 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=fe0e
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=emoji-pres-selector.png --font-funcs=ot --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=fe0f
hb-view --font-size=110 --margin=2,16,2,16 --features="ss06" --shapers=ot --preserve-default-ignorables --output-file=vs16.png --background=FFFFFF00 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=fe0f
> Gentium Plus's ZWJ and ZWNJ are visually preferrable:
hb-view --font-size=110 --margin=2,16,2,16 --features="ss06" --shapers=ot --preserve-default-ignorables --output-file=zwj.png --background=FFFFFF00 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=200d
hb-view --font-size=110 --margin=2,16,2,16 --features="ss06" --shapers=ot --preserve-default-ignorables --output-file=zwnj.png --background=FFFFFF00 /usr/share/fonts/truetype/gentiumplus/GentiumPlus-R.ttf --unicodes=200c
## Human beings general
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-boy.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f466
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-girl.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f467
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-man.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-woman.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fallback-generalperson.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9d1
## Presentation sequences
> Codepoints used are based on defaults in https://www.unicode.org/emoji/charts-14.0/text-style.html
> Invisibles:
montage vs15.png text-pres-selector.png -geometry +0+0 -background transparent -tile 2x1 text-presentation.png
montage vs16.png emoji-pres-selector.png -geometry +0+0 -background transparent -tile 2x1 emoji-presentation.png
> Default text-presentation: U+26A0, warning arrow:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=default-text-before.png --background=FFFFFF00 ./Noto\ Emoji\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=26a0
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=default-text-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=26a0
montage default-text-before.png emoji-pres-selector.png right-arrow.png default-text-after.png -geometry +0+0 -background transparent -tile 4x1 emoji-pres-sequence.png
> Default emoji-presentation: U+231B, hourglass:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=default-emoji-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=231b
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=default-emoji-after.png --background=FFFFFF00 ./Noto\ Emoji\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=231b
montage default-emoji-before.png text-pres-selector.png right-arrow.png default-emoji-after.png -geometry +0+0 -background transparent -tile 4x1 text-pres-sequence.png
## Modifier sequences
> Adobe Source Emoji for invisibles:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-2.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3fb
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-3.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3fc
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-4.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3fd
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-5.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3fe
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-6.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f3ff
> NotoColorEmoji squares for fallback skin-tone-squares:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-2.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3fb
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-3.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3fc
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-4.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3fd
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-5.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3fe
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=skintone-fallback-6.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3ff
> Symbola for text-mode fallback squares:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-2-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3fb
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-3-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3fc
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-4-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3fd
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-5-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3fe
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=skintone-6-text-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/ancient-scripts/Symbola_hint.ttf --unicodes=1f3ff
> Fitzpatrick scale:
montage skintone-2.png skintone-fallback-2.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-2.png
montage skintone-3.png skintone-fallback-3.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-3.png
montage skintone-4.png skintone-fallback-4.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-4.png
montage skintone-5.png skintone-fallback-5.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-5.png
montage skintone-6.png skintone-fallback-6.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-6.png
> Fitzpatrick scale text fallback:
> (currently unused)
montage skintone-2.png skintone-2-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-2-text-fallback.png
montage skintone-3.png skintone-3-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-3-text-fallback.png
montage skintone-4.png skintone-4-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-4-text-fallback.png
ontage skintone-5.png skintone-5-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-5-text-fallback.png
montage skintone-6.png skintone-6-text-fallback.png -geometry +0+0 -background transparent -tile 2x1 fitzpatrick-6-text-fallback.png
> Sequence:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=modifier-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f44b
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=modifier-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f44b,1f3fd
montage modifier-before.png skintone-4.png right-arrow.png modifier-after.png -geometry +0+0 -background transparent -tile 4x1 modifier-sequence.png
montage modifier-before.png skintone-4.png right-arrow.png modifier-before.png skintone-fallback-4.png -geometry +0+0 -background transparent -tile 5x1 modifier-sequence-fallback.png
## Regional Indicator flag sequences
> UN chosen for maximum acheivable internationality
> Sequence
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --output-file=regional-flag-un-before.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=1f1fa,1f1f3
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=regional-flag-un-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f1fa,1f1f3
montage regional-flag-un-before.png right-arrow.png regional-flag-un-after.png -geometry +0+0 -background transparent regional-indicator-flag-sequence-un.png
> Fallbacks
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=regional-flag-un-fallback.png --background=FFFFFF00 ./Noto\ Emoji\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=1f1fa,1f1f3
montage regional-flag-un-before.png right-arrow.png regional-flag-un-fallback.png -geometry +0+0 -background transparent regional-indicator-flag-sequence-un-fallback.png
## Tag flag sequences
> From https://unicode.org/reports/tr51/#valid-emoji-tag-sequences
>
> Wales chosen to be most distinctive example KNOWN to be widely implemented
> Tag pseudo-glyphs
> LastResort. This crops in, non-precisely, on the "tag" symbol itself:
hb-view --font-size=110 --margin=-42,-28,-44,-28 --preserve-default-ignorables --output-file=tag-isolate.png --background=FFFFFF00 ./unicode/Last-Resort/LastResort-Regular.ttf --unicodes=e007f
> Margin-adjusted versions:
hb-view --font-size=110 --margin=-20,-28,-48,-28 --preserve-default-ignorables --output-file=tag-isolate-high.png --background=FFFFFF00 ./unicode/Last-Resort/LastResort-Regular.ttf --unicodes=e007f
hb-view --font-size=110 --margin=-44,-28,-24,-28 --preserve-default-ignorables --output-file=tag-isolate-low.png --background=FFFFFF00 ./unicode/Last-Resort/LastResort-Regular.ttf --unicodes=e007f
> DejaVu empty dotted square, U+2B1A:
hb-view --font-size=110 --margin=-16,2,2,2 --preserve-default-ignorables --output-file=dotted-square.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=2b1a
> Letters and "end" components:
hb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=g-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0067
hb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=b-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0062
hb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=w-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0077
hb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=l-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=006c
hb-view --font-size=40 --margin=16,2,2,2 --preserve-default-ignorables --output-file=s-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0073
hb-view --font-size=32 --margin=2,2,16,2 --preserve-default-ignorables --output-file=end-isolate.png --background=FFFFFF00 /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf --unicodes=0045,004e,0044
> Composite tags:
composite -gravity north tag-isolate-high.png dotted-square.png blank-tag-high.png
composite -gravity south tag-isolate-low.png dotted-square.png blank-tag-low.png
composite -gravity north g-isolate.png blank-tag-low.png tag-g.png
composite -gravity north b-isolate.png blank-tag-low.png tag-b.png
composite -gravity north w-isolate.png blank-tag-low.png tag-w.png
composite -gravity north l-isolate.png blank-tag-low.png tag-l.png
composite -gravity north s-isolate.png blank-tag-low.png tag-s.png
composite -gravity south end-isolate.png blank-tag-high.png tag-end.png
> Completed sequence:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=tag-flag-black.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3f4
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=tag-flag-wales-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3f4,e0067,e0062,e0077,e006c,e0073,e007f
montage tag-flag-black.png tag-g.png tag-b.png tag-w.png tag-l.png tag-s.png tag-end.png -geometry +0+0 -background transparent -tile 7x1 tag-flag-wales-before.png
montage tag-flag-wales-before.png right-arrow.png tag-flag-wales-after.png -geometry +0+0 -background transparent tag-flag-sequence-wales.png
## Keycap sequences
> Noto Sans Symbols has a visual CEK:
hb-view --font-size=110 --margin=2,64,2,64 --preserve-default-ignorables --output-file=keycap-cek.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSymbols-Regular.ttf --unicodes=20e3
> Sequence:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=keycap-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerif-Regular.ttf --unicodes=0034
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=keycap-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=0034,20e3
> Something off here; the -gravity and -geometry switches are *not* working as expected....
montage keycap-before.png keycap-cek.png right-arrow.png keycap-after.png -geometry +0+0 -background transparent -tile 4x1 keycap-sequence.png
## ZWJ sequences
### ZWJ hair sequences
> Hairstyles:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hair-red.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9b0
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hair-curly.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9b1
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hair-bald.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9b2
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hair-white.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f9b3
> Sequence:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=woman-white-hair.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F469,200d,1f9b3
montage fallback-woman.png zwj.png hair-white.png right-arrow.png woman-white-hair.png -geometry +0+0 -background transparent -tile 5x1 hairstyle-sequence.png
### ZWJ gendered person sequences
> Gender signs, input (text-presentation style):
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendersign-female.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=2640
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendersign-male.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=2642
> Gender signs, output fallback (emoji-presentation style):
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendersign-female-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=2640
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendersign-male-fallback.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=2642
> Sequence:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendered-person-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F3c4
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=gendered-person-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F3c4,200d,2640,fe0f
montage gendered-person-before.png zwj.png gendersign-female.png emoji-pres-selector.png right-arrow.png gendered-person-after.png -geometry +0+0 -background transparent -tile 6x1 gendered-person-sequence.png
### ZWJ multi-person group sequences
>
> Couple with heart:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=heart.png --background=FFFFFF00 ./AdobeSourceEmoji/SourceEmoji-BnW.otf --unicodes=2764
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-man-heart-man-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,2764,fe0f,200d,1f468
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-man-skintone-2-heart-man-skintone-2-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,1f3fb,200d,2764,fe0f,200d,1f468,1f3fb
> Sequence:
montage fallback-man.png zwj.png heart.png emoji-pres-selector.png zwj.png fallback-man.png right-arrow.png multi-person-man-heart-man-after.png -geometry +0+0 -background transparent -tile 8x1 multi-person-heart-sequence.png
montage fallback-man.png skintone-2.png zwj.png heart.png emoji-pres-selector.png zwj.png fallback-man.png skintone-2.png right-arrow.png multi-person-man-skintone-2-heart-man-skintone-2-after.png -geometry +0+0 -background transparent -tile 10x1 multi-person-heart-skintone-sequence.png
>
> Couple kiss:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=kiss-mark.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f48b
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-kiss-sequence-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,200d,2764,fe0f,200d,1f48b,200d,1f468
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-kiss-sequence-skintone-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,1f3ff,200d,2764,fe0f,200d,1f48b,200d,1f468,1f3fd
> Sequence:
montage fallback-woman.png zwj.png heart.png emoji-pres-selector.png zwj.png kiss-mark.png zwj.png fallback-man.png right-arrow.png multi-person-kiss-sequence-after.png -geometry +0+0 -background transparent -tile 10x1 multi-person-kiss-sequence.png
montage fallback-woman.png skintone-6.png zwj.png heart.png emoji-pres-selector.png zwj.png kiss-mark.png zwj.png fallback-man.png skintone-4.png right-arrow.png multi-person-kiss-sequence-skintone-after.png -geometry +0+0 -background transparent -tile 12x1 multi-person-kiss-skintone-sequence.png
>
> Couple holding hands:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=handshake.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f91d
> Noto Color Emoji does not support this sequence, perhaps because it expects
> fallback to `U+1F46D` to occur, e.g. at the keyboard/UI level....
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-holding-hands-sequence-woman-woman-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f46d
> This modifier sequence IS supported in Noto Color Emoji
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-holding-hands-sequence-woman-woman-skintone-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,1f3fc,200d,1f91d,200d,1f469,1f3fe
> Sequence:
montage fallback-woman.png zwj.png handshake.png zwj.png fallback-woman.png right-arrow.png multi-person-holding-hands-sequence-woman-woman-after.png -geometry +0+0 -background transparent -tile 7x1 multi-person-holding-hands-sequence.png
montage fallback-woman.png skintone-3.png zwj.png handshake.png zwj.png fallback-woman.png skintone-5.png right-arrow.png multi-person-holding-hands-sequence-woman-woman-skintone-after.png -geometry +0+0 -background transparent -tile 9x1 multi-person-holding-hands-skintone-sequence.png
>
> Family:
>
> Noto Color Emoji supports "Man,ZWJ,Woman,ZWJ,_child_" but not "Woman,ZWJ,Man,ZWJ,_child_".
>
> Noto Color Emoji supports "Woman,ZWJ,Woman,ZWJ,Girl,ZWJ,Boy" but not "Woman,ZWJ,Woman,ZWJ,Boy,ZWJ,Girl".
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-family-man-boy-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,1f466
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-family-man-girl-girl-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,1f467,200d,1f467
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-family-man-woman-girl-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,1f469,200d,1f467
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-family-woman-woman-girl-boy-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,200d,1f469,200d,1f467,200d,1f466
> Sequence:
montage fallback-man.png zwj.png fallback-boy.png right-arrow.png multi-person-family-man-boy-after.png -geometry +0+0 -background transparent -tile 5x1 multi-person-family-man-boy-sequence.png
montage fallback-man.png zwj.png fallback-girl.png zwj.png fallback-girl.png right-arrow.png multi-person-family-man-girl-girl-after.png -geometry +0+0 -background transparent -tile 7x1 multi-person-family-man-girl-girl-sequence.png
montage fallback-man.png zwj.png fallback-woman.png zwj.png fallback-girl.png right-arrow.png multi-person-family-man-woman-girl-after.png -geometry +0+0 -background transparent -tile 7x1 multi-person-family-man-woman-girl-sequence.png
montage fallback-woman.png zwj.png fallback-woman.png zwj.png fallback-girl.png zwj.png fallback-boy.png right-arrow.png multi-person-family-woman-woman-girl-boy-after.png -geometry +0+0 -background transparent -tile 9x1 multi-person-family-woman-woman-girl-boy-sequence.png
>
> Noto Color Emoji does not seem to support skin-tone modifiers for family sequences,
> at least in the current release on my system.
>
>
>
> Shaking hands:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hand-left.png --background=FFFFFF00 ./blobmoji/Blobmoji.ttf --unicodes=1faf1
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=hand-right.png --background=FFFFFF00 ./blobmoji/Blobmoji.ttf --unicodes=1faf2
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=multi-person-shaking-hands-after.png --background=FFFFFF00 ./blobmoji/Blobmoji.ttf --unicodes=1faf1,1f3fd,200d,1faf2,1f3ff
> Sequence:
montage hand-left.png skintone-4.png zwj.png hand-right.png skintone-6.png right-arrow.png multi-person-handshake-after.png -geometry +0+0 -background transparent -tile 7x1 multi-person-shaking-hands-sequence.png
### ZWJ role sequences
> Firefighter (emoji-pres-by-default):
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=firetruck.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F692
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=role-firefighter-man-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,200d,1F692
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=role-firefighter-man-skintone-6-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f468,1f3ff,200d,1F692
montage fallback-man.png zwj.png firetruck.png right-arrow.png role-firefighter-man-after.png -geometry +0+0 -background transparent -tile 5x1 role-sequence-firefighter.png
montage fallback-man.png skintone-6.png zwj.png firetruck.png right-arrow.png role-firefighter-man-skintone-6-after.png -geometry +0+0 -background transparent -tile 6x1 role-sequence-firefighter-skintone-6.png
> Pilot (non-emoji-pres-by-default)
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=airplane.png --background=FFFFFF00 ./Noto\ Emoji\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=2708,fe0e
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=role-pilot-woman-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,200d,2708,fe0f
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=role-pilot-woman-skintone-2-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f469,1f3fb,200d,2708,fe0f
montage fallback-woman.png zwj.png airplane.png emoji-pres-selector.png right-arrow.png role-pilot-woman-after.png -geometry +0+0 -background transparent -tile 6x1 role-sequence-pilot.png
montage fallback-woman.png skintone-2.png zwj.png airplane.png emoji-pres-selector.png right-arrow.png role-pilot-woman-skintone-2-after.png -geometry +0+0 -background transparent -tile 7x1 role-sequence-pilot-skintone-2.png
### ZWJ color sequences
> Noto Color Emoji, "black cat" seems to be the only widely-implemented sequence:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=color-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F408
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=color-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1F408,200d,2b1b
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=color-black.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=2b1b
montage color-before.png zwj.png color-black.png right-arrow.png color-after.png -geometry +0+0 -background transparent -tile 5x1 color-sequence.png
### ZWJ directionality sequences
> These are not widely implemented, and possible not implemented at all in open-source fonts.
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=running.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f3c3
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=direction-rightward.png --background=FFFFFF00 ./Noto\ Emoji\ BW-via-GoogleFonts/static/NotoEmoji-Regular.ttf --unicodes=27a1
convert -flop running.png running-rightward.png
montage running.png zwj.png direction-rightward.png emoji-pres-selector.png right-arrow.png running-rightward.png -geometry +0+0 -background transparent -tile 6x1 zwj-directionality-sequence.png
### ZWJ additional sequences
> 13 on the named list. Heart-on-fire is clearly not just an overlay, and not a flag:
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=fire.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=1f525
hb-view --font-size=110 --margin=2,16,2,16 --preserve-default-ignorables --font-funcs=ot --output-file=heart-on-fire-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf --unicodes=2764,fe0f,200d,1f525
montage heart.png emoji-pres-selector.png zwj.png fire.png right-arrow.png heart-on-fire-after.png -geometry +0+0 -background transparent -tile 6x1 zwj-sequence-heart-on-fire.png
## Other sequences and ligatures
> For non-standard ligatures (banana split?)
> Adobe SourceEmoji has a "hand + zwj + heart -> i-love-you-hand" ligature:
>Other possiblities:
> https://blog.emojipedia.org/emoji-flags-explained/ (discusses flag options for vendors that don't require Unicode pre-/formal-approval)
> Noto monochromatic emoji details: https://blog.emojipedia.org/exploring-googles-new-black-and-blobby-emoji-font/
> Are Noto's "Emoji Kitchen" emoji (beginning with magic wand) just ligatures, or all they all "stickers"? https://blog.emojipedia.org/emoji-kitchen-beta-magics-back-the-blobs/
================================================
FILE: images/example-fonts.txt
================================================
#############################################################
# #
# All uncommented lines in this file are URLs which should #
# be downloadable via wget, cURL, or scripts. #
# #
# SHA checksums are for the exact file downloaded at the #
# URL (including archive files), not for the .ttf/.otf/.ttc #
# within. #
# #
# If any CDN links change or URLs break, please open an #
# issue in this project's bug tracker, not the font's. #
# #
#############################################################
##########################
# General
#
# GentiumPlus-R.ttf
# sha256: 5244209b44a5111736379686119cd54042dce18e308a351c366999ac563ca6bb
https://software.sil.org/downloads/r/gentium/GentiumPlus-6.101.zip
##########################
# Arabic-like
#
# Arabic
# NotoNaskhArabic-Regular.ttf
# sha256: c1f3654dd9142073b00289700ce0aa5218c1aa4d5be38a3c5f7f2649bee12c1f
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoNaskhArabic/unhinted/ttf/NotoNaskhArabic-Regular.ttf
# NotoNastaliqUrdu-Regular.ttf
# sha256: beee3156a724adf64178c2dbac86f5394b6a4bb67aea75d4354c817c9e6c27da
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoNastaliqUrdu/hinted/ttf/NotoNastaliqUrdu-Regular.ttf
# NotoKufiArabic-Regular.ttf
# sha256: efb00432829e570a12a5afd43dd4667950ee7cf4dbc9f4c421b2f27b89dee301
https://fonts.google.com/download?family=Noto%20Kufi%20Arabic
# Mongolian
# NotoSansMongolian-Regular.ttf
# sha256: 31a750c5b7e335ebb3d841b2f97baf045178db23fe199eb3040cb3496470daa7
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansMongolian/unhinted/ttf/NotoSansMongolian-Regular.ttf
# N'Ko
# NotoSansNKo-Regular.ttf
# sha256: 4e9de46bfa60bf800fe4608c5ec434602c496b924b6e5333dc14515b0883dd86
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansNKo/full/ttf/NotoSansNKo-Regular.ttf
# Syriac
# NotoSansSyriacEastern-Regular.ttf
# sha256: 3310d8ec31633d516b09a676b8fa6c784e2d263aeea52051ca40c60347e2e9cd
https://noto-website-2.storage.googleapis.com/pkgs/NotoSansSyriacEastern-unhinted.zip
# NotoSansSyriacWestern-Regular.ttf
# sha256: e8c6c6ae30ae6fb414e59112ebd08ab8341733b39b756a918e3c324249cdf5b5
https://noto-website-2.storage.googleapis.com/pkgs/NotoSansSyriacWestern-unhinted.zip
# NotoSansSyriacEstrangela-Regular.ttf
# sha256: 0cb823a1d55ca97bda55ae1f31b03ef762b1faa3447700142f83c9dc8f7828e4
https://noto-website-2.storage.googleapis.com/pkgs/NotoSansSyriacEstrangela-unhinted.zip
##########################
# Indic
#
# Bengali
# NotoSerifBengali-Regular.ttf
# sha256: 6f046ad71ff7f3ba154e7087a02a1f71fff782e85ab00fdd08cd28ad3e06e24b
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifBengali/full/ttf/NotoSerifBengali-Regular.ttf
# Devanagari
# NotoSerifDevanagari-Regular.ttf
# sha256: 55d2e062fac9208412a15e79a5bd3753af650a4ed47bb2eea4b105591fba4a8f
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifDevanagari/full/ttf/NotoSerifDevanagari-Regular.ttf
# Gujarati
# NotoSerifGujarati-Regular.ttf
# sha256: 041827ca7d1b58393587d5c974db1adf9409293ac13f23cef1a02b8700a4c6c3
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifGujarati/full/ttf/NotoSerifGujarati-Regular.ttf
# Gurmukhi
# NotoSansGurmukhi-Regular.ttf
# sha256: 255f404f61622ef03385a2851c2423de3a676d978115b2e39ffd5570e0022c32
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifGurmukhi/full/ttf/NotoSerifGurmukhi-Regular.ttf
# Kannada
# NotoSerifKannada-Regular.ttf
# sha256: c9e7d3168a0134f68456607521b66e3208586d07cc144618791f3492a0e88bfb
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifKannada/full/ttf/NotoSerifKannada-Regular.ttf
# Malayalam
# NotoSerifMalayalam-Regular.ttf
# sha256: 7755029171b99ef7d5e7027c6f9148bcca3f3b95a41fd73cfc79d664a935cf30
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifMalayalam/full/ttf/NotoSerifMalayalam-Regular.ttf
# NotoSansMalayalam-Regular.ttf
# sha256: d18b5c10d85bba3d3d89775484bcfa731112f501ac070793ddbda5a36992520f
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansMalayalam/full/ttf/NotoSansMalayalam-Regular.ttf
# Rachana-Regular.ttf
# sha256: a0d5c8417b58f98fe387758bb5c1a0e75225bb77759640bd7131c6c78dce16f1
https://gitlab.com/rit-fonts/RIT-Rachana/-/jobs/artifacts/1.2/download?job=build-tag
# Oriya
# NotoSansOriya-Regular.ttf
# sha256: e87f6a6c611c53dabb708a7ddfafa527b3ca7d0ccc5d2e0e659f46af19cb320f
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansOriya/full/ttf/NotoSansOriya-Regular.ttf
# Sinhala
# NotoSerifSinhala-Regular.ttf
# sha256: cafa8544ad87a1116d296193cfbf8be39b7927e67fd4af36b7e73105cf2ac85e
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifSinhala/full/ttf/NotoSerifSinhala-Regular.ttf
# Tamil
# NotoSerifTamil-Regular.ttf
# sha256: 00611c2dc5e6a09a5cd85eb87e135ae86aaafa5ac63543adf5eb2050784582a0
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifTamil/full/ttf/NotoSerifTamil-Regular.ttf
# NotoSansTamil-Regular.ttf
# sha256: 0afbc221964b6048c6d771c525be474d21b288a621dce0fafedd695cc5c98e4e
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansTamil/full/ttf/NotoSansTamil-Regular.ttf
# Telugu
# NotoSerifTelugu-Regular.ttf
# sha256: 15cec4e867d25105c484301af0b4be577231e04e13364abdff844a4a5c9711dc
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifTelugu/full/ttf/NotoSerifTelugu-Regular.ttf
# NotoSansTelugu-Regular.ttf
# sha256: e0595bcf47b907b2afb77a34ae64c3e8351f56452c66983660172c6b9ea15576
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansTelugu/full/ttf/NotoSansTelugu-Regular.ttf
##########################
# Brahmi-derived
#
# Khmer
# NotoSerifKhmer-Regular.ttf
# sha256: 1068ef26dadf6bd322f6bcef1990015ca3c998064d20842d67aaefc4b62d9cdb
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifKhmer/full/ttf/NotoSerifKhmer-Regular.ttf
# Myanmar
# NotoSansMyanmar-Regular.ttf
# sha256: e6d59055e5e7a8cc57ad8e04150136f4fc48bc3fca6f307c5e78f40f7e560a6d
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansMyanmar/full/ttf/NotoSansMyanmar-Regular.ttf
# PadaukBook-Regular.ttf
# sha256: b47b2639489d7cec5ad38d025f181b061767e4e161a41f19528e910f79fd03a1
https://software.sil.org/downloads/r/padauk/padauk-3.003.zip
# Thai and Lao
# NotoSerifThai-Regular.ttf
# sha256: 428afb46af2c025ed2b9fe39bda2fffce9475fa5d2e7ae7911771633014b91b0
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifThai/full/ttf/NotoSerifThai-Regular.ttf
# NotoSerifLao-Regular.ttf
# sha256: ff5ab4f3270c448b99b113a8ac6275bc6e4a4ca922ed2271df9a95132b5c2db6
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifLao/full/ttf/NotoSerifLao-Regular.ttf
# Tibetan
# NotoSerifTibetan-Regular.ttf -- RENAMED FROM NotoSansTibetan-Regular.ttf
# sha256: 2ac2555a88b5bcbbacc490003e96dd4d00d064daeee4a9465d68cf301f9886b3
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerifTibetan/full/ttf/NotoSerifTibetan-Regular.ttf
##########################
# Hangul
#
# NotoSansCJK-Regular.ttc
# sha256: b76b0433203017ca80401b2ee0dd69350349871c4b19d504c34dbdd80541690a
https://github.com/googlefonts/noto-cjk/archive/refs/tags/NotoSansV2.001.zip
##########################
# Hebrew
#
# NotoSansHebrew-Regular.ttf
# sha256: 6d925ace0a6ccce47b64e4a8d26869f423774d66dd6b9f67cf98441075e69582
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansHebrew/full/ttf/NotoSansHebrew-Regular.ttf
##########################
# Emoji
#
# NotoColorEmoji.ttf
# sha256: ad8eb6a1c403c73f0ff48ce288f2101de3814d0c6483f398023630e98331eeb2
https://fonts.google.com/download?family=Noto%20Color%20Emoji
# NotoEmoji-Regular.ttf
# sha256: 415dc6290378574135b64c808dc640c1df7531973290c4970c51fdeb849cb0c5
https://github.com/googlefonts/noto-emoji/raw/v2020-04-08-unicode12_1/fonts/NotoEmoji-Regular.ttf
# Symbola_hint.ttf
# sha256: 856de5857be48b8e31fc078fb93a9f3bd706b2552ba57b42863622f837cd3f35
https://fontlibrary.org/assets/downloads/symbola/cf81aeb303c13ce765877d31571dc5c7/symbola.zip
# LastResort-Regular.ttf
# sha256: da83a62294e74d963a10de4c3750ccf089273e3b7fc6744daef9844163ade078
https://github.com/unicode-org/last-resort-font/releases/download/15.000/LastResort-Regular.ttf
# DejaVuSans.ttf
# sha256: 6aaad3365c30c4f8d2504e569527e588d33eeae66dd7045bcfeef7413820db2a
http://sourceforge.net/projects/dejavu/files/dejavu/2.37/dejavu-fonts-ttf-2.37.tar.bz2
# NotoSansSymbols-Regular.ttf
# sha256: 0088617baec0e8ac47e022cc1f38695f772301c9ef6d1f24a785abbef1e05d79
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSansSymbols/full/ttf/NotoSansSymbols-Regular.ttf
# NotoSerif-Regular.ttf
# sha256: 504b8ec55d003cade88fb0a7bb93254ad81fd1cb29f4818d260300dbaef5d37b
https://cdn.jsdelivr.net/gh/notofonts/notofonts.github.io/fonts/NotoSerif/hinted/ttf/NotoSerif-Regular.ttf
# Blobmoji.ttf
# sha256: c3db3bb85e84ea7a2674399e281e004dc181ff9038b13c03bf01d3dd8197cfc8
https://github.com/C1710/blobmoji/releases/download/v14.0.1/Blobmoji.ttf
# SourceEmoji-BnW.otf
# sha256: e648822387b8860a74a478df07e1bdc6e56ee57e74e3f4b07272a8a9485c7186
https://github.com/adobe-fonts/source-emoji/releases/download/1.017/SourceEmoji-BnW.otf
================================================
FILE: images/gujarati/gujarati-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-gujarati.md](../../opentype-shaping-gujarati.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-decompose-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ac9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-decompose-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ac5,25cc,0abe
montage gujarati-matra-decompose-before.png right-arrow.png gujarati-matra-decompose-after.png -geometry +0+0 -background transparent gujarati-matra-decompose.png
## 2.7 Post-base consonants
> None
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-nukt-before.png --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a97,25cc,0abc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-nukt-after.png --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a97,0abc
montage gujarati-nukt-before.png right-arrow.png gujarati-nukt-after.png -geometry +0+0 -background transparent gujarati-nukt.png
## 3.3 `akhn`
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-kssa-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a95,25cc,0acd,0ab7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-kssa-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a95,0acd,0ab7
montage gujarati-akhn-kssa-before.png right-arrow.png gujarati-akhn-kssa-after.png -geometry +0+0 -background transparent gujarati-akhn-kssa.png
### JNya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-jnya-before.png --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9c,25cc,0acd,0a9e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-jnya-after.png --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9c,0acd,0a9e
montage gujarati-akhn-jnya-before.png right-arrow.png gujarati-akhn-jnya-after.png -geometry +0+0 -background transparent gujarati-akhn-jnya.png
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rphf-before.png --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,25cc,0acd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rphf-after.png --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,25cc
montage gujarati-rphf-before.png right-arrow.png gujarati-rphf-after.png -geometry +0+0 -background transparent gujarati-rphf.png
## 3.5 `rkrf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rkrf-before.png --features=-init,-rkrf,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa6,25cc,0acd,0ab0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rkrf-after.png --features=-init,+rkrf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa6,0acd,0ab0
montage gujarati-rkrf-before.png right-arrow.png gujarati-rkrf-after.png -geometry +0+0 -background transparent gujarati-rkrf.png
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwf-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=25cc,0acd,0ab0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwf-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=25cc,0acd,0ab0
montage gujarati-blwf-before.png right-arrow.png gujarati-blwf-after.png -geometry +0+0 -background transparent gujarati-blwf.png
## 3.9 `half`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-half-before.png --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aad,0acd,0ab0,25cc,0acd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-half-after.png --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aad,0acd,0ab0,0acd,25cc
montage gujarati-half-before.png right-arrow.png gujarati-half-after.png -geometry +0+0 -background transparent gujarati-half.png
## 3.11 `vatu`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-vatu-before.png --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa4,25cc,0acd,0ab0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-vatu-after.png --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa4,0acd,0ab0
montage gujarati-vatu-before.png right-arrow.png gujarati-vatu-after.png -geometry +0+0 -background transparent gujarati-vatu.png
## 3.12 `cjct`
> Note that Noto Serif Gujarati implements this in `pres` for unknown
> reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-cjct-before.png --features=-init,-pres,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa6,25cc,0acd,0aae
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-cjct-after.png --features=-init,+cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa6,0acd,0aae
montage gujarati-cjct-before.png right-arrow.png gujarati-cjct-after.png -geometry +0+0 -background transparent gujarati-cjct.png
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0abf,0a9f,0acd,0a9d,0acd,0ab9,0acd,0aa4
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-position-after.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9f,0acd,0a9d,0acd,0ab9,0acd,0aa4,0abf
montage gujarati-matra-position-before.png right-arrow.png gujarati-matra-position-after.png -geometry +0+0 -background transparent gujarati-matra-position.png
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-reph-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,25cc,0aab,0acd,0aa8,0acd,0a9a,0ac2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-reph-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0aab,0acd,0aa8,0acd,0a9a,0ac2
montage gujarati-reph-position-before.png right-arrow.png gujarati-reph-position-after.png -geometry +0+0 -background transparent gujarati-reph-position.png
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-pres-before.png --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9e,0acd,0a9a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-pres-after.png --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9e,0acd,0a9a,25cc
montage gujarati-pres-before.png right-arrow.png gujarati-pres-after.png -geometry +0+0 -background transparent gujarati-pres.png
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvs-before.png --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0aa3,0abf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvs-after.png --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0aa3,0abf
montage gujarati-abvs-before.png right-arrow.png gujarati-abvs-after.png -geometry +0+0 -background transparent gujarati-abvs.png
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blws-before.png --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa3,0ac1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blws-after.png --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0aa3,0ac1
montage gujarati-blws-before.png right-arrow.png gujarati-blws-after.png -geometry +0+0 -background transparent gujarati-blws.png
## 5 `psts`
> Note: Noto Serif Gujarati implements this as an `abvs` lookup for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-psts-before.png --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9c,0acd,0ab0,0abe
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-psts-after.png --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9c,0acd,0ab0,0abe
montage gujarati-psts-before.png right-arrow.png gujarati-psts-after.png -geometry +0+0 -background transparent gujarati-psts.png
## 5 `haln`
> Note: Noto Serif Gujarati implements this as a `blwm` lookup.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-haln-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a95,0acd,0a95,0abc,0acd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-haln-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a95,0acd,0a95,0abc,0acd
montage gujarati-haln-before.png right-arrow.png gujarati-haln-after.png -geometry +0+0 -background transparent gujarati-haln.png
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvm-before.png --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0ab9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvm-after.png --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0ab0,0acd,0ab9
montage gujarati-abvm-before.png right-arrow.png gujarati-abvm-after.png -geometry +0+0 -background transparent gujarati-abvm.png
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwm-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9f,0acd,0aa0,0ac4
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwm-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.ttf --unicodes=0a9f,0acd,0aa0,0ac4
montage gujarati-blwm-before.png right-arrow.png gujarati-blwm-after.png -geometry +0+0 -background transparent gujarati-blwm.png
================================================
FILE: images/gujarati/gujarati-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-gujarati.md](../../opentype-shaping-gujarati.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-decompose-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ac9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-decompose-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ac5,25cc,0abe
svg_stack.py --direction=h gujarati-matra-decompose-before.svg right-arrow.svg gujarati-matra-decompose-after.svg > gujarati-matra-decompose.svg
## 2.7 Post-base consonants
> None
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-nukt-before.svg --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a97,25cc,0abc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-nukt-after.svg --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a97,0abc
svg_stack.py --direction=h gujarati-nukt-before.svg right-arrow.svg gujarati-nukt-after.svg > gujarati-nukt.svg
## 3.3 `akhn`
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-kssa-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a95,25cc,0acd,0ab7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-kssa-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a95,0acd,0ab7
svg_stack.py --direction=h gujarati-akhn-kssa-before.svg right-arrow.svg gujarati-akhn-kssa-after.svg > gujarati-akhn-kssa.svg
### JNya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-jnya-before.svg --features=-init,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9c,25cc,0acd,0a9e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-akhn-jnya-after.svg --features=-init,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9c,0acd,0a9e
svg_stack.py --direction=h gujarati-akhn-jnya-before.svg right-arrow.svg gujarati-akhn-jnya-after.svg > gujarati-akhn-jnya.svg
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rphf-before.svg --features=-init,-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,25cc,0acd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rphf-after.svg --features=-init,+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,25cc
svg_stack.py --direction=h gujarati-rphf-before.svg right-arrow.svg gujarati-rphf-after.svg > gujarati-rphf.svg
## 3.5 `rkrf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rkrf-before.svg --features=-init,-rkrf,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa6,25cc,0acd,0ab0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-rkrf-after.svg --features=-init,+rkrf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa6,0acd,0ab0
svg_stack.py --direction=h gujarati-rkrf-before.svg right-arrow.svg gujarati-rkrf-after.svg > gujarati-rkrf.svg
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwf-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=25cc,0acd,0ab0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blwf-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=25cc,0acd,0ab0
svg_stack.py --direction=h gujarati-blwf-before.svg right-arrow.svg gujarati-blwf-after.svg > gujarati-blwf.svg
## 3.9 `half`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-half-before.svg --features=-init,-half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aad,0acd,0ab0,25cc,0acd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-half-after.svg --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aad,0acd,0ab0,0acd,25cc
svg_stack.py --direction=h gujarati-half-before.svg right-arrow.svg gujarati-half-after.svg > gujarati-half.svg
## 3.11 `vatu`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-vatu-before.svg --features=-init,-vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa4,25cc,0acd,0ab0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-vatu-after.svg --features=-init,+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa4,0acd,0ab0
svg_stack.py --direction=h gujarati-vatu-before.svg right-arrow.svg gujarati-vatu-after.svg > gujarati-vatu.svg
## 3.12 `cjct`
> Note that Noto Serif Gujarati implements this in `pres` for unknown
> reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-cjct-before.svg --features=-init,-pres,-cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa6,25cc,0acd,0aae
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-cjct-after.svg --features=-init,+cjct --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa6,0acd,0aae
svg_stack.py --direction=h gujarati-cjct-before.svg right-arrow.svg gujarati-cjct-after.svg > gujarati-cjct.svg
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=25cc,0abf,0a9b,0acd,0aad,0acd,0aa6,0acd,0aae
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-matra-position-after.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9b,0acd,0aad,0acd,0aa6,0acd,0aae,0abf
svg_stack.py --direction=h gujarati-matra-position-before.svg right-arrow.svg gujarati-matra-position-after.svg > gujarati-matra-position.svg
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-reph-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,25cc,0aab,0acd,0aa8,0acd,0a9a,0ac2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-reph-position-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0aab,0acd,0aa8,0acd,0a9a,0ac2
svg_stack.py --direction=h gujarati-reph-position-before.svg right-arrow.svg gujarati-reph-position-after.svg > gujarati-reph-position.svg
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-pres-before.svg --features=-init,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9e,0acd,0a9a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-pres-after.svg --features=-init,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9e,0acd,0a9a,25cc
svg_stack.py --direction=h gujarati-pres-before.svg right-arrow.svg gujarati-pres-after.svg > gujarati-pres.svg
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvs-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0aa3,0abf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvs-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0aa3,0abf
svg_stack.py --direction=h gujarati-abvs-before.svg right-arrow.svg gujarati-abvs-after.svg > gujarati-abvs.svg
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blws-before.svg --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aab,0ac1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-blws-after.svg --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aab,0ac1
svg_stack.py --direction=h gujarati-blws-before.svg right-arrow.svg gujarati-blws-after.svg > gujarati-blws.svg
## 5 `psts`
> Note: Noto Serif Gujarati implements this as an `abvs` lookup for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-psts-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9c,0acd,0ab0,0abe
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-psts-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9c,0acd,0ab0,0abe
svg_stack.py --direction=h gujarati-psts-before.svg right-arrow.svg gujarati-psts-after.svg > gujarati-psts.svg
## 5 `haln`
> Note: Noto Serif Gujarati implements this as a `blwm` lookup in
> addition to `haln`.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-haln-after.svg --features=-init,+haln,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa0,0acd
hb-view --font-size=110 --margin=2,24,2,16 --output-file=gujarati-haln-before.svg --features=-init,-haln,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0aa0,0acd
svg_stack.py --direction=h gujarati-haln-before.svg right-arrow.svg gujarati-haln-after.svg > gujarati-haln.svg
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvm-before.svg --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0ab9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-abvm-after.svg --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab0,0acd,0ab9
svg_stack.py --direction=h gujarati-abvm-before.svg right-arrow.svg gujarati-abvm-after.svg > gujarati-abvm.svg
## 6 `blwm`
hb-view --font-size=110 --margin=2,48,16,16 --output-file=gujarati-blwm-before.svg --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9b,0acd,0ab0,0ae3
hb-view --font-size=110 --margin=2,16,16,16 --output-file=gujarati-blwm-after.svg --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0a9b,0acd,0ab0,0ae3
svg_stack.py --direction=h gujarati-blwm-before.svg right-arrow.svg gujarati-blwm-after.svg > gujarati-blwm.svg
## 6 `dist`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-dist-before.svg --features=-init,-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab9,0acd,0aa3,0aa6,0acd,0ab5
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gujarati-dist-after.svg --features=-init,+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGujarati-Regular.otf --unicodes=0ab9,0acd,0aa3,0aa6,0acd,0ab5
svg_stack.py --direction=h gujarati-dist-before.svg right-arrow.svg gujarati-dist-after.svg > gujarati-dist.svg
================================================
FILE: images/gurmukhi/gurmukhi-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-gurmukhi.md](../../opentype-shaping-gurmukhi.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-init` in examples where the `init`
> feature itself is not being explained.
> Note: There is, at present, no Noto Serif for Gurmukhi; therefore
> (unlike the other Indic scripts) these examples use Noto Sans. Serif
> would be preferrable if it appears in the future, though, due to the
> increased stroke contrast.
## 2.7 Post-base consonants
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-pstf-before.png --features=-init,-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a2f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-pstf-after.png --features=-init,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a2f
montage gurmukhi-pstf-before.png right-arrow.png gurmukhi-pstf-after.png -geometry +0+0 -background transparent gurmukhi-pstf.png
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-nukt-before.png --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a38,25cc,0a3c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-nukt-after.png --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a38,0a3c
montage gurmukhi-nukt-before.png right-arrow.png gurmukhi-nukt-after.png -geometry +0+0 -background transparent gurmukhi-nukt.png
## 3.3 `akhn`
> Note: Noto Sans Gurmukhi has no `akhn` feature implemented.
## 3.4 `rphf`
> Note: Noto Sans Gurmukhi has no `rphf` feature implemented.
## 3.7 `blwf`
### Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ra-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a30
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ra-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a30
montage gurmukhi-blwf-ra-before.png right-arrow.png gurmukhi-blwf-ra-after.png -geometry +0+0 -background transparent gurmukhi-blwf-ra.png
### Va
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-va-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a35
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-va-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a35
montage gurmukhi-blwf-va-before.png right-arrow.png gurmukhi-blwf-va-after.png -geometry +0+0 -background transparent gurmukhi-blwf-va.png
### Ha
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ha-before.png --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a39
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ha-after.png --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=25cc,0a4d,0a39
montage gurmukhi-blwf-ha-before.png right-arrow.png gurmukhi-blwf-ha-after.png -geometry +0+0 -background transparent gurmukhi-blwf-ha.png
## 3.9 `half`
> Note: Gurmukhi fonts seem to stick to explicit halant-forms.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-half-before.png --features=-init,-half,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a2d,0a4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-half-after.png --features=-init,+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a2d,0a4d
montage gurmukhi-half-before.png right-arrow.png gurmukhi-half-after.png -geometry +0+0 -background transparent gurmukhi-half.png
## 3.10 `pstf`
> Same as 2.7
## 3.11 `vatu`
> Note: Noto Gurmukhi has no `vatu` feature.
## 3.12 `cjct`
> Note: Noto Gurmukhi has no `cjct` feature.
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-matra-position-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a25,0a3f,0a4d,0a32,0a4d,0a35,0a4d,0a1a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-matra-position-after.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a25,0a4d,0a32,0a4d,0a35,0a4d,0a1a,0a3f
montage gurmukhi-matra-position-before.png right-arrow.png gurmukhi-matra-position-after.png -geometry +0+0 -background transparent gurmukhi-matra-position.png
## 4.3 Reph position
> Note: Noto Gurmukhi has no `rphf` feature and no Reph
> glyph. Therefore no illustration of Reph positioning is possible.
## 5 `init`
> Note: Noto Gurmukhi has no `init` feature, and it is unclear from
> the Microsoft specification whether `init` is defined for Gurmukhi.
## 5 `pres`
> Note: Noto Gurmukhi has no `pres` feature, even though it would be
> possible to implement one for the i-matra (`U+0A3F`).
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvs-before.png --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a13,0a71
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvs-after.png --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a13,0a71
montage gurmukhi-abvs-before.png right-arrow.png gurmukhi-abvs-after.png -geometry +0+0 -background transparent gurmukhi-abvs.png
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blws-before.png --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhiUI-Regular.ttf --unicodes=0a15,25cc,0a4d,0a30
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blws-after.png --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhiUI-Regular.ttf --unicodes=0a15,0a4d,0a30
montage gurmukhi-blws-before.png right-arrow.png gurmukhi-blws-after.png -geometry +0+0 -background transparent gurmukhi-blws.png
## 5 `psts`
> Note: Noto Sans Gurmukhi has no `psts` feature.
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-haln-before.png --features=-init,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a32,0a3c,0a4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-haln-after.png --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a32,0a3c,0a4d
montage gurmukhi-haln-before.png right-arrow.png gurmukhi-haln-after.png -geometry +0+0 -background transparent gurmukhi-haln.png
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvm-before.png --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a20,0a48
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvm-after.png --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a20,0a48
montage gurmukhi-abvm-before.png right-arrow.png gurmukhi-abvm-after.png -geometry +0+0 -background transparent gurmukhi-abvm.png
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwm-before.png --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a06,0a42
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwm-after.png --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.ttf --unicodes=0a06,0a42
montage gurmukhi-blwm-before.png right-arrow.png gurmukhi-blwm-after.png -geometry +0+0 -background transparent gurmukhi-blwm.png
================================================
FILE: images/gurmukhi/gurmukhi-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-gurmukhi.md](../../opentype-shaping-gurmukhi.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-init` in examples where the `init`
> feature itself is not being explained.
> Note: There is, at present, no Noto Serif for Gurmukhi; therefore
> (unlike the other Indic scripts) these examples use Noto Sans. Serif
> would be preferrable if it appears in the future, though, due to the
> increased stroke contrast.
## 2.7 Post-base consonants
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-pstf-before.svg --features=-init,-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a2f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-pstf-after.svg --features=-init,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a2f
svg_stack.py --direction=h gurmukhi-pstf-before.svg right-arrow.svg gurmukhi-pstf-after.svg > gurmukhi-pstf.svg
#### Duplicates for other subsections
cp gurmukhi-pstf.svg gurmukhi-pstf-1.svg
cluster_styles = [
## 3.2 `nukt`
> Noto Serif replaces "Dda,Nukta" with "Rra". That sequence is chosen
> because it means the change in glyphs is easily seen, but perhaps it
> would be better to use an example that has an "-after" form that is
> more clearly nukta-bearing?
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-nukt-before.svg --features=-init,-nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a21,25cc,0a3c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-nukt-after.svg --features=-init,+nukt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a21,0a3c
svg_stack.py --direction=h gurmukhi-nukt-before.svg right-arrow.svg gurmukhi-nukt-after.svg > gurmukhi-nukt.svg
## 3.3 `akhn`
> Note: Noto Sans/Serif Gurmukhi have no `akhn` feature implemented.
## 3.4 `rphf`
> Note: Noto Sans/Serif Gurmukhi have no `rphf` feature implemented.
## 3.7 `blwf`
### Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ra-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a30
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ra-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a30
svg_stack.py --direction=h gurmukhi-blwf-ra-before.svg right-arrow.svg gurmukhi-blwf-ra-after.svg > gurmukhi-blwf-ra.svg
### Va
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-va-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a35
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-va-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a35
svg_stack.py --direction=h gurmukhi-blwf-va-before.svg right-arrow.svg gurmukhi-blwf-va-after.svg > gurmukhi-blwf-va.svg
### Ha
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ha-before.svg --features=-init,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a39
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwf-ha-after.svg --features=-init,+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=25cc,0a4d,0a39
svg_stack.py --direction=h gurmukhi-blwf-ha-before.svg right-arrow.svg gurmukhi-blwf-ha-after.svg > gurmukhi-blwf-ha.svg
## 3.9 `half`
> Note: Gurmukhi fonts seem to stick to explicit halant-forms.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-half-before.svg --features=-init,-half,-mark,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a23,0a4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-half-after.svg --features=-init,+half,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a23,0a4d
svg_stack.py --direction=h gurmukhi-half-before.svg right-arrow.svg gurmukhi-half-after.svg > gurmukhi-half.svg
## 3.10 `pstf`
> Same as 2.7
## 3.11 `vatu`
> Note: Noto Gurmukhi has no `vatu` feature.
## 3.12 `cjct`
> Note: Noto Gurmukhi has no `cjct` feature.
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,24,16 --output-file=gurmukhi-matra-position-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a25,0a3f,0a4d,0a32,0a4d,0a35,0a4d,0a1a
hb-view --font-size=110 --margin=2,16,24,16 --output-file=gurmukhi-matra-position-after.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a25,0a4d,0a32,0a4d,0a35,0a4d,0a1a,0a3f
svg_stack.py --direction=h gurmukhi-matra-position-before.svg right-arrow.svg gurmukhi-matra-position-after.svg > gurmukhi-matra-position.svg
## 4.3 Reph position
> Note: Noto Gurmukhi has no `rphf` feature and no Reph
> glyph. Therefore no illustration of Reph positioning is possible.
## 5 `init`
> Note: Noto Gurmukhi has no `init` feature, and it is unclear from
> the Microsoft specification whether `init` is defined for Gurmukhi.
## 5 `pres`
> Note: Noto Gurmukhi has no `pres` feature, even though it would be
> possible to implement one for the i-matra (`U+0A3F`).
## 5 `abvs`
hb-view --font-size=110 --margin=2,32,2,16 --output-file=gurmukhi-abvs-before.svg --features=-init,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a13,0a71
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvs-after.svg --features=-init,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a13,0a71
svg_stack.py --direction=h gurmukhi-abvs-before.svg right-arrow.svg gurmukhi-abvs-after.svg > gurmukhi-abvs.svg
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blws-before.svg --features=-init,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a15,25cc,0a4d,0a30
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blws-after.svg --features=-init,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a15,0a4d,0a30
svg_stack.py --direction=h gurmukhi-blws-before.svg right-arrow.svg gurmukhi-blws-after.svg > gurmukhi-blws.svg
## 5 `psts`
> Note: Noto Sans Gurmukhi has no `psts` feature.
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-haln-before.svg --features=-init,-haln,-half,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a5c,0a4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-haln-after.svg --features=-init,+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a5c,0a4d
svg_stack.py --direction=h gurmukhi-haln-before.svg right-arrow.svg gurmukhi-haln-after.svg > gurmukhi-haln.svg
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvm-before.svg --features=-init,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a20,0a48
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-abvm-after.svg --features=-init,+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a20,0a48
svg_stack.py --direction=h gurmukhi-abvm-before.svg right-arrow.svg gurmukhi-abvm-after.svg > gurmukhi-abvm.svg
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwm-before.svg --features=-init,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a17,0a51
hb-view --font-size=110 --margin=2,16,2,16 --output-file=gurmukhi-blwm-after.svg --features=-init,+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifGurmukhi-Regular.otf --unicodes=0a17,0a51
svg_stack.py --direction=h gurmukhi-blwm-before.svg right-arrow.svg gurmukhi-blwm-after.svg > gurmukhi-blwm.svg
================================================
FILE: images/hangul/hangul-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-hangul.md](../../opentype-shaping-hangul.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-ljmo,-vjmo,-tjmo` in examples where
> the jamo features are not being explained. This may not be
> neccessary in all fonts.
## LV example
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-lv-syllable.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110e,1166
## LVT example
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-lvt-syllable.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110e,1166,11ae
## 3. Compose the syllable
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-compose-before.png --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1108,200b,1171,200b,11b8
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-compose-after.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1108,1171,11b8
montage hangul-compose-before.png right-arrow.png hangul-compose-after.png -geometry +0+0 -background transparent hangul-compose.png
## 4. Decompose the syllable
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-decompose-before.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1106,1172,11af
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-decompose-after.png --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1106,200d,1172,200d,11af
montage hangul-decompose-before.png right-arrow.png hangul-decompose-after.png -geometry +0+0 -background transparent hangul-decompose.png
## 5.2 `ljmo`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-ljmo-before.png --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-ljmo-after.png --features=+ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9
montage hangul-ljmo-before.png right-arrow.png hangul-ljmo-after.png -geometry +0+0 -background transparent hangul-ljmo.png
## 5.3 `vjmo`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-vjmo-before.png --features=+ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-vjmo-after.png --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9
montage hangul-vjmo-before.png right-arrow.png hangul-vjmo-after.png -geometry +0+0 -background transparent hangul-vjmo.png
## 5.4 `tjmo`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tjmo-before.png --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tjmo-after.png --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,d7d9
montage hangul-tjmo-before.png right-arrow.png hangul-tjmo-after.png -geometry +0+0 -background transparent hangul-tjmo.png
## 6. Tone marks
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tone-before.png --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1111,116b,11a8,200c,302f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tone-after.png --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1111,116b,11a8,302f
================================================
FILE: images/hangul/hangul-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-hangul.md](../../opentype-shaping-hangul.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-ljmo,-vjmo,-tjmo` in examples where
> the jamo features are not being explained. This may not be
> neccessary in all fonts.
## LV example
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-lv-syllable.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110e,1166
## LVT example
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-lvt-syllable.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110e,1166,11ae
## 3. Compose the syllable
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-compose-before.svg --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1108,200b,1171,200b,11b8
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-compose-after.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1108,1171,11b8
svg_stack --direction=h hangul-compose-before.svg right-arrow.svg hangul-compose-after.svg > hangul-compose.svg
## 4. Decompose the syllable
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-decompose-before.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1106,1172,11af
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-decompose-after.svg --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=3141,200b,3160,200b,3139
svg_stack --direction=h hangul-decompose-before.svg right-arrow.svg hangul-decompose-after.svg > hangul-decompose.svg
## 5.2 `ljmo`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-ljmo-before.svg --features=-ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,11bb
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-ljmo-after.svg --features=+ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,11bb
svg_stack --direction=h hangul-ljmo-before.svg right-arrow.svg hangul-ljmo-after.svg > hangul-ljmo.svg
## 5.3 `vjmo`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-vjmo-before.svg --features=+ljmo,-vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,200b,1169,200b,11bb
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-vjmo-after.svg --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,1169,200b,11bb
svg_stack --direction=h hangul-vjmo-before.svg right-arrow.svg hangul-vjmo-after.svg > hangul-vjmo.svg
## 5.4 `tjmo`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tjmo-before.svg --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,1169,200b,11bb
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tjmo-after.svg --features=+ljmo,+vjmo,+tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=110f,1169,11bb
svg_stack --direction=h hangul-tjmo-before.svg right-arrow.svg hangul-tjmo-after.svg > hangul-tjmo.svg
## 6. Tone marks
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tone-before.svg --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1111,116b,11a8,200c,302f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hangul-tone-after.svg --features=+ljmo,+vjmo,-tjmo --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifKR-Regular.otf --unicodes=1111,116b,11a8,302f
svg_stack.py --direction=h hangul-tone-before.svg right-arrow.svg hangul-tone-after.svg > hangul-tone.svg
================================================
FILE: images/hebrew/hebrew-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-hebrew.md](../../opentype-shaping-hebrew.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 1.1 `ccmp`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-ccmp-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05b3,05bd
hb-view --font-size=110 --margin=2,24,2,24 --output-file=hebrew-ccmp-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05b3,05bd
montage hebrew-ccmp-before.png right-arrow.png hebrew-ccmp-after.png -geometry +0+0 -background transparent hebrew-ccmp.png
## 2 Alphabetic Presentation Forms
> Note: Noto Sans Hebrew implements these compositions in a `ccmp` lookup.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-apf-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05e7,05bc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-apf-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05e7,05bc
montage hebrew-apf-before.png right-arrow.png hebrew-apf-after.png -geometry +0+0 -background transparent hebrew-apf.png
## 4.1 `liga`
hb-view --font-size=110 --margin=2,16,2,24 --output-file=hebrew-liga-before.png --features=-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=fb4f,05b1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-liga-after.png --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=fb4f,05b1
montage hebrew-liga-before.png right-arrow.png hebrew-liga-after.png -geometry +0+0 -background transparent hebrew-liga.png
## 4.2 `dlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-dlig-before.png --features=-dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05d0,05dc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-dlig-after.png --features=+dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05d0,05dc
montage hebrew-dlig-before.png right-arrow.png hebrew-dlig-after.png -geometry +0+0 -background transparent hebrew-dlig.png
## 5.1 `kern`
> Note: Noto Sans Hebrew has `kern` lookups, but so far I have not
> been able to identify an easily visible example pair.
## 5.2 `mark`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05e7,059a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.ttf --unicodes=05e7,059a
montage hebrew-mark-before.png right-arrow.png hebrew-mark-after.png -geometry +0+0 -background transparent hebrew-mark.png
================================================
FILE: images/hebrew/hebrew-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-hebrew.md](../../opentype-shaping-hebrew.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 1.1 `ccmp`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-ccmp-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d5,05c1
hb-view --font-size=110 --margin=2,24,2,24 --output-file=hebrew-ccmp-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d5,05c1
svg_stack.py --direction=h hebrew-ccmp-before.svg right-arrow.svg hebrew-ccmp-after.svg > hebrew-ccmp.svg
## 2 Alphabetic Presentation Forms
> Note: Noto Sans Hebrew implements these compositions in a `ccmp`
> lookup. Noto Serif Hebrew does not.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-apf-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d9,05b4
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-apf-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=fb1d
svg_stack.py --direction=h hebrew-apf-before.svg right-arrow.svg hebrew-apf-after.svg > hebrew-apf.svg
## 4.1 `liga`
hb-view --font-size=110 --margin=2,16,2,24 --output-file=hebrew-liga-before.svg --features=-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=25cc,05b1,200d,05bd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-liga-after.svg --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=25cc,05b1,200d,05bd
svg_stack.py --direction=h hebrew-liga-before.svg right-arrow.svg hebrew-liga-after.svg > hebrew-liga.svg
## `calt`
> The `calt` feature clearly gets applied; it's unclear why it was
> left off of the list in the initial release of this document.
hb-view --font-size=110 --margin=2,16,2,24 --output-file=hebrew-calt-before.svg --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05dc,05b4,05b8,05dd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-calt-after.svg --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05dc,05b4,05b8,05dd
svg_stack.py --direction=h hebrew-calt-before.svg right-arrow.svg hebrew-calt-after.svg > hebrew-calt.svg
## 4.2 `dlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-dlig-before.svg --features=-dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d0,05dc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-dlig-after.svg --features=+dlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d0,05dc
svg_stack.py --direction=h hebrew-dlig-before.svg right-arrow.svg hebrew-dlig-after.svg > hebrew-dlig.svg
## 5.1 `kern`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-kern-before.svg --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d2,05e2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-kern-after.svg --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d2,05e2
svg_stack.py --direction=h hebrew-kern-before.svg right-arrow.svg hebrew-kern-after.svg > hebrew-kern.svg
## 5.2 `mark`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d3,05b1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=hebrew-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifHebrew-Regular.otf --unicodes=05d3,05b1
svg_stack.py --direction=h hebrew-mark-before.svg right-arrow.svg hebrew-mark-after.svg > hebrew-mark.svg
================================================
FILE: images/images-index.md
================================================
# Images #
This section includes a separate subdirectory for each script,
containing the images included in the relevant script-shaping document.
Also included in each directory is a log file containing the exact
commands used to generate the images.
PNG glyph images are generated using the `hb-view` utility from
Harfbuzz and the `montage` utility from ImageMagick. The commands were
run on a Linux-based system but, apart from minor differences in the
file-path to the font file specified, should be completely
reproducible on other operating systems.
SVG glyph images are generated using the `hb-view` utility from
Harfbuzz and the [`svg_stack`](https://github.com/astraw/svg_stack/)
Python utility. The commands were run on a Linux-based system but,
apart from minor differences in the file-path to the font file
specified, should be completely reproducible on other operating
systems.
Long-term, the PNG images will be replaced by SVG images —
although, at present, there are still some images that are generated
in PNG form (because kinks remain to be worked out in the SVG-image
alignment process and the corresponding CSS styling).
The font files used must be publicly and freely available, open-source
fonts. By default, the Noto fonts from Google are the starting point.
A list of the fonts used to generate the latest version of the images
is provided in the [example-fonts.txt](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/example-fonts.txt) file, with
URLs and SHA checksums for each file.
The image file names follow a simple, but important, pattern:
_script_-_featureillustrated_.png
Intermediary images copy the pattern but append _-before_ or _-after_
when depicting the before-or-after state of an applied OpenType
feature, or some other suffix as appropriate.
If you are suggesting an update to an image, please utilize the same
commands and general syntax. If you are suggesting adding a new image,
please also follow the file-name pattern. Patches to the image-generation log for
each script are appreciated, in order to keep the log up-to-date.
- Indic
- [Devanagari](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/devanagari/devanagari-svg-image-generation-log.md)
- [Bengali](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/bengali/bengali-svg-image-generation-log.md)
- [Gujarati](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/gujarati/gujarati-svg-image-generation-log.md)
- [Gurmukhi](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/gurmukhi/gurmukhi-svg-image-generation-log.md)
- [Kannada](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/kannada/kannada-svg-image-generation-log.md)
- [Malayalam](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/malayalam/malayalam-svg-image-generation-log.md)
- [Oriya](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/oriya/oriya-svg-image-generation-log.md)
- [Tamil](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/tamil/tamil-svg-image-generation-log.md)
- [Telugu](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/telugu/telugu-svg-image-generation-log.md)
- [Sinhala](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/sinhala/sinhala-svg-image-generation-log.md)
- Brahmi-derived
- [Khmer](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/khmer/khmer-svg-image-generation-log.md)
- [Lao](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/thai-lao/thai-lao-svg-image-generation-log.md)
- [Myanmar](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/myanmar/myanmar-svg-image-generation-log.md)
- [Thai](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/thai-lao/thai-lao-svg-image-generation-log.md)
- [Tibetan](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/tibetan/tibetan-svg-image-generation-log.md)
- Arabic
- [Arabic](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/arabic/arabic-svg-image-generation-log.md)
- [Syriac](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/syriac/syriac-svg-image-generation-log.md)
- [N'Ko](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/nko/nko-svg-image-generation-log.md)
- [Mongolian](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/mongolian/mongolian-svg-image-generation-log.md)
- Hangul
- [Hangul](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/hangul/hangul-svg-image-generation-log.md)
- Hebrew
- [Hebrew](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/hebrew/hebrew-svg-image-generation-log.md)
- Emoji
- [Emoji](https://github.com/n8willis/opentype-shaping-documents/blob/master/images/emoji/emoji-png-image-generation-log.md)
================================================
FILE: images/kannada/kannada-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-kannada.md](../../opentype-shaping-kannada.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-matra-decomposition-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cc8
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-matra-decomposition-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cc6,25cc,0cd6
montage kannada-matra-decomposition-before.png right-arrow.png kannada-matra-decomposition-after.png -geometry +0+0 -background transparent kannada-matra-decomposition.png
## 3.2 `nukt`
> Note: Noto Serif Kannada implements this in `blwm` for unknown
> reasons.
hb-view --font-size=110 --margin=2,32,2,16 --output-file=kannada-nukt-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cab,25cc,0cbc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-nukt-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cab,0cbc
montage kannada-nukt-before.png right-arrow.png kannada-nukt-after.png -geometry +0+0 -background transparent kannada-nukt.png
## 3.3 `akhn`
> Note: Noto Serif Kannada implements this in both `akhn` and in
> `blwf` for unknown reasons.
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-kssa-before.png --features=-akhn,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0ccd,0cb7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-kssa-after.png --features=+akhn, --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0ccd,0cb7
montage kannada-akhn-kssa-before.png right-arrow.png kannada-akhn-kssa-after.png -geometry +0+0 -background transparent kannada-akhn-kssa.png
### JNya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-jnya-before.png --features=-akhn,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c9c,0ccd,0c9e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-jnya-after.png --features=+akhn, --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c9c,0ccd,0c9e
montage kannada-akhn-jnya-before.png right-arrow.png kannada-akhn-jnya-after.png -geometry +0+0 -background transparent kannada-akhn-jnya.png
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-rphf-before.png --features=-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb0,0ccd,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-rphf-after.png --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb0,0ccd,25cc
montage kannada-rphf-before.png right-arrow.png kannada-rphf-after.png -geometry +0+0 -background transparent kannada-rphf.png
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwf-before.png --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=25cc,0ccd,0ca1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwf-after.png --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=25cc,0ccd,0ca1
montage kannada-blwf-before.png right-arrow.png kannada-blwf-after.png -geometry +0+0 -background transparent kannada-blwf.png
## 4.3 Reph positioning
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-reph-position-before.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb0,0ccd,25cc,0cad,0ccd,0cb3,0cc2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-reph-position-after.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb0,0ccd,0cad,0ccd,0cb3,0cc2
montage kannada-reph-position-before.png right-arrow.png kannada-reph-position-after.png -geometry +0+0 -background transparent kannada-reph-position.png
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-pres-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb5,25cc,0cc1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-pres-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cb5,0cc1
montage kannada-pres-before.png right-arrow.png kannada-pres-after.png -geometry +0+0 -background transparent kannada-pres.png
## 5 `abvs`
> Note: Noto Serif Kannada has some abvs-like substituations in `pres`
> lookup 14 (via single-sub lookup 23), but I have not yet figured out
> whether they are,
> linguistically speaking, actually above-base features. Thus, they are
> included here, but might not be used in the shaping document.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-abvs-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0ca3,0ccc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-abvs-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0ca3,0ccc
montage kannada-abvs-before.png right-arrow.png kannada-abvs-after.png -geometry +0+0 -background transparent kannada-abvs.png
## 5 `blws`
> Note: Note Serif Kannada has some blws-like substitutions in
> `pres` lookup 12 (via contextual chaining lookups 7 and 8 (via
> single-sub lookups 19 and 20)), but I have not yet figured out
> whether they are, linguistically speaking, actually above-base
> features. Thus, they are included here, but might not be used in the
> shaping document.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blws-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0ccd,0cb7,0cc1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blws-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0ccd,0cb7,0cc1
montage kannada-blws-before.png right-arrow.png kannada-blws-after.png -geometry +0+0 -background transparent kannada-blws.png
## 5 `psts`
> Note: Noto Serif Kannada has some psts-like lookups in `pres` lookup 12 (via single-sub lookup 21 and 22 (via contextual chaining lookup
> 9 and 10)), but I have not yet figured out whether they are,
> linguistically speaking, actually post-base features. Thus, they are
> included here, but might not be used in the shaping document.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-psts-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0cbe
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-psts-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c95,0cbe
montage kannada-psts-before.png right-arrow.png kannada-psts-after.png -geometry +0+0 -background transparent kannada-psts.png
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-haln-before.png --features=-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c98,0ccd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-haln-after.png --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0c98,0ccd
montage kannada-haln-before.png right-arrow.png kannada-haln-after.png -geometry +0+0 -background transparent kannada-haln.png
## 6 `abvm`
> Note: Noto Serif Kannada does not include an `abvm` feature.
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwm-before.png --features=-blwm,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cab,0cc1,0cbc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.ttf --unicodes=0cab,0cc1,0cbc
montage kannada-blwm-before.png right-arrow.png kannada-blwm-after.png -geometry +0+0 -background transparent kannada-blwm.png
================================================
FILE: images/kannada/kannada-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-kannada.md](../../opentype-shaping-kannada.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-matra-decomposition-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cc8
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-matra-decomposition-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cc6,25cc,0cd6
svg_stack.py --direction=h kannada-matra-decomposition-before.svg right-arrow.svg kannada-matra-decomposition-after.svg > kannada-matra-decomposition.svg
## 3.2 `nukt`
> Note: Noto Serif Kannada implements this in `blwm` for unknown
> reasons.
hb-view --font-size=110 --margin=2,32,2,16 --output-file=kannada-nukt-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cab,25cc,0cbc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-nukt-after.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cab,0cbc
svg_stack.py --direction=h kannada-nukt-before.svg right-arrow.svg kannada-nukt-after.svg > kannada-nukt.svg
## 3.3 `akhn`
> Note: Noto Serif Kannada implements this in both `akhn` and in
> `blwf` for unknown reasons.
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-kssa-before.svg --features=-akhn,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c95,0ccd,0cb7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-kssa-after.svg --features=+akhn, --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c95,0ccd,0cb7
svg_stack.py --direction=h kannada-akhn-kssa-before.svg right-arrow.svg kannada-akhn-kssa-after.svg > kannada-akhn-kssa.svg
### JNya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-jnya-before.svg --features=-akhn,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c9c,0ccd,0c9e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-akhn-jnya-after.svg --features=+akhn, --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c9c,0ccd,0c9e
svg_stack.py --direction=h kannada-akhn-jnya-before.svg right-arrow.svg kannada-akhn-jnya-after.svg > kannada-akhn-jnya.svg
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-rphf-before.svg --features=-rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb0,0ccd,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-rphf-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb0,0ccd,25cc
svg_stack.py --direction=h kannada-rphf-before.svg right-arrow.svg kannada-rphf-after.svg > kannada-rphf.svg
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=25cc,0ccd,0ca1
hb-view --font-size=110 --margin=2,24,2,16 --output-file=kannada-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=25cc,0ccd,0ca1
svg_stack.py --direction=h kannada-blwf-before.svg right-arrow.svg kannada-blwf-after.svg > kannada-blwf.svg
## 4.3 Reph positioning
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-reph-position-before.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb0,0ccd,25cc,0cad,0ccd,0cb3,0cc2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-reph-position-after.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb0,0ccd,0cad,0ccd,0cb3,0cc2
svg_stack.py --direction=h kannada-reph-position-before.svg right-arrow.svg kannada-reph-position-after.svg > kannada-reph-position.svg
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-pres-before.svg --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb5,25cc,0cc1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-pres-after.svg --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cb5,0cc1
svg_stack.py --direction=h kannada-pres-before.svg right-arrow.svg kannada-pres-after.svg > kannada-pres.svg
## 5 `abvs`
> Note: Noto Serif Kannada has some abvs-like substituations in
> `psts` lookups for unknown reasons. This is one.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-abvs-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0ca3,0ccd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-abvs-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0ca3,0ccd
svg_stack.py --direction=h kannada-abvs-before.svg right-arrow.svg kannada-abvs-after.svg > kannada-abvs.svg
## 5 `blws`
> Note: Note Serif Kannada has some blws-like substitutions in
> `psts` lookups for unknown reasons. This is one.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c95,0ccd,0ca4,0ccd,0caf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c95,0ccd,0ca4,0ccd,0caf
svg_stack.py --direction=h kannada-blws-before.svg right-arrow.svg kannada-blws-after.svg > kannada-blws.svg
## 5 `psts`
> Note: Noto Serif Kannada has moved many `pres` lookups into `psts`
> in the most recent release.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-psts-before.svg --features=-pres,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=25cc,0ca4,0cbf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-psts-after.svg --features=+pres,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=25cc,0ca4,0cbf
svg_stack.py --direction=h kannada-psts-before.svg right-arrow.svg kannada-psts-after.svg > kannada-psts.svg
## 5 `haln`
> Note: Noto Serif Kannada does not include a `haln` feature. Similar
> behavior is found in `psts`.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-haln-before.svg --features=-haln,-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c98,0ccd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-haln-after.svg --features=+haln,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c98,0ccd
svg_stack.py --direction=h kannada-haln-before.svg right-arrow.svg kannada-haln-after.svg > kannada-haln.svg
## 6 `dist`
hb-view --font-size=110 --margin=2,32,2,16 --output-file=kannada-dist-before.svg --features=-dist,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c93,0cf3
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0c93,0cf3
svg_stack.py --direction=h kannada-dist-before.svg right-arrow.svg kannada-dist-after.svg > kannada-dist.svg
## 6 `abvm`
> Note: Noto Serif Kannada does not include an `abvm` feature.
## 6 `blwm`
hb-view --font-size=110 --margin=2,32,2,16 --output-file=kannada-blwm-before.svg --features=-blwm,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cab,0cc1,0cbc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=kannada-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKannada-Regular.otf --unicodes=0cab,0cc1,0cbc
svg_stack.py --direction=h kannada-blwm-before.svg right-arrow.svg kannada-blwm-after.svg > kannada-blwm.svg
================================================
FILE: images/khmer/khmer-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-khmer.md](../../opentype-shaping-khmer.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Terminology
### Coeng forms
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-coeng-kha-before.png --features=-blwf,-pstf --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17d2,1783
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-coeng-kha-after.png --features=+blwf,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17d2,1783
montage khmer-coeng-kha-before.png right-arrow.png khmer-coeng-kha-after.png -geometry +0+0 -background transparent khmer-coeng-kha.png
## The khmr shaping model
### Robat
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-robat.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17cc
## 2.2 Matra decomposition
> Note: Noto Serif Khmer has decompositions for the
> non-canonical-in-Unicode multi-part matras implemented in `psts`
> features, but I have not figured out how to activate them in isolation.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-matra-decomposition-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17c4
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-matra-decomposition-after.png --features=+ccmp,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17c1,25cc,17b6
montage khmer-matra-decomposition-before.png right-arrow.png khmer-matra-decomposition-after.png -geometry +0+0 -background transparent khmer-matra-decomposition.png
## 2.4 Pre-base-reordering Ro
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pref-before.png --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1786,25cc,17d2,179a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pref-after.png --features=+pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1786,17d2,179a
montage khmer-pref-before.png right-arrow.png khmer-pref-after.png -geometry +0+0 -background transparent khmer-pref.png
## 3.1 `locl`
No examples found in Noto Khmer.
## 3.2 `ccmp`
No examples found in Noto Khmer.
## 3.3 `pref`
Same as pre-base-reordering Ro.
## 3.4 `blwf`
> Note: altered bottom-margin number to fit.
hb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-blwf-before.png --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178c,25cc,17d2,17af
hb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-blwf-after.png --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178c,17d2,17af
montage khmer-blwf-before.png right-arrow.png khmer-blwf-after.png -geometry +0+0 -background transparent khmer-blwf.png
## 3.5 `abvf`
> Note: Noto Serif Khmer implements this as a `abvs` lookup.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvf-before.png --features=-abvf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17ca
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvf-after.png --features=+abvf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17ca
montage khmer-abvf-before.png right-arrow.png khmer-abvf-after.png -geometry +0+0 -background transparent khmer-abvf.png
## 3.6 `pstf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pstf-before.png --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17d2,179f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pstf-after.png --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17d2,179f
montage khmer-pstf-before.png right-arrow.png khmer-pstf-after.png -geometry +0+0 -background transparent khmer-pstf.png
## 3.7 `cfar`
No examples found in Noto Khmer.
## 4 `pres`
> Note: Adjusted bottom margin.
hb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-pres-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17d2,179a,17d2,1781
hb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-pres-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17d2,179a,17d2,1781
montage khmer-pres-before.png right-arrow.png khmer-pres-after.png -geometry +0+0 -background transparent khmer-pres.png
## 4 `blws`
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blws-before.png --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17d2,1780
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blws-after.png --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17d2,1780
montage khmer-blws-before.png right-arrow.png khmer-blws-after.png -geometry +0+0 -background transparent khmer-blws.png
## 4 `abvs`
hb-view --font-size=110 --margin=32,16,2,16 --output-file=khmer-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1796,17b7,17cd
hb-view --font-size=110 --margin=32,16,2,16 --output-file=khmer-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1796,17b7,17cd
montage khmer-abvs-before.png right-arrow.png khmer-abvs-after.png -geometry +0+0 -background transparent khmer-abvs.png
## 4 `psts`
> Note: Adjusted bottom margin.
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-psts-before.png --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1787,17d2,1785,17d2,1788
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-psts-after.png --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1787,17d2,1785,17d2,1788
## 4 `clig`
> Note: Noto Serif Khmer implements this twice, in clig and liga.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-clig-before.png --features=-clig,-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17b6
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-clig-after.png --features=+clig,+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17b6
montage khmer-clig-before.png right-arrow.png khmer-clig-after.png -geometry +0+0 -background transparent khmer-clig.png
## 4 `liga`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-liga-before.png --features=-clig,-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17d2,1788,17c5
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-liga-after.png --features=+clig,+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17d2,1788,17c5
montage khmer-liga-before.png right-arrow.png khmer-liga-after.png -geometry +0+0 -background transparent khmer-liga.png
## 5 `dist`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-dist-before.png --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179e,17d2,1798,179a,17bc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-dist-after.png --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179e,17d2,1798,179a,17bc
montage khmer-dist-before.png right-arrow.png khmer-dist-after.png -geometry +0+0 -background transparent khmer-dist.png
## 5 `kern`
No examples found in Noto Khmer....
## 5 `blwm`
> Note: adjusted bottom margin.
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blwm-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17bc
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17bc
montage khmer-blwm-before.png right-arrow.png khmer-blwm-after.png -geometry +0+0 -background transparent khmer-blwm.png
## 5 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvm-before.png --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178e,17b7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvm-after.png --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178e,17b7
montage khmer-abvm-before.png right-arrow.png khmer-abvm-after.png -geometry +0+0 -background transparent khmer-abvm.png
## 5 `mkmk`
No examples found in Noto Khmer.
================================================
FILE: images/khmer/khmer-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-khmer.md](../../opentype-shaping-khmer.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Terminology
### Coeng forms
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-coeng-kha-before.svg --features=-blwf,-pstf --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17d2,1783
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-coeng-kha-after.svg --features=+blwf,+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17d2,1783
svg_stack.py --direction=h khmer-coeng-kha-before.svg right-arrow.svg khmer-coeng-kha-after.svg > khmer-coeng-kha.svg
## The khmr shaping model
### Robat
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-robat.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17cc
## 2.2 Matra decomposition
> Note: Noto Serif Khmer has decompositions for the
> non-canonical-in-Unicode multi-part matras implemented in `psts`
> features, but I have not figured out how to activate them in isolation.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-matra-decomposition-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17c4
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-matra-decomposition-after.svg --features=+ccmp,+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=17c1,25cc,17b6
svg_stack.py --direction=h khmer-matra-decomposition-before.svg right-arrow.svg khmer-matra-decomposition-after.svg > khmer-matra-decomposition.svg
## 2.4 Pre-base-reordering Ro
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pref-before.svg --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1786,25cc,17d2,179a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pref-after.svg --features=+pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1786,17d2,179a
svg_stack.py --direction=h khmer-pref-before.svg right-arrow.svg khmer-pref-after.svg > khmer-pref.svg
#### Duplicates for other subsections
cp khmer-pref.svg khmer-pref-1.svg
cluster_styles = [
## 3.1 `locl`
No examples found in Noto Khmer.
## 3.2 `ccmp`
No examples found in Noto Khmer.
## 3.3 `pref`
Same as pre-base-reordering Ro.
## 3.4 `blwf`
> Note: altered bottom-margin number to fit.
hb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178c,25cc,17d2,17af
hb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178c,17d2,17af
svg_stack.py --direction=h khmer-blwf-before.svg right-arrow.svg khmer-blwf-after.svg > khmer-blwf.svg
## 3.5 `abvf`
> Note: Noto Serif Khmer implements this as a `abvs` lookup.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvf-before.svg --features=-abvf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17ca
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvf-after.svg --features=+abvf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17ca
svg_stack.py --direction=h khmer-abvf-before.svg right-arrow.svg khmer-abvf-after.svg > khmer-abvf.svg
## 3.6 `pstf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pstf-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17d2,179f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-pstf-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=25cc,17d2,179f
svg_stack.py --direction=h khmer-pstf-before.svg right-arrow.svg khmer-pstf-after.svg > khmer-pstf.svg
## 3.7 `cfar`
No examples found in Noto Khmer.
## 4 `pres`
> Note: Adjusted bottom margin.
hb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-pres-before.svg --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17d2,179a,17d2,1781
hb-view --font-size=110 --margin=2,16,32,16 --output-file=khmer-pres-after.svg --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17d2,179a,17d2,1781
svg_stack.py --direction=h khmer-pres-before.svg right-arrow.svg khmer-pres-after.svg > khmer-pres.svg
## 4 `blws`
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17d2,1780
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17d2,1780
svg_stack.py --direction=h khmer-blws-before.svg right-arrow.svg khmer-blws-after.svg > khmer-blws.svg
## 4 `abvs`
hb-view --font-size=110 --margin=32,16,2,16 --output-file=khmer-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1796,17b7,17cd
hb-view --font-size=110 --margin=32,16,2,16 --output-file=khmer-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1796,17b7,17cd
svg_stack.py --direction=h khmer-abvs-before.svg right-arrow.svg khmer-abvs-after.svg > khmer-abvs.svg
## 4 `psts`
> Note: Adjusted bottom margin.
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-psts-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1787,17d2,1785,17d2,1788
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-psts-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1787,17d2,1785,17d2,1788
## 4 `clig`
> Note: Noto Serif Khmer implements this twice, in clig and liga.
>
> Note: It is no longer possible to deactivate clig in HarfBuzz. See
> the issue at https://github.com/harfbuzz/harfbuzz/issues/1310 for
> more information.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-clig-before.svg --features=-clig,-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17b6
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-clig-after.svg --features=+clig,+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1780,17b6
svg_stack.py --direction=h khmer-clig-before.svg right-arrow.svg khmer-clig-after.svg > khmer-clig.svg
## 4 `liga`
> Note: because Noto Serif Khmer duplicates all of its liga
> substitutions in clig, which cannot be disabled in HarfBuzz (see the
> preceding section about clig), it is not possible to disable the
> liga substitutions either.
hb-view --font-size=110 --margin=2,16,8,24 --output-file=khmer-liga-before.svg --features=-clig,-liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17d2,1788,17c5
hb-view --font-size=110 --margin=2,16,8,24 --output-file=khmer-liga-after.svg --features=+clig,+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179c,17d2,1788,17c5
svg_stack.py --direction=h khmer-liga-before.svg right-arrow.svg khmer-liga-after.svg > khmer-liga.svg
## 5 `dist`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-dist-before.svg --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179e,17d2,1798,179a,17bc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=179e,17d2,1798,179a,17bc
svg_stack.py --direction=h khmer-dist-before.svg right-arrow.svg khmer-dist-after.svg > khmer-dist.svg
## 5 `kern`
No examples found in Noto Khmer....
## 5 `blwm`
> Note: adjusted bottom margin.
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17bc
hb-view --font-size=110 --margin=2,16,48,16 --output-file=khmer-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=1789,17bc
svg_stack.py --direction=h khmer-blwm-before.svg right-arrow.svg khmer-blwm-after.svg > khmer-blwm.svg
## 5 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvm-before.svg --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178e,17b7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=khmer-abvm-after.svg --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifKhmer-Regular.ttf --unicodes=178e,17b7
svg_stack.py --direction=h khmer-abvm-before.svg right-arrow.svg khmer-abvm-after.svg > khmer-abvm.svg
## 5 `mkmk`
No examples found in Noto Khmer.
================================================
FILE: images/malayalam/malayalam-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-malayalam.md](../../opentype-shaping-malayalam.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-decompose-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-decompose-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d46,25cc,0d57
montage malayalam-matra-decompose-before.png right-arrow.png malayalam-matra-decompose-after.png -geometry +0+0 -background transparent malayalam-matra-decompose.png
## 2.7 post-base consonants
### Ya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ya-before.png --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d2f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ya-after.png --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d2f
montage malayalam-pstf-ya-before.png right-arrow.png malayalam-pstf-ya-after.png -geometry +0+0 -background transparent malayalam-pstf-ya.png
### Va
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-va-before.png --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d35
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-va-after.png --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d35
montage malayalam-pstf-va-before.png right-arrow.png malayalam-pstf-va-after.png -geometry +0+0 -background transparent malayalam-pstf-va.png
## 3.2 `nukt`
> Note: Noto Serif Malayalam uses `U+0323` "Combining dot below" in
> its mark-placement lookups, not a Nukta (which does not exist in the
> Malayalam Unicode block).
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-nukt-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d18,25cc,0323
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-nukt-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d18,0323
montage malayalam-nukt-before.png right-arrow.png malayalam-nukt-after.png -geometry +0+0 -background transparent malayalam-nukt.png
## 3.3 `akhn`
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-kssa-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,0d37
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-kssa-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,0d37
montage malayalam-akhn-kssa-before.png right-arrow.png malayalam-akhn-kssa-after.png -geometry +0+0 -background transparent malayalam-akhn-kssa.png
### NnTta
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-nntta-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d23,0d4d,0d1f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-nntta-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d23,0d4d,0d1f
montage malayalam-akhn-nntta-before.png right-arrow.png malayalam-akhn-nntta-after.png -geometry +0+0 -background transparent malayalam-akhn-nntta.png
> Note: The "Chillu R" is shown here because it may be implemented as
> an akhand form and that makes Malayalam distinct from several other
> Indic scripts.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-chillu-r-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d30,0d4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-chillu-r-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d7c
montage malayalam-akhn-chillu-r-before.png right-arrow.png malayalam-akhn-chillu-r-after.png -geometry +0+0 -background transparent malayalam-akhn-chillu-r.png
## 3.4 `rphf`
> Note: Malayalam modern orthography does not use Reph. The dot-reph
> substitution here is shown with an accompanying note to that effect,
> and is accompanied by the Chillu-R image.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-dot-reph-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d30,0d4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-dot-reph-after.png --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e
montage malayalam-dot-reph-before.png right-arrow.png malayalam-dot-reph-after.png -geometry +0+0 -background transparent malayalam-dot-reph.png
## 3.6 `pref`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ra-before.png --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d30
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ra-after.png --features=+pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d30
montage malayalam-pstf-ra-before.png right-arrow.png malayalam-pstf-ra-after.png -geometry +0+0 -background transparent malayalam-pstf-ra.png
## 3.7 `blwf`
> Note: Noto Serif Malayalam includes a `blwf`-form "La" but does not
> include a feature that accesses it. It is included in several `akhn`
> ligatures, though. Instead, use SMC Rachana font.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwf-before.png --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/malayalam/Rachana-Regular.ttf --unicodes=0d4d,0d32
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwf-after.png --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/malayalam/Rachana-Regular.ttf --unicodes=0d4d,0d32
montage malayalam-blwf-before.png right-arrow.png malayalam-blwf-after.png -geometry +0+0 -background transparent malayalam-blwf.png
## 3.9 `half`
> Note: Added a note to the shaping text about using `half` for Chillu
> lookups.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-half-before.png --features=+half --background=FFFFFF00 --preserve-default-ignorables /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,2005,200d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-half-after.png --features=+half --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,200d
montage malayalam-half-before.png right-arrow.png malayalam-half-after.png -geometry +0+0 -background transparent malayalam-half.png
## 3.10 `pstf`
> Note: Uses the same images as 2.7
## 3.12 `cjct`
> Note: Noto Serif Malayalam implements this as an `akhn` feature.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-cjct-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d38,0d4d,0d31,0d4d,0d31
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-cjct-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d38,0d4d,0d31,0d4d,0d31
montage malayalam-cjct-before.png right-arrow.png malayalam-cjct-after.png -geometry +0+0 -background transparent malayalam-cjct.png
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-position-before.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d47,0d2c,0d4d,0d1e,0d4d,0d1c,0d3e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-position-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d2c,0d4d,0d1e,0d4d,0d1c,0d4b
montage malayalam-matra-position-before.png right-arrow.png malayalam-matra-position-after.png -geometry +0+0 -background transparent malayalam-matra-position.png
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-repha-position-before.png --features=+akhn,-abvm,-mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e,200d,0d23,0d4d,200d,0d21,0d41
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-repha-position-after.png --features=+akhn,+abvm,+mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e,0d23,0d4d,200d,0d21,0d41
montage malayalam-repha-position-before.png right-arrow.png malayalam-repha-position-after.png -geometry +0+0 -background transparent malayalam-repha-position.png
## 4.4 Pre-base reordering
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pref-position-before.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=25cc,0d4d,0d30,0d39,0d4d,0d23,0d4d,0d21,0d4c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pref-position-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d39,0d4d,0d23,0d4d,0d21,0d4d,0d30,0d4c
montage malayalam-pref-position-before.png right-arrow.png malayalam-pref-position-after.png -geometry +0+0 -background transparent malayalam-pref-position.png
## 5 `blws`
> Note: Noto Serif and Sans Malayalam have blws-like "La" features in
> other lookups, such as `akhn`. I have not been able to isolate one
> of them for usage.
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-psts-before.png --features=-psts,-akhn --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d35,0d4d,0d35
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-psts-after.png --features=+psts,+akhn --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d35,0d4d,0d35
montage malayalam-psts-before.png right-arrow.png malayalam-psts-after.png -geometry +0+0 -background transparent malayalam-psts.png
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-haln-before.png --features=-haln --background=FFFFFF00 --preserve-default-ignorables /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d33,0d4d,2005,200d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-haln-after.png --features=+haln --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Malayalam/static/NotoSerifMalayalam-Regular.ttf --unicodes=0d33,0d4d,200d
montage malayalam-haln-before.png right-arrow.png malayalam-haln-after.png -geometry +0+0 -background transparent malayalam-haln.png
## 6 `abvm`
hb-view --font-size=110 --margin=2,32,2,16 --output-file=malayalam-abvm-before.png --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d0a,0d01
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-abvm-after.png --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d0a,0d01
montage malayalam-abvm-before.png right-arrow.png malayalam-abvm-after.png -geometry +0+0 -background transparent malayalam-abvm.png
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwm-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d34,0d62
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d34,0d62
montage malayalam-blwm-before.png right-arrow.png malayalam-blwm-after.png -geometry +0+0 -background transparent malayalam-blwm.png
================================================
FILE: images/malayalam/malayalam-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-malayalam.md](../../opentype-shaping-malayalam.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-decompose-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-decompose-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d46,25cc,0d57
svg_stack.py --direction=h malayalam-matra-decompose-before.svg right-arrow.svg malayalam-matra-decompose-after.svg > malayalam-matra-decompose.svg
## 2.7 post-base consonants
### Ya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ya-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d2f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ya-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d2f
svg_stack.py --direction=h malayalam-pstf-ya-before.svg right-arrow.svg malayalam-pstf-ya-after.svg > malayalam-pstf-ya.svg
### Va
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-va-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d35
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-va-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d35
svg_stack.py --direction=h malayalam-pstf-va-before.svg right-arrow.svg malayalam-pstf-va-after.svg > malayalam-pstf-va.svg
#### Duplicates for other subsections
cp malayalam-pstf-ya.svg malayalam-pstf-ya-1.svg
cluster_styles = [
cp malayalam-pstf-va.svg malayalam-pstf-va-1.svg
cluster_styles = [
## 3.2 `nukt`
> Note: Noto Serif Malayalam uses `U+0323` "Combining dot below" in
> its mark-placement lookups, not a Nukta (which does not exist in the
> Malayalam Unicode block).
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-nukt-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d18,25cc,0323
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-nukt-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d18,0323
svg_stack.py --direction=h malayalam-nukt-before.svg right-arrow.svg malayalam-nukt-after.svg > malayalam-nukt.svg
## 3.3 `akhn`
### KSsa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-kssa-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,0d37
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-kssa-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,0d37
svg_stack.py --direction=h malayalam-akhn-kssa-before.svg right-arrow.svg malayalam-akhn-kssa-after.svg > malayalam-akhn-kssa.svg
### NnTta
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-nntta-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d23,0d4d,0d1f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-nntta-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d23,0d4d,0d1f
svg_stack.py --direction=h malayalam-akhn-nntta-before.svg right-arrow.svg malayalam-akhn-nntta-after.svg > malayalam-akhn-nntta.svg
> Note: The "Chillu R" is shown here because it may be implemented as
> an akhand form and that makes Malayalam distinct from several other
> Indic scripts.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-chillu-r-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d30,0d4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-akhn-chillu-r-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d7c
svg_stack.py --direction=h malayalam-akhn-chillu-r-before.svg right-arrow.svg malayalam-akhn-chillu-r-after.svg > malayalam-akhn-chillu-r.svg
## 3.4 `rphf`
> Note: Malayalam modern orthography does not use Reph. The dot-reph
> substitution here is shown with an accompanying note to that effect,
> and is accompanied by the Chillu-R image.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-dot-reph-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d30,0d4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-dot-reph-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e
svg_stack.py --direction=h malayalam-dot-reph-before.svg right-arrow.svg malayalam-dot-reph-after.svg > malayalam-dot-reph.svg
## 3.6 `pref`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ra-before.svg --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSeriffiMalayalam-Regular.ttf --unicodes=0d4d,0d30
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pstf-ra-after.svg --features=+pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4d,0d30
svg_stack.py --direction=h malayalam-pstf-ra-before.svg right-arrow.svg malayalam-pstf-ra-after.svg > malayalam-pstf-ra.svg
## 3.7 `blwf`
> Note: Noto Serif Malayalam includes a `blwf`-form "La" but does not
> include a feature that accesses it. It is included in several `akhn`
> ligatures, though. Instead, use SMC Rachana font.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/malayalam/Rachana-Regular.ttf --unicodes=0d4d,0d32
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/malayalam/Rachana-Regular.ttf --unicodes=0d4d,0d32
svg_stack.py --direction=h malayalam-blwf-before.svg right-arrow.svg malayalam-blwf-after.svg > malayalam-blwf.svg
#### Duplicates for other subsections
cp malayalam-blwf.svg malayalam-blwf-1.svg
cluster_styles = [
## 3.9 `half`
> Note: Added a note to the shaping text about using `half` for Chillu
> lookups.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-half-before.svg --features=+half --background=FFFFFF00 --preserve-default-ignorables /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,2005,200d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-half-after.svg --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d15,0d4d,200d
svg_stack.py --direction=h malayalam-half-before.svg right-arrow.svg malayalam-half-after.svg > malayalam-half.svg
## 3.10 `pstf`
> Note: Uses the same images as 2.7
## 3.12 `cjct`
> Note: Noto Serif Malayalam implements this as an `akhn` feature.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-cjct-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d38,0d4d,0d31,0d4d,0d31
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-cjct-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d38,0d4d,0d31,0d4d,0d31
svg_stack.py --direction=h malayalam-cjct-before.svg right-arrow.svg malayalam-cjct-after.svg > malayalam-cjct.svg
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-position-before.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d47,0d2c,0d4d,0d1e,0d4d,0d1c,0d3e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-matra-position-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d2c,0d4d,0d1e,0d4d,0d1c,0d4b
svg_stack.py --direction=h malayalam-matra-position-before.svg right-arrow.svg malayalam-matra-position-after.svg > malayalam-matra-position.svg
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-repha-position-before.svg --features=+akhn,-abvm,-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e,200d,0d23,0d4d,200d,0d21,0d41
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-repha-position-after.svg --features=+akhn,+abvm,+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d4e,0d23,0d4d,200d,0d21,0d41
svg_stack.py --direction=h malayalam-repha-position-before.svg right-arrow.svg malayalam-repha-position-after.svg > malayalam-repha-position.svg
## 4.4 Pre-base reordering
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pref-position-before.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=25cc,0d4d,0d30,0d39,0d4d,0d23,0d4d,0d21,0d4c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-pref-position-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d39,0d4d,0d23,0d4d,0d21,0d4d,0d30,0d4c
svg_stack.py --direction=h malayalam-pref-position-before.svg right-arrow.svg malayalam-pref-position-after.svg > malayalam-pref-position.svg
## 5 `blws`
> Note: Noto Serif and Sans Malayalam have blws-like "La" features in
> other lookups, such as `akhn`. I have not been able to isolate one
> of them for usage.
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-psts-before.svg --features=-psts,-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d35,0d4d,0d35
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-psts-after.svg --features=+psts,+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d35,0d4d,0d35
svg_stack.py --direction=h malayalam-psts-before.svg right-arrow.svg malayalam-psts-after.svg > malayalam-psts.svg
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-haln-before.svg --features=-haln --background=FFFFFF00 --preserve-default-ignorables /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d33,0d4d,2005,200d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-haln-after.svg --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d33,0d4d,200d
svg_stack.py --direction=h malayalam-haln-before.svg right-arrow.svg malayalam-haln-after.svg > malayalam-haln.svg
## 6 `abvm`
hb-view --font-size=110 --margin=32,16,2,16 --output-file=malayalam-abvm-before.svg --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d0a,0d01
hb-view --font-size=110 --margin=32,16,2,16 --output-file=malayalam-abvm-after.svg --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d0a,0d01
svg_stack.py --direction=h malayalam-abvm-before.svg right-arrow.svg malayalam-abvm-after.svg > malayalam-abvm.svg
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d34,0d62
hb-view --font-size=110 --margin=2,16,2,16 --output-file=malayalam-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf --unicodes=0d34,0d62
svg_stack.py --direction=h malayalam-blwm-before.svg right-arrow.svg malayalam-blwm-after.svg > malayalam-blwm.svg
================================================
FILE: images/mongolian/mongolian-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-mongolian.md](../../opentype-shaping-mongolian.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Terminology
### FVS
#### No FVS
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-none-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-none-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180a,25cc
ontage mongolian-fvs-none-before.png right-arrow.png mongolian-fvs-none-after.png -geometry +0+0 -background transparent mongolian-fvs-none.png
#### FVS1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs1-before.png --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180b,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs1-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180b,180a,25cc
montage mongolian-fvs-fvs1-before.png right-arrow.png mongolian-fvs-fvs1-after.png -geometry +0+0 -background transparent mongolian-fvs-fvs1.png
#### FVS2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs2-before.png --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180c,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs2-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180c,180a,25cc
montage mongolian-fvs-fvs2-before.png right-arrow.png mongolian-fvs-fvs2-after.png -geometry +0+0 -background transparent mongolian-fvs-fvs2.png
#### FVS3
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs3-before.png --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs3-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,180a,25cc
montage mongolian-fvs-fvs3-before.png right-arrow.png mongolian-fvs-fvs3-after.png -geometry +0+0 -background transparent mongolian-fvs-fvs3.png
## 4.2 `isol`
### `isol` general
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-before.png --features=-isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-after.png --features=+isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826
montage mongolian-isol-before.png right-arrow.png mongolian-isol-after.png -geometry +0+0 -background transparent mongolian-isol.png
### `isol` FVS
> Note: uses larger right margin
hb-view --font-size=110 --margin=2,112,2,16 --output-file=mongolian-isol-fvs1-before.png --features=-isol --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826,180b
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-fvs1-after.png --features=+isol --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826,180b
montage mongolian-isol-fvs1-before.png right-arrow.png mongolian-isol-fvs1-after.png -geometry +0+0 -background transparent mongolian-isol-fvs1.png
## 4.3 `fina`
### `fina` general
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-before.png --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830
montage mongolian-fina-before.png right-arrow.png mongolian-fina-after.png -geometry +0+0 -background transparent mongolian-fina.png
### `fina` FVS
> Note: uses larger right margin
hb-view --font-size=110 --margin=2,112,2,16 --output-file=mongolian-fina-fvs2-before.png --features=-fina --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830,180c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-fvs2-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830,180c
montage mongolian-fina-fvs2-before.png right-arrow.png mongolian-fina-fvs2-after.png -geometry +0+0 -background transparent mongolian-fina-fvs2.png
## 4.6 `medi`
### `medi` general
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180a,25cc
montage mongolian-medi-before.png right-arrow.png mongolian-medi-after.png -geometry +0+0 -background transparent mongolian-medi.png
### `medi` FVS
> Note: uses ZWNJ and spaces to approximate correct spacing for FVS1
> (which is zero-width)
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-fvs1-before.png --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180b,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-fvs1-after.png --features=+medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180b,180a,25cc
montage mongolian-medi-fvs1-before.png right-arrow.png mongolian-medi-fvs1-after.png -geometry +0+0 -background transparent mongolian-medi-fvs1.png
## 4.8 `init`
### `init` general
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180a,25cc
montage mongolian-init-before.png right-arrow.png mongolian-init-after.png -geometry +0+0 -background transparent mongolian-init.png
### `init` FVS
> Note: uses ZWNJ and spaces to approximate correct spacing for FVS1
> (which is zero-width)
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-fvs1-before.png --features=-init --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180b,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-fvs1-after.png --features=+init --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180b,180a,25cc
montage mongolian-init-fvs1-before.png right-arrow.png mongolian-init-fvs1-after.png -geometry +0+0 -background transparent mongolian-init-fvs1.png
## 4.9 `rlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-rlig-before.png --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=182a,1820
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-rlig-after.png --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=182a,1820
montage mongolian-rlig-before.png right-arrow.png mongolian-rlig-after.png -geometry +0+0 -background transparent mongolian-rlig.png
================================================
FILE: images/mongolian/mongolian-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-mongolian.md](../../opentype-shaping-mongolian.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Terminology
### FVS
#### No FVS
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-none-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-none-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180a,25cc
svg_stack --direction=h mongolian-fvs-none-before.svg right-arrow.svg mongolian-fvs-none-after.svg > mongolian-fvs-none.svg
#### FVS1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs1-before.svg --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180b,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs1-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180b,180a,25cc
svg_stack --direction=h mongolian-fvs-fvs1-before.svg right-arrow.svg mongolian-fvs-fvs1-after.svg > mongolian-fvs-fvs1.svg
#### FVS2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs2-before.svg --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180c,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs2-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180c,180a,25cc
svg_stack --direction=h mongolian-fvs-fvs2-before.svg right-arrow.svg mongolian-fvs-fvs2-after.svg > mongolian-fvs-fvs2.svg
#### FVS3
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs3-before.svg --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fvs-fvs3-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1873,180d,180a,25cc
svg_stack --direction=h mongolian-fvs-fvs3-before.svg right-arrow.svg mongolian-fvs-fvs3-after.svg > mongolian-fvs-fvs3.svg
## 4.2 `isol`
### `isol` general
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-before.svg --features=-isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-after.svg --features=+isol --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826
svg_stack --direction=h mongolian-isol-before.svg right-arrow.svg mongolian-isol-after.svg > mongolian-isol.svg
### `isol` FVS
> Note: uses larger right margin
hb-view --font-size=110 --margin=2,112,2,16 --output-file=mongolian-isol-fvs1-before.svg --features=-isol --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826,180b
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-isol-fvs1-after.svg --features=+isol --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1826,180b
svg_stack --direction=h mongolian-isol-fvs1-before.svg right-arrow.svg mongolian-isol-fvs1-after.svg > mongolian-isol-fvs1.svg
## 4.3 `fina`
### `fina` general
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-before.svg --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-after.svg --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830
svg_stack --direction=h mongolian-fina-before.svg right-arrow.svg mongolian-fina-after.svg > mongolian-fina.svg
### `fina` FVS
> Note: uses larger right margin
hb-view --font-size=110 --margin=2,112,2,16 --output-file=mongolian-fina-fvs2-before.svg --features=-fina --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830,180c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-fina-fvs2-after.svg --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,1830,180c
svg_stack --direction=h mongolian-fina-fvs2-before.svg right-arrow.svg mongolian-fina-fvs2-after.svg > mongolian-fina-fvs2.svg
## 4.6 `medi`
### `medi` general
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180a,25cc
svg_stack --direction=h mongolian-medi-before.svg right-arrow.svg mongolian-medi-after.svg > mongolian-medi.svg
### `medi` FVS
> Note: uses ZWNJ and spaces to approximate correct spacing for FVS1
> (which is zero-width)
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-fvs1-before.svg --features=-medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180b,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-medi-fvs1-after.svg --features=+medi --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=25cc,180a,186f,180b,180a,25cc
svg_stack --direction=h mongolian-medi-fvs1-before.svg right-arrow.svg mongolian-medi-fvs1-after.svg > mongolian-medi-fvs1.svg
## 4.8 `init`
### `init` general
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-after.svg --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180a,25cc
svg_stack --direction=h mongolian-init-before.svg right-arrow.svg mongolian-init-after.svg > mongolian-init.svg
### `init` FVS
> Note: uses ZWNJ and spaces to approximate correct spacing for FVS1
> (which is zero-width)
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-fvs1-before.svg --features=-init --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180b,200d,0020,0020,0020,202f,180a,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-init-fvs1-after.svg --features=+init --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=1821,180b,180a,25cc
svg_stack --direction=h mongolian-init-fvs1-before.svg right-arrow.svg mongolian-init-fvs1-after.svg > mongolian-init-fvs1.svg
## 4.9 `rlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-rlig-before.svg --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=182a,1820
hb-view --font-size=110 --margin=2,16,2,16 --output-file=mongolian-rlig-after.svg --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMongolian-Regular.ttf --unicodes=182a,1820
svg_stack --direction=h mongolian-rlig-before.svg right-arrow.svg mongolian-rlig-after.svg > mongolian-rlig.svg
================================================
FILE: images/myanmar/myanmar-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-myanmar.md](../../opentype-shaping-myanmar.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Terminology
### Variation Selector
#### No VS
> Note: SIL Padauk 3.x implements the dotted-form feature, but not
> using Variation Selectors, for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dotted-before.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1022
#### VS dotted form
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dotted-after.png --features=+psts --preserve-default-ignorables --language=KHT --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1022,fe00
montage myanmar-dotted-before.png right-arrow.png myanmar-dotted-after.png -geometry +0+0 -background transparent myanmar-dotted.png
## 1. Kinzi
> Note: Noto Sans Myanmar does not implement the `rphf` feature for
> unknown reasons.
### Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-ra-before.png --features=-rphf,-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101b,200c,103a,1039,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-ra-after.png --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101b,103a,1039,25cc
montage myanmar-kinzi-ra-before.png right-arrow.png myanmar-kinzi-ra-after.png -geometry +0+0 -background transparent myanmar-kinzi-ra.png
### Nga
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-nga-before.png --features=-rphf,-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1004,200c,103a,1039,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-nga-after.png --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1004,103a,1039,25cc
montage myanmar-kinzi-nga-before.png right-arrow.png myanmar-kinzi-nga-after.png -geometry +0+0 -background transparent myanmar-kinzi-nga.png
### Mon Nga
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-monnga-before.png --features=-rphf,-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=105a,200c,103a,1039,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-monnga-after.png --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=105a,103a,1039,25cc
montage myanmar-kinzi-monnga-before.png right-arrow.png myanmar-kinzi-monnga-after.png -geometry +0+0 -background transparent myanmar-kinzi-monnga.png
## 2.4 Medial Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-medial-ra-before.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1017,200D,103C
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-medial-ra-after.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1017,103C
montage myanmar-medial-ra-before.png right-arrow.png myanmar-medial-ra-after.png -geometry +0+0 -background transparent myanmar-medial-ra.png
## 3.1 locl
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-locl-before.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100f,103d,103e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-locl-after.png --features=+psts --language=KSW --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100f,103d,103e
montage myanmar-locl-before.png right-arrow.png myanmar-locl-after.png -geometry +0+0 -background transparent myanmar-locl.png
## 3.3 rphf
> Same as Kinzi
## 3.4 pref
> Note: Noto Sans Myanmar does not implement any pref features for
> unknown reasons. This example shows a basic-shaping feature to
> distinguish pref from the more stylistic applications of pres.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pref-before.png --features=-pref --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=103c,103f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pref-after.png --features=+pref --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=103f,103c
montage myanmar-pref-before.png right-arrow.png myanmar-pref-after.png -geometry +0+0 -background transparent myanmar-pref.png
## 3.5 blwf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwf-before.png --features=-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100f,1039,100a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwf-after.png --features=+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100f,1039,100a
montage myanmar-blwf-before.png right-arrow.png myanmar-blwf-after.png -geometry +0+0 -background transparent myanmar-blwf.png
## 3.6 pstf
> Note: Noto Sans Myanmar does not include a pstf feature for unknown
> reasons. This example shows an orthographically-selected variant, as
> referred to on
> https://r12a.github.io/scripts/myanmar/block#charTALL%20AA to
> distinguish pstf as an initial-shaping feature from the more
> stylistic applications of psts.
>
> Note: The example linked to above is used in the Microsoft
> script-development spec for Myanmar:
> https://docs.microsoft.com/en-us/typography/script-development/myanmar#feature-tag-pstf
> but this usage is not well-attested in real-world Myanmar
> fonts. Instead, the "Aa"/"Tall Aa" distinction is made at the
> encoding level and is expected to happen during text
> entry. Consequently, this image has been removed from the
> script-specific shaping document. See issue #85 for the discussion.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pstf-before.png --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101d,102c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pstf-after.png --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101d,102b
montage myanmar-pstf-before.png right-arrow.png myanmar-pstf-after.png -geometry +0+0 -background transparent myanmar-pstf.png
## 4 pres
> Note: Noto Sans Myanmar does not implement this as a pres feature.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pres-before.png --features=-pres,-blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100c,103c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pres-after.png --features=+pres,+blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100c,103c
montage myanmar-pres-before.png right-arrow.png myanmar-pres-after.png -geometry +0+0 -background transparent myanmar-pres.png
## 4 abvs
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvs-before.png --features=-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100b,102d,1032
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvs-after.png --features=+abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100b,102d,1032
montage myanmar-abvs-before.png right-arrow.png myanmar-abvs-after.png -geometry +0+0 -background transparent myanmar-abvs.png
## 4 blws
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blws-before.png --features=-blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=aa6b,103c,103e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blws-after.png --features=+blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=aa6b,103c,103e
montage myanmar-blws-before.png right-arrow.png myanmar-blws-after.png -geometry +0+0 -background transparent myanmar-blws.png
## 4 psts
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-psts-before.png --features=-blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100b,103b
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-psts-after.png --features=+blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=100b,103b
montage myanmar-psts-before.png right-arrow.png myanmar-psts-after.png -geometry +0+0 -background transparent myanmar-psts.png
## 4 liga
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-liga-before.png --features=-liga,-blws,-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1016,103c,103d,103e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-liga-after.png --features=+liga --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1016,103c,103d,103e
montage myanmar-liga-before.png right-arrow.png myanmar-liga-after.png -geometry +0+0 -background transparent myanmar-liga.png
## 5 dist
> Note: Noto Sans Myanmar implements all distance adjustments in
> `kern`.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dist-before.png --features=-kern --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101b,102b,103a,100f,103c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dist-after.png --features=+kern --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=101b,102b,103a,100f,103c
montage myanmar-dist-before.png right-arrow.png myanmar-dist-after.png -geometry +0+0 -background transparent myanmar-dist.png
## 5 abvm
> Note: Noto Sans Myanmar implements this as `mark`.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvm-before.png --features=-mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1004,103a,1039,1008
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvm-after.png --features=+mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1004,103a,1039,1008
montage myanmar-abvm-before.png right-arrow.png myanmar-abvm-after.png -geometry +0+0 -background transparent myanmar-abvm.png
## 5 blwm
> Note: Noto Sans Myanmar implements this as `mark`.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwm-before.png --features=-mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1009,1039,101b
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwm-after.png --features=+mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1009,1039,101b
montage myanmar-blwm-before.png right-arrow.png myanmar-blwm-after.png -geometry +0+0 -background transparent myanmar-blwm.png
## 5 mark
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mark-before.png --features=-mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=107e,108d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mark-after.png --features=+mark --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=107e,108d
montage myanmar-mark-before.png right-arrow.png myanmar-mark-after.png -geometry +0+0 -background transparent myanmar-mark.png
## 5 mkmk
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mkmk-before.png --features=-mkmk --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1000,1039,105d,105e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mkmk-after.png --features=+mkmk --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/Noto_Serif_Myanmar/NotoSerifMyanmar-Regular.ttf --unicodes=1000,1039,105d,105e
montage myanmar-mkmk-before.png right-arrow.png myanmar-mkmk-after.png -geometry +0+0 -background transparent myanmar-mkmk.png
================================================
FILE: images/myanmar/myanmar-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-myanmar.md](../../opentype-shaping-myanmar.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Terminology
### Variation Selector
#### No VS
> Note: SIL Padauk 3.x implements the dotted-form feature, but not
> using Variation Selectors, for unknown reasons.
fi
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dotted-before.svg --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/padauk-3.003/PadaukBook-Regular.ttf --unicodes=1022
#### VS dotted form
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dotted-after.svg --features=+psts --preserve-default-ignorables --language=KHT --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/padauk-3.003/PadaukBook-Regular.ttf --unicodes=1022,fe00
svg_stack --direction=h myanmar-dotted-before.svg right-arrow.svg myanmar-dotted-after.svg > myanmar-dotted.svg
## 1. Kinzi
> Note: Noto Sans Myanmar does not implement the `rphf` feature for
> unknown reasons.
### Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-ra-before.svg --features=-rphf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101b,200c,103a,1039,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-ra-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101b,103a,1039,25cc
svg_stack --direction=h myanmar-kinzi-ra-before.svg right-arrow.svg myanmar-kinzi-ra-after.svg > myanmar-kinzi-ra.svg
### Nga
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-nga-before.svg --features=-rphf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1004,200c,103a,1039,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-nga-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1004,103a,1039,25cc
svg_stack --direction=h myanmar-kinzi-nga-before.svg right-arrow.svg myanmar-kinzi-nga-after.svg > myanmar-kinzi-nga.svg
### Mon Nga
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-monnga-before.svg --features=-rphf,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=105a,200c,103a,1039,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-kinzi-monnga-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=105a,103a,1039,25cc
svg_stack --direction=h myanmar-kinzi-monnga-before.svg right-arrow.svg myanmar-kinzi-monnga-after.svg > myanmar-kinzi-monnga.svg
## 2.4 Medial Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-medial-ra-before.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1017,200D,103C
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-medial-ra-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1017,103C
svg_stack --direction=h myanmar-medial-ra-before.svg right-arrow.svg myanmar-medial-ra-after.svg > myanmar-medial-ra.svg
## 3.1 locl
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-locl-before.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100f,103d,103e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-locl-after.svg --features=+psts --language=KSW --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100f,103d,103e
svg_stack --direction=h myanmar-locl-before.svg right-arrow.svg myanmar-locl-after.svg > myanmar-locl.svg
## 3.3 rphf
> Same as Kinzi
## 3.4 pref
> Note: Noto Sans Myanmar does not implement any pref features for
> unknown reasons. This example shows a basic-shaping feature to
> distinguish pref from the more stylistic applications of pres.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pref-before.svg --features=-pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=103c,103f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pref-after.svg --features=+pref --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=103f,103c
svg_stack --direction=h myanmar-pref-before.svg right-arrow.svg myanmar-pref-after.svg > myanmar-pref.svg
## 3.5 blwf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100f,1039,100a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100f,1039,100a
svg_stack --direction=h myanmar-blwf-before.svg right-arrow.svg myanmar-blwf-after.svg > myanmar-blwf.svg
## 3.6 pstf
> Note: Noto Sans Myanmar does not include a pstf feature for unknown
> reasons. This example shows an orthographically-selected variant, as
> referred to on
> https://r12a.github.io/scripts/myanmar/block#charTALL%20AA to
> distinguish pstf as an initial-shaping feature from the more
> stylistic applications of psts.
>
> Note: The example linked to above is used in the Microsoft
> script-development spec for Myanmar:
> https://docs.microsoft.com/en-us/typography/script-development/myanmar#feature-tag-pstf
> but this usage is not well-attested in real-world Myanmar
> fonts. Instead, the "Aa"/"Tall Aa" distinction is made at the
> encoding level and is expected to happen during text
> entry. Consequently, this image has been removed from the
> script-specific shaping document. See issue #85 for the discussion.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pstf-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101d,102c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pstf-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101d,102b
svg_stack --direction=h myanmar-pstf-before.svg right-arrow.svg myanmar-pstf-after.svg > myanmar-pstf.svg
## 4 pres
> Note: Noto Sans Myanmar does not implement this as a pres feature.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pres-before.svg --features=-pres,-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100c,103c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-pres-after.svg --features=+pres,+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100c,103c
svg_stack --direction=h myanmar-pres-before.svg right-arrow.svg myanmar-pres-after.svg > myanmar-pres.svg
## 4 abvs
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100b,102d,1032
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100b,102d,1032
svg_stack --direction=h myanmar-abvs-before.svg right-arrow.svg myanmar-abvs-after.svg > myanmar-abvs.svg
## 4 blws
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=aa6b,103c,103e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=aa6b,103c,103e
svg_stack --direction=h myanmar-blws-before.svg right-arrow.svg myanmar-blws-after.svg > myanmar-blws.svg
## 4 psts
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-psts-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100b,103b
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-psts-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=100b,103b
svg_stack --direction=h myanmar-psts-before.svg right-arrow.svg myanmar-psts-after.svg > myanmar-psts.svg
## 4 liga
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-liga-before.svg --features=-liga,-blws,-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1016,103c,103d,103e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-liga-after.svg --features=+liga --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1016,103c,103d,103e
svg_stack --direction=h myanmar-liga-before.svg right-arrow.svg myanmar-liga-after.svg > myanmar-liga.svg
## 5 dist
> Note: Noto Sans Myanmar implements all distance adjustments in
> `kern`.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dist-before.svg --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101b,102b,103a,100f,103c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-dist-after.svg --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=101b,102b,103a,100f,103c
svg_stack --direction=h myanmar-dist-before.svg right-arrow.svg myanmar-dist-after.svg > myanmar-dist.svg
## 5 abvm
> Note: Noto Sans Myanmar implements this as `mark`.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvm-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1004,103a,1039,1008
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-abvm-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1004,103a,1039,1008
svg_stack --direction=h myanmar-abvm-before.svg right-arrow.svg myanmar-abvm-after.svg > myanmar-abvm.svg
## 5 blwm
> Note: Noto Sans Myanmar implements this as `mark`.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwm-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1009,1039,101b
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-blwm-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1009,1039,101b
svg_stack --direction=h myanmar-blwm-before.svg right-arrow.svg myanmar-blwm-after.svg > myanmar-blwm.svg
## 5 mark
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=107e,108d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=107e,108d
svg_stack --direction=h myanmar-mark-before.svg right-arrow.svg myanmar-mark-after.svg > myanmar-mark.svg
## 5 mkmk
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mkmk-before.svg --features=-mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1000,1039,105d,105e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=myanmar-mkmk-after.svg --features=+mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansMyanmar-Regular.ttf --unicodes=1000,1039,105d,105e
svg_stack --direction=h myanmar-mkmk-before.svg right-arrow.svg myanmar-mkmk-after.svg > myanmar-mkmk.svg
================================================
FILE: images/nko/nko-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-nko.md](../../opentype-shaping-nko.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 4.3 `fina`
> Note: Noto Sans NKo does not have a dotted-circle glyph. These
> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-fina-before.png --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e5
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-fina-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e5
montage nko-fina-before.png right-arrow.png nko-fina-after.png -geometry +0+0 -background transparent nko-fina.png
## 4.6 `medi`
> Note: Noto Sans NKo does not have a dotted-circle glyph. These
> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-medi-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e8,07fa
b-view --font-size=110 --margin=2,16,2,16 --output-file=nko-medi-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e8,07fa
montage nko-medi-before.png right-arrow.png nko-medi-after.png -geometry +0+0 -background transparent nko-medi.png
## 4.8 `init`
> Note: Noto Sans NKo does not have a dotted-circle glyph. These
> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07da,07fa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07da,07fa
montage nko-init-before.png right-arrow.png nko-init-after.png -geometry +0+0 -background transparent nko-init.png
## 7.3 `mark`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07d5,07f1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07d5,07f1
montage nko-mark-before.png right-arrow.png nko-mark-after.png -geometry +0+0 -background transparent nko-mark.png
================================================
FILE: images/nko/nko-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-nko.md](../../opentype-shaping-nko.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 4.3 `fina`
> Note: Noto Sans NKo does not have a dotted-circle glyph. These
> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-fina-before.svg --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e5
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-fina-after.svg --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e5
svg_stack --direction=h nko-fina-before.svg right-arrow.svg nko-fina-after.svg > nko-fina.svg
## 4.6 `medi`
> Note: Noto Sans NKo does not have a dotted-circle glyph. These
> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-medi-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e8,07fa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-medi-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07e8,07fa
svg_stack --direction=h nko-medi-before.svg right-arrow.svg nko-medi-after.svg > nko-medi.svg
## 4.8 `init`
> Note: Noto Sans NKo does not have a dotted-circle glyph. These
> images use `U+07fa`, the lajanyalan (N'Ko kashida) in its place.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07da,07fa
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-init-after.svg --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07da,07fa
svg_stack --direction=h nko-init-before.svg right-arrow.svg nko-init-after.svg > nko-init.svg
## 7.3 `mark`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07d5,07f1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=nko-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansNKo-Regular.ttf --unicodes=07fa,07d5,07f1
svg_stack --direction=h nko-mark-before.svg right-arrow.svg nko-mark-after.svg > nko-mark.svg
================================================
FILE: images/oriya/oriya-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-oriya.md](../../opentype-shaping-oriya.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-decompose-before.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b48
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-decompose-after.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b47,25cc,0b56
montage oriya-matra-decompose-before.png right-arrow.png oriya-matra-decompose-after.png -geometry +0+0 -background transparent oriya-matra-decompose.png
## 2.7 Post-base consonants
### Ya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-ya-before.png --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b2f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-ya-after.png --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b2f
montage oriya-pstf-ya-before.png right-arrow.png oriya-pstf-ya-after.png -geometry +0+0 -background transparent oriya-pstf-ya.png
### Yya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-yya-before.png --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b5f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-yya-after.png --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b5f
montage oriya-pstf-yya-before.png right-arrow.png oriya-pstf-yya-after.png -geometry +0+0 -background transparent oriya-pstf-yya.png
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-nukt-before.png --features=-nukt --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b16,25cc,0b3c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-nukt-after.png --features=+nukt --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b16,0b3c
montage oriya-nukt-before.png right-arrow.png oriya-nukt-after.png -geometry +0+0 -background transparent oriya-nukt.png
## 3.3 `akhn`
### KSsa
> Note: Noto Sans Oriya implements this in a `pres`+`blwf` combination
> for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-kssa-before.png --features=-pres,-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b15,0b4d,0b37
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-kssa-after.png --features=+pres,+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b15,0b4d,0b37
montage oriya-akhn-kssa-before.png right-arrow.png oriya-akhn-kssa-after.png -geometry +0+0 -background transparent oriya-akhn-kssa.png
### JNya
> Note: Noto Sans Oriya implements this in a `blwf`+`cjct` combination
> for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-jnya-before.png --features=-pres,-cjct,-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1c,0b4d,0b1e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-jnya-after.png --features=+pres,+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1c,0b4d,0b1e
montage oriya-akhn-jnya-before.png right-arrow.png oriya-akhn-jnya-after.png -geometry +0+0 -background transparent oriya-akhn-jnya.png
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-rphf-before.png --features=-rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-rphf-after.png --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc
montage oriya-rphf-before.png right-arrow.png oriya-rphf-after.png -geometry +0+0 -background transparent oriya-rphf.png
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-before.png --features=-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b25
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-after.png --features=+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b25
montage oriya-blwf-before.png right-arrow.png oriya-blwf-after.png -geometry +0+0 -background transparent oriya-blwf.png
## 3.9 `half`
> No examples found.
## 3.10 `pstf`
> Same as 2.7
## 3.12 `cjct`
> Not a perfect example....
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-cjct-before.png --features=-pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b38,0b4d,25cc,0b4d,0b2a,0b4d,0b5d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-cjct-after.png --features=+pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b38,0b4d,0b2a,0b4d,0b5d
montage oriya-cjct-before.png right-arrow.png oriya-cjct-after.png -geometry +0+0 -background transparent oriya-cjct.png
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-position-before.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b47,0b28,0b4d,200d,0b2d,0b4d,0b27,0b57
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-position-after.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,200d,0b2d,0b4d,0b27,0b4c
montage oriya-matra-position-before.png right-arrow.png oriya-matra-position-after.png -geometry +0+0 -background transparent oriya-matra-position.png
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-reph-position-before.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc,0b2a,0b4d,0b2a,0b4d,0b26,0b4d,0b2f,0b3e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-reph-position-after.png --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,0b2a,0b4d,0b2a,0b4d,0b26,0b4d,0b2f,0b3e
montage oriya-reph-position-before.png right-arrow.png oriya-reph-position-after.png -geometry +0+0 -background transparent oriya-reph-position.png
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pres-before.png --features=-pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pres-after.png --features=+pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2d
montage oriya-pres-before.png right-arrow.png oriya-pres-after.png -geometry +0+0 -background transparent oriya-pres.png
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvs-before.png --features=-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b13,200d,0b01
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvs-after.png --features=+abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b13,200d,0b01
montage oriya-abvs-before.png right-arrow.png oriya-abvs-after.png -geometry +0+0 -background transparent oriya-abvs.png
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blws-before.png --features=-blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,0b24,0b42
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blws-after.png --features=+blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,0b24,0b42
montage oriya-blws-before.png right-arrow.png oriya-blws-after.png -geometry +0+0 -background transparent oriya-blws.png
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-psts-before.png --features=-psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b23,0b4c,0b01
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-psts-after.png --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b23,0b4c,0b01
montage oriya-psts-before.png right-arrow.png oriya-psts-after.png -geometry +0+0 -background transparent oriya-psts.png
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-haln-before.png --features=-haln,-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1d,0b4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-haln-after.png --features=+haln,+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1d,0b4d
montage oriya-haln-before.png right-arrow.png oriya-haln-after.png -geometry +0+0 -background transparent oriya-haln.png
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvm-before.png --features=-abvm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b19,0b4d,0b18,0b48
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvm-after.png --features=+abvm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b19,0b4d,0b18,0b48
montage oriya-abvm-before.png right-arrow.png oriya-abvm-after.png -geometry +0+0 -background transparent oriya-abvm.png
## 6 `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwm-before.png --features=-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2b,0b44
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwm-after.png --features=+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2b,0b44
montage oriya-blwm-before.png right-arrow.png oriya-blwm-after.png -geometry +0+0 -background transparent oriya-blwm.png
================================================
FILE: images/oriya/oriya-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-oriya.md](../../opentype-shaping-oriya.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-decompose-before.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b4c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-decompose-after.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b47,25cc,0b57
svg_stack.py --direction=h oriya-matra-decompose-before.svg right-arrow.svg oriya-matra-decompose-after.svg > oriya-matra-decompose.svg
## 2.7 Below-base consonants
### Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-ra-before.svg --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b30
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-ra-after.svg --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b30
svg_stack.py --direction=h oriya-blwf-ra-before.svg right-arrow.svg oriya-blwf-ra-after.svg > oriya-blwf-ra.svg
#### Duplicates for other subsections
cp oriya-blwf-ra.svg oriya-blwf-ra-1.svg
cluster_styles = [
cp oriya-blwf-ra.svg oriya-blwf-ra-2.svg
cluster_styles = [
## 2.7 Post-base consonants
### Ya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-ya-before.svg --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b2f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-ya-after.svg --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b2f
svg_stack.py --direction=h oriya-pstf-ya-before.svg right-arrow.svg oriya-pstf-ya-after.svg > oriya-pstf-ya.svg
### Yya
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-yya-before.svg --features=-pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b5f
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pstf-yya-after.svg --features=+pstf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b5f
svg_stack.py --direction=h oriya-pstf-yya-before.svg right-arrow.svg oriya-pstf-yya-after.svg > oriya-pstf-yya.svg
#### Duplicates for other subsections
cp oriya-pstf-ya.svg oriya-pstf-ya-1.svg
cluster_styles = [
cp oriya-pstf-yya.svg oriya-pstf-yya-1.svg
cluster_styles = [
## 3.2 `nukt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-nukt-before.svg --features=-nukt --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b16,25cc,0b3c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-nukt-after.svg --features=+nukt --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b16,0b3c
svg_stack.py --direction=h oriya-nukt-before.svg right-arrow.svg oriya-nukt-after.svg > oriya-nukt.svg
## 3.3 `akhn`
### KSsa
> Note: Noto Sans Oriya implements this in a `pres`+`blwf` combination
> for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-kssa-before.svg --features=-pres,-blwf,-akhn,-haln --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b15,0b4d,0b37
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-kssa-after.svg --features=+pres,+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b15,0b4d,0b37
svg_stack.py --direction=h oriya-akhn-kssa-before.svg right-arrow.svg oriya-akhn-kssa-after.svg > oriya-akhn-kssa.svg
### JNya
> Note: Noto Sans Oriya implements this in a `blwf`+`cjct` combination
> for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-jnya-before.svg --features=-pres,-cjct,-blwf,-haln --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1c,0b4d,0b1e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-akhn-jnya-after.svg --features=+pres,+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1c,0b4d,0b1e
svg_stack.py --direction=h oriya-akhn-jnya-before.svg right-arrow.svg oriya-akhn-jnya-after.svg > oriya-akhn-jnya.svg
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-rphf-before.svg --features=-rphf,-haln --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-rphf-after.svg --features=+rphf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc
svg_stack.py --direction=h oriya-rphf-before.svg right-arrow.svg oriya-rphf-after.svg > oriya-rphf.svg
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-before.svg --features=-blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b25
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwf-after.svg --features=+blwf --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=25cc,0b4d,0b25
svg_stack.py --direction=h oriya-blwf-before.svg right-arrow.svg oriya-blwf-after.svg > oriya-blwf.svg
## 3.9 `half`
> No examples found.
## 3.10 `pstf`
> Same as 2.7
## 3.12 `cjct`
> Not a perfect example....
> Noto Serif Oriya implements this in a combination of multiple
> features, including akhn and blwf. It also applies haln, which must
> be deactivated in this illustration because it is documented as
> being applied later.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-cjct-before.svg --features=-blwf,-akhn,-cjct --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b38,0bd4,0b2a,0b40
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-cjct-after.svg --features=+cjct --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b38,0bd4,0b2a,0b40
svg_stack.py --direction=h oriya-cjct-before.svg right-arrow.svg oriya-cjct-after.svg > oriya-cjct.svg
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-position-before.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b47,0b36,0b4d,0b24,0b4d,0b30,0b4d,0b2f,0b56
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-matra-position-after.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b36,0b4d,0b24,0b4d,0b30,0b4d,0b2f,0b48
svg_stack.py --direction=h oriya-matra-position-before.svg right-arrow.svg oriya-matra-position-after.svg > oriya-matra-position.svg
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-reph-position-before.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,25cc,0b2a,0b4d,0b27,0b4d,0b30,0b4d,0b2f,0b3e,0b41
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-reph-position-after.svg --features= --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b30,0b4d,0b2a,0b4d,0b27,0b4d,0b30,0b4d,0b2f,0b3e,0b41
svg_stack.py --direction=h oriya-reph-position-before.svg right-arrow.svg oriya-reph-position-after.svg > oriya-reph-position.svg
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pres-before.svg --features=-pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-pres-after.svg --features=+pres --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2d
svg_stack.py --direction=h oriya-pres-before.svg right-arrow.svg oriya-pres-after.svg > oriya-pres.svg
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvs-before.svg --features=-abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b13,200d,0b01
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvs-after.svg --features=+abvs --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b13,200d,0b01
svg_stack.py --direction=h oriya-abvs-before.svg right-arrow.svg oriya-abvs-after.svg > oriya-abvs.svg
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blws-before.svg --features=-blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,0b24,0b42
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blws-after.svg --features=+blws --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b28,0b4d,0b24,0b42
svg_stack.py --direction=h oriya-blws-before.svg right-arrow.svg oriya-blws-after.svg > oriya-blws.svg
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-psts-before.svg --features=-psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b23,0b4c,0b01
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-psts-after.svg --features=+psts --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b23,0b4c,0b01
svg_stack.py --direction=h oriya-psts-before.svg right-arrow.svg oriya-psts-after.svg > oriya-psts.svg
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-haln-before.svg --features=-haln,-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1d,0b4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-haln-after.svg --features=+haln,+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b1d,0b4d
svg_stack.py --direction=h oriya-haln-before.svg right-arrow.svg oriya-haln-after.svg > oriya-haln.svg
## 6 `dist`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-dist-before.svg --features=-dist --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b42,0b15
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-dist-after.svg --features=+dist --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b42,0b15
svg_stack.py --direction=h oriya-dist-before.svg right-arrow.svg oriya-dist-after.svg > oriya-dist.svg
## 6 `abvm`
> Note: Noto Serif Oriya implements this as `blwm` for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvm-before.svg --features=-abvm,-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b19,0b48
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-abvm-after.svg --features=+abvm,+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b19,0b48
svg_stack.py --direction=h oriya-abvm-before.svg right-arrow.svg oriya-abvm-after.svg > oriya-abvm.svg
## 6 `blwm`
> Note: Noto Serif Oriya implements this as `abvm` for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwm-before.svg --features=-abvm,-blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2b,0b44
hb-view --font-size=110 --margin=2,16,2,16 --output-file=oriya-blwm-after.svg --features=+blwm --background=FFFFFF00 /home/nate/SyncThing/fonts-external/temporary-and-testing/NotoSerifOriya-Regular.ttf --unicodes=0b2e,0b4d,0b2b,0b44
svg_stack.py --direction=h oriya-blwm-before.svg right-arrow.svg oriya-blwm-after.svg > oriya-blwm.svg
================================================
FILE: images/sinhala/sinhala-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-sinhala.md](../../opentype-shaping-sinhala.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-decompose-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dda
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-decompose-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dd9,25cc,0dca
montage sinhala-matra-decompose-before.png right-arrow.png sinhala-matra-decompose-after.png -geometry +0+0 -background transparent sinhala-matra-decompose.png
## 2.7 Post-base consonants
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-va-before.png --features=-vatu --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dba
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-va-after.png --features=+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dba
montage sinhala-vatu-va-before.png right-arrow.png sinhala-vatu-va-after.png -geometry +0+0 -background transparent sinhala-vatu-va.png
## 3.3 `akhn`
### Ligature
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-ligature-before.png --features=-akhn --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9a,25cc,0dca,200d,0dc2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-ligature-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9a,0dca,200d,0dc2
montage sinhala-akhn-ligature-before.png right-arrow.png sinhala-akhn-ligature-after.png -geometry +0+0 -background transparent sinhala-akhn-ligature.png
### Touching
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-touching-before.png --features=-akhn,-pres --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9c,200d,25cc,0dca,0d9d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-touching-after.png --features=+akhn,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9c,200d,0dca,0d9d
montage sinhala-akhn-touching-before.png right-arrow.png sinhala-akhn-touching-after.png -geometry +0+0 -background transparent sinhala-akhn-touching.png
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-rphf-before.png --features=-rphf --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc
hb-view --font-size=110 --margin=2,16,2,64 --output-file=sinhala-rphf-after.png --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc
montage sinhala-rphf-before.png right-arrow.png sinhala-rphf-after.png -geometry +0+0 -background transparent sinhala-rphf.png
## 3.10 `pstf`
> Not needed?
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pstf-before.png --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dde
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pstf-after.png --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0ddf
montage sinhala-pstf-before.png right-arrow.png sinhala-pstf-after.png -geometry +0+0 -background transparent sinhala-pstf.png
## 3.11 `vatu`
### Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-ra-before.png --features=-vatu --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dbb
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-ra-after.png --features=+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dbb
montage sinhala-vatu-ra-before.png right-arrow.png sinhala-vatu-ra-after.png -geometry +0+0 -background transparent sinhala-vatu-ra.png
### Va
> Same as 2.7
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-position-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dd9,0da0,0dca,0db1,0dca,200d,0daf,0dca,200d,0dbb,0dcf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-position-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0da0,0dca,0db1,0dca,200d,0daf,0dca,200d,0dbb,0ddc
montage sinhala-matra-position-before.png right-arrow.png sinhala-matra-position-after.png -geometry +0+0 -background transparent sinhala-matra-position.png
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,64 --output-file=sinhala-reph-position-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc,0dad,0dca,200d,0dae,0dca,200d,0dba,0dd1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-reph-position-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,0dad,0dca,200d,0dae,0dca,200d,0dba,0dd1
montage sinhala-reph-position-before.png right-arrow.png sinhala-reph-position-after.png -geometry +0+0 -background transparent sinhala-reph-position.png
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pres-before.png --features=-pres --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9b,200d,25cc,0dca,0da2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pres-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9b,200d,0dca,0da2
montage sinhala-pres-before.png right-arrow.png sinhala-pres-after.png -geometry +0+0 -background transparent sinhala-pres.png
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db6,0dd3
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db6,0dd3
montage sinhala-abvs-before.png right-arrow.png sinhala-abvs-after.png -geometry +0+0 -background transparent sinhala-abvs.png
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blws-before.png --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db7,0dd6
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blws-after.png --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db7,0dd6
montage sinhala-blws-before.png right-arrow.png sinhala-blws-after.png -geometry +0+0 -background transparent sinhala-blws.png
## 5 `psts`
> Note: this lookup only works in Noto Sans. Needs more investigation.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-psts-before.png --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSinhala-Regular.ttf --unicodes=0daf,0dca,200d,0dba,0ddd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-psts-after.png --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSinhala-Regular.ttf --unicodes=0daf,0dca,200d,0dba,0ddd
montage sinhala-psts-before.png right-arrow.png sinhala-psts-after.png -geometry +0+0 -background transparent sinhala-psts.png
## 6 `abvm`
> Note: Noto Sans Sinhala implements this as an `abvs` substitution
> for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvm-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,0dae
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvm-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,0dae
montage sinhala-abvm-before.png right-arrow.png sinhala-abvm-after.png -geometry +0+0 -background transparent sinhala-abvm.png
## 6 `blwm`
> Note: Noto Sans Sinhala double-implements this in both `blwm` and
> `abvs`, even though it is clearly not above-base.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blwm-before.png --features=-blwm,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9f,0dca,200d,0dbb
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blwm-after.png --features=+blwm,+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9f,0dca,200d,0dbb
montage sinhala-blwm-before.png right-arrow.png sinhala-blwm-after.png -geometry +0+0 -background transparent sinhala-blwm.png
================================================
FILE: images/sinhala/sinhala-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-sinhala.md](../../opentype-shaping-sinhala.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-decompose-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dda
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-decompose-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dd9,25cc,0dca
svg_stack.py --direction=h sinhala-matra-decompose-before.svg right-arrow.svg sinhala-matra-decompose-after.svg > sinhala-matra-decompose.svg
## 2.7 Post-base consonants
### Ra
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-ra-before.svg --features=-vatu --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dbb
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-ra-after.svg --features=+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dbb
svg_stack.py --direction=h sinhala-vatu-ra-before.svg right-arrow.svg sinhala-vatu-ra-after.svg > sinhala-vatu-ra.svg
### Va
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-va-before.svg --features=-vatu --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dba
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-vatu-va-after.svg --features=+vatu --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dca,200d,0dba
svg_stack.py --direction=h sinhala-vatu-va-before.svg right-arrow.svg
sinhala-vatu-va-after.svg > sinhala-vatu-va.svg
#### Duplicates for other subsections
cp sinhala-vatu-ra.svg sinhala-vatu-ra-1.svg
cluster_styles = [
cp sinhala-vatu-va.svg sinhala-vatu-va-1.svg
cluster_styles = [
## 3.3 `akhn`
### Ligature
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-ligature-before.svg --features=-akhn --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9a,25cc,0dca,200d,0dc2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-ligature-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9a,0dca,200d,0dc2
svg_stack.py --direction=h sinhala-akhn-ligature-before.svg right-arrow.svg sinhala-akhn-ligature-after.svg > sinhala-akhn-ligature.svg
### Touching
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-touching-before.svg --features=-akhn,-pres --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9c,200d,25cc,0dca,0d9d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-akhn-touching-after.svg --features=+akhn,+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9c,200d,0dca,0d9d
svg_stack.py --direction=h sinhala-akhn-touching-before.svg right-arrow.svg sinhala-akhn-touching-after.svg > sinhala-akhn-touching.svg
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-rphf-before.svg --features=-rphf --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc
hb-view --font-size=110 --margin=2,16,2,64 --output-file=sinhala-rphf-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc
svg_stack.py --direction=h sinhala-rphf-before.svg right-arrow.svg sinhala-rphf-after.svg > sinhala-rphf.svg
#### Duplicates for other subsections
cp sinhala-rphf.svg sinhala-rphf-1.svg
cluster_styles = [
## 3.10 `pstf`
> Not needed?
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pstf-before.svg --features=-pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dde
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pstf-after.svg --features=+pstf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0ddf
svg_stack.py --direction=h sinhala-pstf-before.svg right-arrow.svg sinhala-pstf-after.svg > sinhala-pstf.svg
## 3.11 `vatu`
> Same as 2.7
### Va
> Same as 2.7
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-position-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=25cc,0dd9,0da0,0dca,0db1,0dca,200d,0daf,0dca,200d,0dbb,0dcf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-matra-position-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0da0,0dca,0db1,0dca,200d,0daf,0dca,200d,0dbb,0ddc
svg_stack.py --direction=h sinhala-matra-position-before.svg right-arrow.svg sinhala-matra-position-after.svg > sinhala-matra-position.svg
## 4.3 Reph position
hb-view --font-size=110 --margin=2,16,2,64 --output-file=sinhala-reph-position-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,25cc,0dad,0dca,200d,0dae,0dca,200d,0dba,0dd1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-reph-position-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dca,200d,0dad,0dca,200d,0dae,0dca,200d,0dba,0dd1
svg_stack.py --direction=h sinhala-reph-position-before.svg right-arrow.svg sinhala-reph-position-after.svg > sinhala-reph-position.svg
## 5 `pres`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pres-before.svg --features=-pres --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9b,200d,25cc,0dca,0da2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-pres-after.svg --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0d9b,200d,0dca,0da2
svg_stack.py --direction=h sinhala-pres-before.svg right-arrow.svg sinhala-pres-after.svg > sinhala-pres.svg
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db6,0dd3
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db6,0dd3
svg_stack.py --direction=h sinhala-abvs-before.svg right-arrow.svg sinhala-abvs-after.svg > sinhala-abvs.svg
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db7,0dd6
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0db7,0dd6
svg_stack.py --direction=h sinhala-blws-before.svg right-arrow.svg sinhala-blws-after.svg > sinhala-blws.svg
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-psts-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dd1
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-psts-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dbb,0dd1
svg_stack.py --direction=h sinhala-psts-before.svg right-arrow.svg sinhala-psts-after.svg > sinhala-psts.svg
## 6 `dist`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-dist-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dc5,0ddf
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-dist-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dc5,0ddf
svg_stack.py --direction=h sinhala-dist-before.svg right-arrow.svg sinhala-dist-after.svg > sinhala-dist.svg
## 6 `abvm`
> Note: Noto Serif Sinhala implements this as an `abvs`
> substitution. This makes it a less-than ideal illustration, because
> the "after" SVG is a ligated glyph; it must suffice until a suitable
> alternative is found.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvm-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dc6,0dd3
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-abvm-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0dc6,0dd3
svg_stack.py --direction=h sinhala-abvm-before.svg right-arrow.svg sinhala-abvm-after.svg > sinhala-abvm.svg
## 6 `blwm`
> Note: Noto Serif Sinhala double-implements this as a `blws`
> substitution. This makes it a less-than ideal illustration, because
> the "after" SVG is a ligated glyph; it must suffice until a suitable
> alternative is found.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blwm-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0da7,0dca,200d,0da8,0dd4
hb-view --font-size=110 --margin=2,16,2,16 --output-file=sinhala-blwm-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifSinhala-Regular.ttf --unicodes=0da7,0dca,200d,0da8,0dd4
svg_stack.py --direction=h sinhala-blwm-before.svg right-arrow.svg sinhala-blwm-after.svg > sinhala-blwm.svg
================================================
FILE: images/syriac/syriac-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-syriac.md](../../opentype-shaping-syriac.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Dalath Rish group ##
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-dalath-rish.png --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0715,072a,0716
## 3. `stch`
> Note: Noto seems to implement this in a set of `calt` substitutions,
> for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-stch-before.png --features=-stch,-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0732,070f,0728,0721,0735,0710
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-stch-after.png --features=+stch --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0732,070f,0728,0721,0735,0710
montage syriac-stch-before.png right-arrow.png syriac-stch-after.png -geometry +0+0 -background transparent syriac-stch.png
## 4.1 `locl`
> Note: None found in Noto fonts.
## 4.2 `isol`
> Note: none found in Noto fonts.
## 4.3 `fina`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fina-before.png --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0722
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fina-after.png --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0722
## 4.4 `fin2`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin2-before.png --features=-fin2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0717,0710
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin2-after.png --features=+fin2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0717,0710
montage syriac-fin2-before.png right-arrow.png syriac-fin2-after.png -geometry +0+0 -background transparent syriac-fin2.png
## 4.5 `fin3`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin3-before.png --features=-fin3 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072f,0710
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin3-after.png --features=+fin3 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072f,0710
montage syriac-fin3-before.png right-arrow.png syriac-fin3-after.png -geometry +0+0 -background transparent syriac-fin3.png
## 4.6 `medi`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-medi-before.png --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0724,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-medi-after.png --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0724,25cc
montage syriac-medi-before.png right-arrow.png syriac-medi-after.png -geometry +0+0 -background transparent syriac-medi.png
## 4.7 `med2`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-med2-before.png --features=-med2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0710,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-med2-after.png --features=+med2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0710,25cc
montage syriac-med2-before.png right-arrow.png syriac-med2-after.png -geometry +0+0 -background transparent syriac-med2.png
## 4.8 `init`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-init-before.png --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0721,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-init-after.png --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0721,25cc
montage syriac-init-before.png right-arrow.png syriac-init-after.png -geometry +0+0 -background transparent syriac-init.png
## 4.9 `rlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-rlig-before.png --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072a,25cc,0308
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-rlig-after.png --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072a,0308
montage syriac-rlig-before.png right-arrow.png syriac-rlig-after.png -geometry +0+0 -background transparent syriac-rlig.png
## 4.11 `calt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-calt-before.png --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0720,071c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-calt-after.png --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0720,071c
montage syriac-calt-before.png right-arrow.png syriac-calt-after.png -geometry +0+0 -background transparent syriac-calt.png
## 5.1 `liga`
> Note: Noto Syriac implements this as a `calt` lookup for unknown reasons.
>
> This seems to be a known shortcoming. See
> [https://github.com/googlei18n/noto-fonts/issues/665](https://github.com/googlei18n/noto-fonts/issues/665)
> for more information.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-liga-before.png --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0720,0710
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-liga-after.png --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0720,0710
montage syriac-liga-before.png right-arrow.png syriac-liga-after.png -geometry +0+0 -background transparent syriac-liga.png
## 5.2 `dlig`
> Note: none found in Noto Syriac.
>
> This seems to be a known shortcoming. See
> [https://github.com/googlei18n/noto-fonts/issues/665](https://github.com/googlei18n/noto-fonts/issues/665)
> for more information.
## 7.3 `mark`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0733
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0733
montage syriac-mark-before.png right-arrow.png syriac-mark-after.png -geometry +0+0 -background transparent syriac-mark.png
## 7.4 `mkmk`
> Note: Noto Sans Syriac (all) fonts have a `mkmk` table but it does
> not seem to work.
================================================
FILE: images/syriac/syriac-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-syriac.md](../../opentype-shaping-syriac.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Dalath Rish group ##
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-dalath-rish.svg --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0715,072a,0716
## 3. `stch`
> Note: Noto seems to implement this in a set of `calt` substitutions,
> for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-stch-before.svg --features=-stch,-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0732,070f,0728,0721,0735,0710
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-stch-after.svg --features=+stch --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0732,070f,0728,0721,0735,0710
svg_stack --direction=h syriac-stch-before.svg right-arrow.svg syriac-stch-after.svg > syriac-stch.svg
## 4.1 `locl`
> Note: None found in Noto fonts.
## 4.2 `isol`
> Note: none found in Noto fonts.
## 4.3 `fina`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fina-before.svg --features=-fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0722
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fina-after.svg --features=+fina --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0722
svg_stack --direction=h syriac-fina-before.svg right-arrow.svg syriac-fina-after.svg > syriac-fina.svg
## 4.4 `fin2`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin2-before.svg --features=-fin2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0717,0710
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin2-after.svg --features=+fin2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0717,0710
svg_stack --direction=h syriac-fin2-before.svg right-arrow.svg syriac-fin2-after.svg > syriac-fin2.svg
## 4.5 `fin3`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin3-before.svg --features=-fin3 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072f,0710
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-fin3-after.svg --features=+fin3 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072f,0710
svg_stack --direction=h syriac-fin3-before.svg right-arrow.svg syriac-fin3-after.svg > syriac-fin3.svg
## 4.6 `medi`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-medi-before.svg --features=-medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0724,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-medi-after.svg --features=+medi --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0724,25cc
svg_stack --direction=h syriac-medi-before.svg right-arrow.svg syriac-medi-after.svg > syriac-medi.svg
## 4.7 `med2`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-med2-before.svg --features=-med2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0710,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-med2-after.svg --features=+med2 --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=25cc,0710,25cc
svg_stack --direction=h syriac-med2-before.svg right-arrow.svg syriac-med2-after.svg > syriac-med2.svg
## 4.8 `init`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-init-before.svg --features=-init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0721,25cc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-init-after.svg --features=+init --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0721,25cc
svg_stack --direction=h syriac-init-before.svg right-arrow.svg syriac-init-after.svg > syriac-init.svg
## 4.9 `rlig`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-rlig-before.svg --features=-rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072a,25cc,0308
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-rlig-after.svg --features=+rlig --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=072a,0308
svg_stack --direction=h syriac-rlig-before.svg right-arrow.svg syriac-rlig-after.svg > syriac-rlig.svg
## 4.11 `calt`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-calt-before.svg --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0720,071c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-calt-after.svg --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacWestern-Regular.ttf --unicodes=0720,071c
svg_stack --direction=h syriac-calt-before.svg right-arrow.svg syriac-calt-after.svg > syriac-calt.svg
## 5.1 `liga`
> Note: Noto Syriac implements this as a `calt` lookup for unknown reasons.
>
> This seems to be a known shortcoming. See
> [https://github.com/googlei18n/noto-fonts/issues/665](https://github.com/googlei18n/noto-fonts/issues/665)
> for more information.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-liga-before.svg --features=-calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0720,0710
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-liga-after.svg --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEstrangela-Regular.ttf --unicodes=0720,0710
svg_stack --direction=h syriac-liga-before.svg right-arrow.svg syriac-liga-after.svg > syriac-liga.svg
## 5.2 `dlig`
> Note: none found in Noto Syriac.
>
> This seems to be a known shortcoming. See
> [https://github.com/googlei18n/noto-fonts/issues/665](https://github.com/googlei18n/noto-fonts/issues/665)
> for more information.
## 7.3 `mark`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0733
hb-view --font-size=110 --margin=2,16,2,16 --output-file=syriac-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansSyriacEastern-Regular.ttf --unicodes=0712,0733
svg_stack --direction=h syriac-mark-before.svg right-arrow.svg syriac-mark-after.svg > syriac-mark.svg
## 7.4 `mkmk`
> Note: Noto Sans Syriac (all) fonts have a `mkmk` table but it does
> not seem to work.
================================================
FILE: images/tamil/tamil-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-tamil.md](../../opentype-shaping-tamil.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-decompose-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bcc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-decompose-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bc6,25cc,0bd7
montage tamil-matra-decompose-before.png right-arrow.png tamil-matra-decompose-after.png -geometry +0+0 -background transparent tamil-matra-decompose.png
## 3.2 `nukt`
> None found.
## 3.3 `akhn`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-akhn-kssa-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0b95,0bcd,0bb7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-akhn-kssa-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0b95,0bcd,0bb7
montage tamil-akhn-kssa-before.png right-arrow.png tamil-akhn-kssa-after.png -geometry +0+0 -background transparent tamil-akhn-kssa.png
## 3.9 `half`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-half-before.png --features=-half,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b99,25cc,0bcd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-half-after.png --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b99,0bcd
montage tamil-half-before.png right-arrow.png tamil-half-after.png -geometry +0+0 -background transparent tamil-half.png
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-position-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bc6,0bb0,0bcd,0b9a,0bcd,0b9c,0bbe
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-position-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb0,0bcd,0b9a,0bcd,0b9c,0bca
montage tamil-matra-position-before.png right-arrow.png tamil-matra-position-after.png -geometry +0+0 -background transparent tamil-matra-position.png
## 5 `pres`
> Note: Noto Serif Tamil implements this as an `akhn` feature for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-pres-before.png --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb6,0bcd,0bb0,0bc0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-pres-after.png --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb6,0bcd,0bb0,0bc0
montage tamil-pres-before.png right-arrow.png tamil-pres-after.png -geometry +0+0 -background transparent tamil-pres.png
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0baf,0bc0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0baf,0bc0
montage tamil-abvs-before.png right-arrow.png tamil-abvs-after.png -geometry +0+0 -background transparent tamil-abvs.png
## 5 `blws`
> None found.
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-psts-before.png --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb4,0bc2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-psts-after.png --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb4,0bc2
montage tamil-psts-before.png right-arrow.png tamil-psts-after.png -geometry +0+0 -background transparent tamil-psts.png
## `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-haln-before.png --features=-haln,-half,-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b9e,0bcd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-haln-after.png --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b9e,0bcd
montage tamil-haln-before.png right-arrow.png tamil-haln-after.png -geometry +0+0 -background transparent tamil-haln.png
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvm-before.png --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb9,0bcd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvm-after.png --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb9,0bcd
montage tamil-abvm-before.png right-arrow.png tamil-abvm-after.png -geometry +0+0 -background transparent tamil-abvm.png
## 6 `blwm`
> None found.
================================================
FILE: images/tamil/tamil-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-tamil.md](../../opentype-shaping-tamil.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-decompose-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bcc
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-decompose-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bc6,25cc,0bd7
svg_stack.py --direction=h tamil-matra-decompose-before.svg right-arrow.svg tamil-matra-decompose-after.svg > tamil-matra-decompose.svg
## 3.2 `nukt`
> None found. Testing with Grantha Nukta.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-nukt-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf5,25cc,1133c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-nukt-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf5,1133c
svg_stack.py --direction=h tamil-nukt-before.svg right-arrow.svg tamil-nukt-after.svg > tamil-nukt.svg
## 3.3 `akhn`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-akhn-kssa-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0b95,0bcd,0bb7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-akhn-kssa-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0b95,0bcd,0bb7
svg_stack.py --direction=h tamil-akhn-kssa-before.svg right-arrow.svg tamil-akhn-kssa-after.svg > tamil-akhn-kssa.svg
## 3.9 `half`
> Simulated output using a `mark` lookup; no example found.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-half-before.svg --features=-half,-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b99,25cc,0bcd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-half-after.svg --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b99,0bcd
svg_stack.py --direction=h tamil-half-before.svg right-arrow.svg tamil-half-after.svg > tamil-half.svg
## 4.2 Pre-base matras
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-position-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bc6,0bb0,0bcd,0b9a,0bcd,0b9c,0bbe
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-matra-position-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb0,0bcd,0b9a,0bcd,0b9c,0bca
svg_stack.py --direction=h tamil-matra-position-before.svg right-arrow.svg tamil-matra-position-after.svg > tamil-matra-position.svg
## 5 `pres`
> Note: Noto Serif Tamil implements this as an `akhn` feature for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-pres-before.svg --features=-akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb6,0bcd,0bb0,0bc0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-pres-after.svg --features=+akhn --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb6,0bcd,0bb0,0bc0
svg_stack.py --direction=h tamil-pres-before.svg right-arrow.svg tamil-pres-after.svg > tamil-pres.svg
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0baf,0bc0
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0baf,0bc0
svg_stack.py --direction=h tamil-abvs-before.svg right-arrow.svg tamil-abvs-after.svg > tamil-abvs.svg
## 5 `blws`
> None found.
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-psts-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb4,0bc2
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-psts-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb4,0bc2
svg_stack.py --direction=h tamil-psts-before.svg right-arrow.svg tamil-psts-after.svg > tamil-psts.svg
## `haln`
> Simulated output using a `mark` lookup; no example found.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-haln-before.svg --features=-haln,-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b9e,25cc,0bcd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-haln-after.svg --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTamil-Regular.ttf --unicodes=0b9e,0bcd
svg_stack.py --direction=h tamil-haln-before.svg right-arrow.svg tamil-haln-after.svg > tamil-haln.svg
## 6 `dist`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-dist-before.svg --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf4,0b85
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf4,0b85
svg_stack.py --direction=h tamil-dist-before.svg right-arrow.svg tamil-dist-after.svg > tamil-dist.svg
## 6 `kern`
> None found.
## 6 `abvm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvm-before.svg --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb9,0bcd
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-abvm-after.svg --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bb9,0bcd
svg_stack.py --direction=h tamil-abvm-before.svg right-arrow.svg tamil-abvm-after.svg > tamil-abvm.svg
## 6 `blwm`
> Note: Noto Serif Tamil has a `blwm` feature, but it fails to attach
> the included mark (`U+0952`) for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf7,0952
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tamil-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTamil-Regular.ttf --unicodes=0bf7,0952
svg_stack.py --direction=h tamil-blwm-before.svg right-arrow.svg tamil-blwm-after.svg > tamil-blwm.svg
================================================
FILE: images/telugu/telugu-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-telugu.md](../../opentype-shaping-telugu.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-init` in examples where the `init`
> feature itself is not being explained.
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-matra-decompose-before.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c48
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-matra-decompose-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c46,25cc,0c56
montage telugu-matra-decompose-before.png right-arrow.png telugu-matra-decompose-after.png -geometry +0+0 -background transparent telugu-matra-decompose.png
## 3.3 `akhn`
### KSsa
> Note: Noto Serif Telugu implements this as a `pres`+`blwf`
> substitution for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-akhn-kssa-before.png --features=-blwf,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c15,25cc,0c4d,0c37
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-akhn-kssa-after.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c15,0c4d,0c37
montage telugu-akhn-kssa-before.png right-arrow.png telugu-akhn-kssa-after.png -geometry +0+0 -background transparent telugu-akhn-kssa.png
### JNya
> None found. Microsoft docs reference a "SsJa" akhand form, which is
> also not found.
## 3.4 `rphf`
> None found.
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwf-before.png --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c17,25cc,0c4d,0c24
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwf-after.png --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c17,0c4d,0c24
montage telugu-blwf-before.png right-arrow.png telugu-blwf-after.png -geometry +0+0 -background transparent telugu-blwf.png
## 3.9 `half`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-half-before.png --features=-half,-haln --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c22,25cc,0c4d,200d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-half-after.png --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c22,0c4d,200d
montage telugu-half-before.png right-arrow.png telugu-half-after.png -geometry +0+0 -background transparent telugu-half.png
## 3.10 `pstf`
> None found.
## 3.12 `cjct`
> None found.
## 4.2 Pre-base matras
> Not applicable.
## 4.3 Reph position
> No examples found; existing fonts seem not to incorporate Reph for
> Telugu....
## 5 `pres`
> Note: Example from Noto Serif Telugu, but it looks like it should be
> a `abvs` substitution instead....
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-pres-before.png --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c39,0c4c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-pres-after.png --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c39,0c4c
montage telugu-pres-before.png right-arrow.png telugu-pres-after.png -geometry +0+0 -background transparent telugu-pres.png
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c16,0c40
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c16,0c40
montage telugu-abvs-before.png right-arrow.png telugu-abvs-after.png -geometry +0+0 -background transparent telugu-abvs.png
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blws-before.png --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c16,0c46,0c56
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blws-after.png --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c16,0c46,0c56
montage telugu-blws-before.png right-arrow.png telugu-blws-after.png -geometry +0+0 -background transparent telugu-blws.png
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-psts-before.png --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c2b,0c42
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-psts-after.png --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c2b,0c42
montage telugu-psts-before.png right-arrow.png telugu-psts-after.png -geometry +0+0 -background transparent telugu-psts.png
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-haln-before.png --features=-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2f,0c4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-haln-after.png --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2f,0c4d
montage telugu-haln-before.png right-arrow.png telugu-haln-after.png -geometry +0+0 -background transparent telugu-haln.png
## `abvm`
> None found.
## `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwm-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c1d,0c62
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c1d,0c62
montage telugu-blwm-before.png right-arrow.png telugu-blwm-after.png -geometry +0+0 -background transparent telugu-blwm.png
================================================
FILE: images/telugu/telugu-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-telugu.md](../../opentype-shaping-telugu.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
> Note: always use `--features=-init` in examples where the `init`
> feature itself is not being explained.
## 2.2 Matra decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-matra-decompose-before.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c48
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-matra-decompose-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c46,25cc,0c56
svg_stack.py --direction=h telugu-matra-decompose-before.svg right-arrow.svg telugu-matra-decompose-after.svg > telugu-matra-decompose.svg
## 3.2 `nukt`
> Note: Noto Serif Telugu implements this in a `blwm` feature.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-nukt-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c18,0c3c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-nukt-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c19,0c3c
svg_stack.py --direction=h telugu-nukt-before.svg right-arrow.svg telugu-nukt-after.svg > telugu-nukt.svg
## 3.3 `akhn`
### KSsa
> Note: Noto Serif Telugu implements this as a `pres`+`blwf`
> substitution for unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-akhn-kssa-before.svg --features=-blwf,-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c15,25cc,0c4d,0c37
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-akhn-kssa-after.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c15,0c4d,0c37
svg_stack.py --direction=h telugu-akhn-kssa-before.svg right-arrow.svg telugu-akhn-kssa-after.svg > telugu-akhn-kssa.svg
### JNya
> None found. Microsoft docs reference a "SsJa" akhand form, which is
> also not found.
## 3.4 `rphf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-rphf-before.svg --features=-rphf,-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --preserve-default-ignorables --unicodes=0c30,0c4d,200d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-rphf-after.svg --features=+rphf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c30,0c4d,200d
svg_stack.py --direction=h telugu-rphf-before.svg right-arrow.svg telugu-rphf-after.svg > telugu-rphf.svg
## 3.7 `blwf`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwf-before.svg --features=-blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c17,25cc,0c4d,0c24
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwf-after.svg --features=+blwf --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c17,0c4d,0c24
svg_stack.py --direction=h telugu-blwf-before.svg right-arrow.svg telugu-blwf-after.svg > telugu-blwf.svg
## 3.9 `half`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-half-before.svg --features=-half,-haln --preserve-default-ignorables --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c22,25cc,0c4d,200d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-half-after.svg --features=+half --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTelugu-Regular.ttf --unicodes=0c22,0c4d,200d
svg_stack.py --direction=h telugu-half-before.svg right-arrow.svg telugu-half-after.svg > telugu-half.svg
## 3.10 `pstf`
> None found.
## 3.12 `cjct`
> None found.
## 4.2 Pre-base matras
> Not applicable.
## 4.3 Reph position
> No examples found; existing fonts seem not to incorporate Reph for
> Telugu....
## 5 `pres`
> Note: Example from Noto Serif Telugu, but it looks like it should be
> a `abvs` substitution instead....
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-pres-before.svg --features=-pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c39,0c4c
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-pres-after.svg --features=+pres --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c39,0c4c
svg_stack.py --direction=h telugu-pres-before.svg right-arrow.svg telugu-pres-after.svg > telugu-pres.svg
## 5 `abvs`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c16,0c40
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c16,0c40
svg_stack.py --direction=h telugu-abvs-before.svg right-arrow.svg telugu-abvs-after.svg > telugu-abvs.svg
## 5 `blws`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c16,0c4d,0c24,0c4d,0c30
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c16,0c4d,0c24,0c4d,0c30
svg_stack.py --direction=h telugu-blws-before.svg right-arrow.svg telugu-blws-after.svg > telugu-blws.svg
## 5 `psts`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-psts-before.svg --features=-psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2b,0c42
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-psts-after.svg --features=+psts --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2b,0c42
svg_stack.py --direction=h telugu-psts-before.svg right-arrow.svg telugu-psts-after.svg > telugu-psts.svg
## 5 `haln`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-haln-before.svg --features=-haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2f,0c4d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-haln-after.svg --features=+haln --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c2f,0c4d
svg_stack.py --direction=h telugu-haln-before.svg right-arrow.svg telugu-haln-after.svg > telugu-haln.svg
## `dist`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-dist-before.svg --features=-dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c19,0c44
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-dist-after.svg --features=+dist --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c19,0c44
svg_stack.py --direction=h telugu-dist-before.svg right-arrow.svg telugu-dist-after.svg > telugu-dist.svg
## `abvm`
> Note: Noto Serif Telugu implements this in a `blwm` feature, for
> unknown reasons.
hb-view --font-size=110 --margin=32,16,2,16 --output-file=telugu-abvm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c0f,0c00
hb-view --font-size=110 --margin=32,16,2,16 --output-file=telugu-abvm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c0f,0c00
svg_stack.py --direction=h telugu-abvm-before.svg right-arrow.svg telugu-abvm-after.svg > telugu-abvm.svg
## `blwm`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c1d,0c4d,0c26
hb-view --font-size=110 --margin=2,16,2,16 --output-file=telugu-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTelugu-Regular.ttf --unicodes=0c1d,0c4d,0c26
svg_stack.py --direction=h telugu-blwm-before.svg right-arrow.svg telugu-blwm-after.svg > telugu-blwm.svg
================================================
FILE: images/thai-lao/thai-lao-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-thai-lao.md](../../opentype-shaping-thai-lao.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 1.1 `ccmp`
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-ccmp-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4a,0e4d
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-ccmp-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4a,0e4d
montage thai-ccmp-before.png right-arrow.png thai-ccmp-after.png -geometry +0+0 -background transparent thai-ccmp.png
## 1.2 Decomposition
## 1.2 Am sign decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-am-decomposition-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=25cc,0eb3
hb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-am-decomposition-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=25cc,0ecd,25cc,0eb2
montage lao-am-decomposition-before.png right-arrow.png lao-am-decomposition-after.png -geometry +0+0 -background transparent lao-am-decomposition.png
## 4 `kern`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-kern-before.png --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=0ec1,0e9a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-kern-after.png --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=0ec1,0e9a
montage lao-kern-before.png right-arrow.png lao-kern-after.png -geometry +0+0 -background transparent lao-kern.png
## 4 `mark`
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mark-before.png --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=0e0e,25cc,0e38
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mark-after.png --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=0e0e,0e38
montage thai-mark-before.png right-arrow.png thai-mark-after.png -geometry +0+0 -background transparent thai-mark.png
## 4 `mkmk`
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mkmk-before.png --features=-mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e31,0e48
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mkmk-after.png --features=+mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e31,0e48
montage thai-mkmk-before.png right-arrow.png thai-mkmk-after.png -geometry +0+0 -background transparent thai-mkmk.png
## PUA 1 - Sara Am decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=thai-am-decomposition-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e33
hb-view --font-size=110 --margin=2,16,2,16 --output-file=thai-am-decomposition-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4d,25cc,0e32
montage thai-am-decomposition-before.png right-arrow.png thai-am-decomposition-after.png -geometry +0+0 -background transparent thai-am-decomposition.png
================================================
FILE: images/thai-lao/thai-lao-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-thai-lao.md](../../opentype-shaping-thai-lao.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## 1.1 `ccmp`
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-ccmp-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4a,0e4d
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-ccmp-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4a,0e4d
svg_stack --direction=h thai-ccmp-before.svg right-arrow.svg thai-ccmp-after.svg > thai-ccmp.svg
## 1.2 Decomposition
## 1.2 Am sign decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-am-decomposition-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=25cc,0eb3
hb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-am-decomposition-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=25cc,0ecd,25cc,0eb2
svg_stack --direction=h lao-am-decomposition-before.svg right-arrow.svg lao-am-decomposition-after.svg > lao-am-decomposition.svg
## 4 `kern`
hb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-kern-before.svg --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=0ec1,0e9a
hb-view --font-size=110 --margin=2,16,2,16 --output-file=lao-kern-after.svg --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifLao-Regular.ttf --unicodes=0ec1,0e9a
svg_stack --direction=h lao-kern-before.svg right-arrow.svg lao-kern-after.svg > lao-kern.svg
## 4 `mark`
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mark-before.svg --features=-mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=0e0e,25cc,0e38
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mark-after.svg --features=+mark --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=0e0e,0e38
svg_stack --direction=h thai-mark-before.svg right-arrow.svg thai-mark-after.svg > thai-mark.svg
## 4 `mkmk`
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mkmk-before.svg --features=-mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e31,0e48
hb-view --font-size=110 --margin=16,16,2,16 --output-file=thai-mkmk-after.svg --features=+mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e31,0e48
svg_stack --direction=h thai-mkmk-before.svg right-arrow.svg thai-mkmk-after.svg > thai-mkmk.svg
## PUA 1 - Sara Am decomposition
hb-view --font-size=110 --margin=2,16,2,16 --output-file=thai-am-decomposition-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e33
hb-view --font-size=110 --margin=2,16,2,16 --output-file=thai-am-decomposition-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifThai-Regular.ttf --unicodes=25cc,0e4d,25cc,0e32
svg_stack --direction=h thai-am-decomposition-before.svg right-arrow.svg thai-am-decomposition-after.svg > thai-am-decomposition.svg
================================================
FILE: images/tibetan/tibetan-png-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-tibetan.md](../../opentype-shaping-tibetan.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.png --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Syllable identification
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-syllable.png --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f51,0f54,0f7a,0f0b,0f56,0f66,0f90,0fb2,0f74,0f53
## 1.2 ccmp
hb-view --font-size=110 --margin=2,16,2,72 --output-file=tibetan-ccmp-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f77
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-ccmp-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0fb2,00a0,00a0,0f71,25cc,0f80
montage tibetan-ccmp-before.png right-arrow.png tibetan-ccmp-after.png -geometry +0+0 -background transparent tibetan-ccmp.png
## 2.1 abvs
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvs-before.png --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f49,0f7b,0f7e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvs-after.png --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f49,0f7b,0f7e
montage tibetan-abvs-before.png right-arrow.png tibetan-abvs-after.png -geometry +0+0 -background transparent tibetan-abvs.png
## 2.2 blws
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blws-before.png --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f4a,0f91
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blws-after.png --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f4a,0f91
montage tibetan-blws-before.png right-arrow.png tibetan-blws-after.png -geometry +0+0 -background transparent tibetan-blws.png
## 2.3 calt
> Note: Noto Sans Tibetan calls this substitution twice, in calt and
> in abvs.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-calt-before.png --features=-calt,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f59,0f7d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-calt-after.png --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f59,0f7d
montage tibetan-calt-before.png right-arrow.png tibetan-calt-after.png -geometry +0+0 -background transparent tibetan-calt.png
## 2.4 liga
> Note: Noto Sans Tibetan implements this as a ccmp substitution for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-liga-before.png --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f97,0f39,0fb7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-liga-after.png --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f97,0f39,0fb7
montage tibetan-liga-before.png right-arrow.png tibetan-liga-after.png -geometry +0+0 -background transparent tibetan-liga.png
## 3 kern
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-kern-before.png --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f65,0f0b,0f62,0fa9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-kern-after.png --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f65,0f0b,0f62,0fa9
montage tibetan-kern-before.png right-arrow.png tibetan-kern-after.png -geometry +0+0 -background transparent tibetan-kern.png
## 3 abvm
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvm-before.png --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f61,0f80,0f7e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvm-after.png --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f61,0f80,0f7e
montage tibetan-abvm-before.png right-arrow.png tibetan-abvm-after.png -geometry +0+0 -background transparent tibetan-abvm.png
## 3 blwm
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blwm-before.png --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f59,0fb3,0f71,0f74
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blwm-after.png --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f59,0fb3,0f71,0f74
montage tibetan-blwm-before.png right-arrow.png tibetan-blwm-after.png -geometry +0+0 -background transparent tibetan-blwm.png
## 3 mkmk
> Note: Noto Sans Tibetan implements this is both blwm and mkmk for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-mkmk-before.png --features=-mkmk,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f51,0f71,0f35
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-mkmk-after.png --features=+mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSerifTibetan-Regular.ttf --unicodes=0f51,0f71,0f35
montage tibetan-mkmk-before.png right-arrow.png tibetan-mkmk-after.png -geometry +0+0 -background transparent tibetan-mkmk.png
================================================
FILE: images/tibetan/tibetan-svg-image-generation-log.md
================================================
# Commands used to generate the images in [opentype-shaping-tibetan.md](../../opentype-shaping-tibetan.md)
## Arrow general
hb-view --font-size=110 --output-file=right-arrow.svg --background=FFFFFF00 --margin=0,0,0,0 /usr/share/fonts/opentype/gentiumplus/GentiumPlus-R.ttf --unicodes=2192
## Syllable identification
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-syllable.svg --features= --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f51,0f54,0f7a,0f0b,0f56,0f66,0f90,0fb2,0f74,0f53
## 1.2 ccmp
hb-view --font-size=110 --margin=2,16,2,72 --output-file=tibetan-ccmp-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f77
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-ccmp-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0fb2,00a0,00a0,0f71,25cc,0f80
svg_stack --direction=h tibetan-ccmp-before.svg right-arrow.svg tibetan-ccmp-after.svg > tibetan-ccmp.svg
## 2.1 abvs
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvs-before.svg --features=-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f49,0f7b,0f7e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvs-after.svg --features=+abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f49,0f7b,0f7e
svg_stack --direction=h tibetan-abvs-before.svg right-arrow.svg tibetan-abvs-after.svg > tibetan-abvs.svg
## 2.2 blws
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blws-before.svg --features=-blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f4a,0f91
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blws-after.svg --features=+blws --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f4a,0f91
svg_stack --direction=h tibetan-blws-before.svg right-arrow.svg tibetan-blws-after.svg > tibetan-blws.svg
## 2.3 calt
> Note: Noto Sans Tibetan calls this substitution twice, in calt and
> in abvs.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-calt-before.svg --features=-calt,-abvs --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f59,0f7d
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-calt-after.svg --features=+calt --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f59,0f7d
svg_stack --direction=h tibetan-calt-before.svg right-arrow.svg tibetan-calt-after.svg > tibetan-calt.svg
## 2.4 liga
> Note: Noto Sans Tibetan implements this as a ccmp substitution for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-liga-before.svg --features=-ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f97,0f39,0fb7
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-liga-after.svg --features=+ccmp --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f97,0f39,0fb7
svg_stack --direction=h tibetan-liga-before.svg right-arrow.svg tibetan-liga-after.svg > tibetan-liga.svg
## 3 kern
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-kern-before.svg --features=-kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f65,0f0b,0f62,0fa9
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-kern-after.svg --features=+kern --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f65,0f0b,0f62,0fa9
svg_stack --direction=h tibetan-kern-before.svg right-arrow.svg tibetan-kern-after.svg > tibetan-kern.svg
## 3 abvm
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvm-before.svg --features=-abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f61,0f80,0f7e
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-abvm-after.svg --features=+abvm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f61,0f80,0f7e
svg_stack --direction=h tibetan-abvm-before.svg right-arrow.svg tibetan-abvm-after.svg > tibetan-abvm.svg
## 3 blwm
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blwm-before.svg --features=-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f59,0fb3,0f71,0f74
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-blwm-after.svg --features=+blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f59,0fb3,0f71,0f74
svg_stack --direction=h tibetan-blwm-before.svg right-arrow.svg tibetan-blwm-after.svg > tibetan-blwm.svg
## 3 mkmk
> Note: Noto Sans Tibetan implements this is both blwm and mkmk for
> unknown reasons.
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-mkmk-before.svg --features=-mkmk,-blwm --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f51,0f71,0f35
hb-view --font-size=110 --margin=2,16,2,16 --output-file=tibetan-mkmk-after.svg --features=+mkmk --background=FFFFFF00 /usr/share/fonts/truetype/noto/NotoSansTibetan-Regular.ttf --unicodes=0f51,0f71,0f35
svg_stack --direction=h tibetan-mkmk-before.svg right-arrow.svg tibetan-mkmk-after.svg > tibetan-mkmk.svg
================================================
FILE: index.md
================================================
```{include} /_global.md
```
# OpenType shaping documents #
Sponsored by [YesLogic](https://yeslogic.com/)
__
:::{admonition} 🆆 🅰 🆁 🅽 🅸 🅽 🅶
:class: caution
These documents are an active WORK IN PROGRESS.
NONE of the documents you currently see here are complete
nor are they suitable for reference. PLEASE do not use
them as a guide or as a general information source.
As long as this warning text remains visible, the above
holds true.
:::
These documents are meant to provide a functional specification for
text shaping. The expectation is that an implementer of this
specification will be using fonts in the OpenType font format applied
to input text that complies with Unicode.
Because application software and end-user documents may utilize
non-OpenType fonts and non-Unicode text (in particular, when older
fonts or documents are encountered), these documents also provide
functional information that a shaping engine may use to implement a
reasonable best-effort attempt at producing useful output in the most
common of such scenarios.
## Shapers
The shaping behavior described here can be roughly divided into five
categories.
All non-complex scripts follow the same
[default](opentype-shaping-default.md) shaping model.
The _Indic Model_ is shared by ten individual scripts. These scripts
follow the same overall approach to shaping, described in the [Indic
general](opentype-shaping-indic-general.md) document, but each script
incorporates script-specific details, which are more fully described
in its own document:
- [Devanagari](opentype-shaping-devanagari.md)
- [Bengali](opentype-shaping-bengali.md)
- [Gujarati](opentype-shaping-gujarati.md)
- [Gurmukhi](opentype-shaping-gurmukhi.md)
- [Kannada](opentype-shaping-kannada.md)
- [Malayalam](opentype-shaping-malayalam.md)
- [Oriya](opentype-shaping-oriya.md)
- [Tamil](opentype-shaping-tamil.md)
- [Telugu](opentype-shaping-telugu.md)
- [Sinhala](opentype-shaping-sinhala.md)
The _Arabic Model_ is shared by four individual scripts. These scripts
follow the same overall approach to shaping, described in the [Arabic
general](opentype-shaping-arabic-general.md) document, but each script
incorporates script-specific details, which are more fully described
in its own document:
- [Arabic](opentype-shaping-arabic.md)
- [N'Ko](opentype-shaping-nko.md)
- [Syriac](opentype-shaping-syriac.md)
- [Mongolian](opentype-shaping-mongolian.md)
Five of the remaining scripts each use a distinct, script-specific
model, with two others (Thai and Lao) sharing enough details to be
handled by a common shaper:
- [Hangul](opentype-shaping-hangul.md)
- [Hebrew](opentype-shaping-hebrew.md)
- [Khmer](opentype-shaping-khmer.md)
- [Thai and Lao](opentype-shaping-thai-lao.md)
- [Tibetan](opentype-shaping-tibetan.md)
- [Myanmar](opentype-shaping-myanmar.md)
Finally, the Universal Shaping Engine (USE) model is designed to shape all
complex scripts that are not handled by a dedicated
script-specific shaping model in the lists above:
- [Universal Shaping Engine (USE)](opentype-shaping-use.md)
In addition, these documents describe the handling of emoji
sequences. Although emoji sequences do not constitute a separate
shaping model, handling emoji sequences can incorporate many of the
same shaping mechanisms and shaping engine implementations may be
expected to handle them:
- [Emoji](opentype-shaping-emoji.md)
Shaping is just one part of the overall text-handling process. These
documents assume that other components in the software stack will be
responsible for details such as handling higher-level markup, layout,
font matching and loading, rasterization, and so on. Most importantly,
these documents assume that the input text has already been segmented
into text runs that consist of a single language, script, font, and
all other markup considerations (such as size or color, for example).
Within those assumptions, the shaping of a particular text run should
be consistent, regardless of whether the higher-level processes
involve a document, user-interface element, network stream, or any
other context for displaying text.
## Normalization
However, these documents also include a description of text
[normalization](opentype-shaping-normalization.md) in the OpenType
shaping context, which differs from Unicode normalization in several
respects. Shaping engine implementations may differ as to whether the
shaping engine itself is responsible for handling normalization or
whether normalization is handled by another component
in the stack.
## Additional information
Various practical [notes](notes/index.md) about this document set and
the details of its scope, limitations, and quirks are also provided.
Some [errata](errata.md) about the "upstream" specifications and
reference documents are noted separately.
In its final form, this repository will hold documentation describing
the shaping behavior used for layout of OpenType text. In particular,
it will focus on complex scripts.
In addition to the primary, per-script documents, implementers and
other interested readers are encouraged to check the
[character tables](character-tables/index.md) for correctness and to
examine the [image-generation logs](https://github.com/n8willis/opentype-shaping-documents/images/README.md) to identify
issues seen in the inline images.
## Feedback
Interested readers, font developers, and shaping-engine implementers
are encouraged to provide feedback, ask questions, and propose
improvements to any part of these documents. Shaping is the concern of
software developers and readers across the world, and all are welcome
to participate in recording and clarifying what is required to produce
the best and most accurate text output possible, both now and in the
future.
See the upstream git repository at
[github.com/n8willis/opentype-shaping-documents](https://github.com/n8willis/opentype-shaping-documents)
to raise issues, ask questions, or add comments.
## References
These documents cite the following informative references:
1. The Microsoft [Script development
specifications](https://docs.microsoft.com/en-us/typography/script-development/standard),
which document the behaviors expected for OpenType Layout fonts and
provide guidance & examples for type designers. OpenType is a
registered trademark of Microsoft Corporation.
2. Related portions of the Microsoft OpenType specification, such as the
[OpenType Layout tag
registry](https://docs.microsoft.com/en-us/typography/opentype/spec/ttoreg)
and [OpenType Layout common table
formats](https://docs.microsoft.com/en-us/typography/opentype/spec/chapter2),
which list and define feature tags, script & language tags, and
other internals of compliant OpenType font binaries. OpenType is a
registered trademark of Microsoft Corporation.
3. The [HarfBuzz](https://github.com/harfbuzz/harfbuzz) project, which
includes a free-software/open-source implementation of OpenType
Layout shaping with full source code and documentation.
4. The [AllSorts](https://github.com/yeslogic/allsorts) project, which
includes a free-software/open-source implementation of OpenType
Layout shaping with full source code and documentation.
5. The [Unicode
Standard](http://www.unicode.org/standard/standard.html) and
related Unicode Consortium projects such as the [Unicode Character
Database](http://www.unicode.org/reports/tr44/), which defines
Unicode code points and formal character properties used in
shaping. Unicode and the Unicode Logo are registered trademarks of
Unicode, Inc. in the United States and other countries.
6. The YesLogic [text corpus](https://github.com/yeslogic/corpus),
which includes real-world text data for several Indic scripts,
scraped from Wikipedia, Reddit, and multiple online news
sources. This data is used to test shaping in AllSorts and Prince.
7. Known but unofficial information about other shaping-engine
projects. Primarily this includes tests and reproducible issues
found via [HarfBuzz](https://github.com/harfbuzz/harfbuzz), because
HarfBuzz intentionally aims to produce results that will 100% match
the output of Microsoft Uniscribe (not counting cases where
Uniscribe's output is known to be incorrect, of course).
> Note: occasionally, tests or issues documenting the behavior of
> Apple CoreText are also included, but CoreText compatibility is
> not an explicit goal for HarfBuzz.
---
Version {{ env.config.version }}, release {{ env.config.release }};
built {sub-ref}`today`.
================================================
FILE: make.bat
================================================
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
if "%1" == "" goto help
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
================================================
FILE: notes/README.md
================================================
# Notes #
The files in this directory include auxiliary information that is either
tangential to the main shaping-behavior documentation or is
excessively long enough that trying to include it inline would disrupt
the flow of the text for readers.
Notes included cover:
- [Uniscribe compatibility](/notes/uniscribe-bug-compatibility.md):
Information on preserving strict compatibility with Microsoft's
Uniscribe shaping engine
- [Ragel state-machine operators](/notes/ragel-machine-notation.md):
Information on the syntax of the
[Ragel](http://www.colm.net/open-source/ragel/) state-machine
compiler, which is the reference regular-expression syntax used
when listing regular expressions in the shaping-behavior
documentation, but which itself is not mandatory.
- [Emoji implementation](/notes/emoji-implementation.md): Information
on the image formats, codepoint visibility, and GSUB/GPOS features
used in real-world Emoji fonts distributed by major vendors.
================================================
FILE: notes/emoji-implementation.md
================================================
# Notes on Emoji font implementation #
This document notes details on how common Emoji fonts implement
sequence, modifier, varation-selection, text-presentation
fallback, and other behavior, for the purposes of testing and
debugging.
Emoji fonts are deployed by vendors using a variety of different
image formats (including the `SVG `, `COLR`v0/`CPAL`,
`COLR`v1/`CPAL`, `glyf`, and `cff ` vector formats and the `CBDT`
and `sbix` raster formats), which can make it difficult to
characterize Emoji font behavior.
Similarly, Emoji font vendors have employed a variety of
different OpenType features to implement support for standard
sequences, modifier-based sequences, ZWJ-based sequences and
permutations.
See the [Emoji shaping document](../opentype-shaping-emoji.md)
for more details on the sequences and definitions involved.
## Format, features, and control-codepoint visiblity table ##
This table lists the image format, the GSUB feature(s) used for
basic Emoji sequence support and ZWJ-based sequence support, and
whether or not the font includes a visible glyph for the
presentation selector codepoints (VS15, `U+FE0E`; VS16, `U+FE0F`)
and modifier codepoints (`U+1F3FB`..`U+1F3FF`).
:::{table} Emoji sequence implementation details
| Font | publisher | image format | sequence formation feature | ZWJ sequence feature | visible presentation selector | visible modifier |
|:-----------------------|:----------|:-------------|:---------------------------|:---------------------|:------------------------------|:-----------------|
| Source Emoji | Adobe | cff | ccmp | ccmp, salt | YES | YES |
| Blobmoji | C1710 | CBDT | ccmp | ccmp | no | YES |
| Twemoji | Twitter | SVG | liga | liga | no | YES |
| Noto Color Emoji | Google | CBDT | ccmp | ccmp | no | YES |
| Noto Color Emoji | Google | COLRv1 | ccmp | ccmp | no | YES |
| EmojiTwo Android | EmojiTwo | CBDT | ccmp | ccmp | no | YES |
| EmojiTwo Apple | EmojiTwo | sbix | morx | morx | no | YES |
| EmojiTwo SVG | EmojiTwo | SVG | ccmp | ccmp | no | YES |
| Openmoji | HfG Gmünd | SVG | liga | liga | no | YES |
| FirefoxEmoji | Mozilla | COLRv0 | rlig | rlig | no | no |
| Noto Emoji | Google | glyf | ccmp | ccmp | no | YES |
| Old Noto B&W Emoji | Google | glyf | ccmp | ccmp | no | no |
| JoyPixels | JoyPixels | CBDT | ccmp | ccmp | no | YES |
| Apple Color Emoji | Apple | sbix | morx | morx | no | YES |
| Samsung Color Emoji | Samsung | CBDT | ccmp | ccmp | no | YES |
| Segoe UI Emoji | Microsoft| COLRv0 | ccmp | ccmp | YES | YES |
:::
### Contributing additional data ###
Volunteers or implementers who wish to contribute data for additional
Emoji fonts may need to collecting the information themselves by
inspecting font binaries.
Options available include:
1. **FontTools / TTX**
- Users can run `ttx -l somefontfilename.ttf` (or `.otf` or `.ttc`
or `.otc`) to get a short list of the tables. The presence of `SVG `,
`CBDT`, `sbix`, or `COLR` indicates that whichever one of those exists
is the image format. _If_ none of the above are there but `glyf` or `CFF `
or `CFF2` _is_ there, then whichever of those three exists is the
image format (and means it's a black-and-white emoji font, which users
would probably know beforehand anyway). If there's more than one of
`SVG `, `CBDT`, `sbix`, or `COLR` present in the same font file, that
would likely mean unknown behavior; comments on such cases are welcome.
- Users can run the `layout-features.py somefontfilename.ttf`
script (which can be found in the `/Snippets/` directory of the
`FontTools` package source) and it will print out an indented
list of the GSUB and GPOS features
used. All that matters for the table above is what the script
reports on the `Feature: ` line. For a typical emoji font there's
probably only one feature -- but, if there are several, listing
them is useful.
2. **AllSorts / allsorts-tools**
- Users can use the `dump` tool from the `allsorts-tools` package
to run `allsorts dump somefilename.ttf` and get a list of tables plus
other metadata; the tables are the first output. Same interpretation
as above.
- At the moment it sounds like there isn't a single-command option in
`allsorts` to list GSUB/GPOS features. Corrections are welcome.
3. **GUI font editors**
- Users can also open up the font file in a font editor and look at what
it presents.
- FontForge:
- In FontForge, go to Element -> Font Info in the menu to open the
font-info dialog box. It will show the GSUB/GPOS lookups in the
"Lookups" tab (left-hand side).
- FontForge does _not_ just show a convenient list of all the tables.
However, when users open the font file, the "Warnings" dialog box will
report if it finds `SVG `, `CBDT`, `sbix`, or `COLR` tables.
Unfortunately, it will only actually open the font for
editing/inspection if it finds a `glyf`, `CFF `, or `CFF2` table
(which a `COLR` font would have) or an `SVG ` table. So users can't
use it to inspect the features of the other formats.
(Further instructions will be added to this list for other editors if volunteers
can contribute them)
For determining if there's a printable glyph for the selectors/modifiers:
1. **GUI font editors**
- Users can open up the font in an editor and look at the slots for the
Unicode codepoints for the presentation selectors (`U+FE0E` and `U+FE0F`)
and the modifiers (`U+1F3FB` through `U+1F3FF`), if they exist (they might not).
2. **HarfBuzz**
- Users can run the `hb-view` utility to output glyph contents for specific
Unicode codepoints, but one might have to try a couple of options, depending
on the image format. Run `hb-view --preserve-default-ignorables somefontfilename.ttf --unicodes=fe0e`
to start (for `U+FE0E`). Users may also try adding the `--font-funcs=ot`
and/or `--shapers=ot` flags to that command if it gives trouble.
================================================
FILE: notes/index.md
================================================
# Notes #
This section includes auxiliary information that is either
tangential to the main shaping-behavior documentation or is
excessively long enough that trying to include it inline would disrupt
the flow of the text for readers.
Notes included cover:
- [Uniscribe compatibility](/notes/uniscribe-bug-compatibility.md):
Information on preserving strict compatibility with Microsoft's
Uniscribe shaping engine
- [Ragel state-machine operators](/notes/ragel-machine-notation.md):
Information on the syntax of the
[Ragel](http://www.colm.net/open-source/ragel/) state-machine
compiler, which is the reference regular-expression syntax used
when listing regular expressions in the shaping-behavior
documentation, but which itself is not mandatory.
- [Emoji implementation](/notes/emoji-implementation.md): Information
on the image formats, codepoint visibility, and GSUB/GPOS features
used in real-world Emoji fonts distributed by major vendors.
================================================
FILE: notes/ragel-machine-notation.md
================================================
# Ragel State Machine operators #
As used in the regular expressions cited in various shaper-engine
guides.
```
a* = zero or more copies of a
b+ = one or more copies of b
c? = optional instance of c
d{n} = exactly n copies of d
d{,n} = zero to n copies of d
d{n,} = n or more copies of d
d{n,m} = n to m copies of d
!e = not e
^f = character-level not f
g.h = concatenation of g and h
i|j = i or j
( ) = grouping of expression elements
```
================================================
FILE: notes/uniscribe-bug-compatibility.md
================================================
# Notes for preserving Uniscribe compatibility #
This document details behavior that shaping engines may wish to
implement in order to maintain strict shaping compatibility with
Microsoft's Uniscribe OpenType shaper, including behavior that may be
regarded as bugs by end users.
**Contents**
- [Indic standalone-syllable dotted circles](#indic-standalone-syllable-dotted-circles)
- [Indic syllable cluster merging](#indic-syllable-cluster-merging)
- [Indic fallback Reph reordering](#indic-fallback-reph-reordering)
- [Kannada legacy treatment of "Ra,Halant,ZWJ"](#kannada-legacy-treatment-of-ra-halant-zwj)
- [Khmer kerning](#khmer-kerning)
- [Sinhala matra decomposition](#sinhala-matra-decomposition)
- [Miscellaneous](#miscellaneous)
- [Bengali init feature matching](#bengali-init-feature-matching)
- [Old-model post-base Halant reordering](#old-model-post-base-halant-reordering)
- [Kannada final double Halants](#kannada-final-double-halants)
- [Halants and left matras](#halants-and-left-matras)
- [Explicit half-forms followed by matras](#explicit-half-forms-followed-by-matras)
Compatibility notes in the "miscellaneous" category deal with
behaviors that are incompletely documented, deal solely with
deprecated script tags, or do not violate known conventions. Thus, the
scenarios with which they deal may not be regarded as bugs.
## Indic standalone-syllable dotted circles ##
In Indic syllables that include a `PLACEHOLDER` or `DOTTED_CIRCLE`
codepoint, if a dotted-circle glyph is the last consonant of the
syllable, Uniscribe ignores the glyph when processing the syllable.
For example, the dotted-circle glyph is not counted as a consonant
when locating the syllable's base consonant. Therefore, the sequence
"Ra,Halant,Dotted_Circle" does not trigger Reph formation (which would
result in the sequence "Reph,Dotted_Circle").
## Indic syllable cluster merging ##
Other shaping engines, such as HarfBuzz, track the indivisible
components of a syllable in "clusters". Each individual letter usually
corresponds to a cluster; when two letters ligate or form a conjunct,
their clusters are merged. When a codepoint is decomposed, its
components remain part of the same, original cluster as the
precomposed version. Uniscribe appears to follow this pattern as well.
When shaping Indic text in most scripts, after shaping the entire
syllable, Uniscribe merges all of the clusters of the syllable into a
single, indivisible cluster.
The exceptions to this behavior occur when Uniscribe is shaping Tamil
and Sinhala. In those cases, the full-syllable cluster merge is not
performed.
> Note: This full-syllable clustering makes it hard for application
> software to position the cursor within the word. It may also have
> other implications for software above the shaping engine in the
> stack.
## Indic fallback Reph reordering ##
When shaping Indic syllables, any one of several Reph-positioning
strategies may be required by the active script. In the event that no
correct position can be determined by the shaping engine for a
syllable, Uniscribe's ultimate fallback behavior is to reorder the
Reph to the end of the syllable.
If the Reph is reordered to the end of the syllable and this final
position happens to occur immediately after a "Matra,Halant" sequence,
Uniscribe leaves the Reph in this position.
Other shaping engines, in this situation, will reorder the Reph to a
position immediately before the "Matra,Halant" sequence. This allows
for any GSUB substitutions that match "Reph,Matra" sequences to be
activated, if any such substitution rules are present in the active
font.
## Kannada legacy treatment of "Ra,Halant,ZWJ" ##
In the `` shaping model (which was deprecated in 2005 in favor
of ``), the sequence "Ra,Halant,ZWJ" was treated as equivalent
to the sequence "Ra,ZWJ,Halant".
## Khmer kerning ##
Uniscribe does not apply the `kern` feature to Khmer text, even if the
active font includes kerning tables for Khmer codepoints.
## Sinhala matra decomposition ##
Sinhala text in OpenType presents two possible methods for
decomposing multi-part matras.
One is the canonical Unicode decompositions for the matra codepoints,
as is used in most other Indic scripts. This decomposition is usually
performed early in the shaping process.
The second is the `pstf` feature of GSUB, which is defined differently
for Sinhala. In Sinhala, the `pstf` feature replaces multi-part
dependent vowels (matras) with the right-side matra component of the
canonical decomposition. This substitution generally occurs late in
the shaping process.
Uniscribe supports the `pstf` behavior by handling the decomposition
of multi-part dependent vowels differently for Sinhala -- in a sense,
decomposing each matra into its left-side component followed by a
duplicate of the original matra, then substituting the duplicated
matra with the right-side matra component when the `pstf` feature is
applied.
Shaping engines may opt to decompose multi-part dependent
vowels into their canonical Unicode decompositions, as is done in
other scripts, and substitute the decomposed right-side matra
components at that point.
Doing so will negate the need to apply the `pstf` substitution.
However, fonts that were engineered to support the
Uniscribe-supported behavior might not include GPOS positioning
rules for the right-side matra components, relying instead on the
`pstf` substitution to provide a suitable replacement.
## Miscellaneous ##
### Bengali `init` feature matching ###
The `init` feature in Bengali is defined in the OpenType specification
as applying to word-initial left-side dependent vowels (matras).
However, Uniscribe specifically applies the feature whenever
the matra is preceded by any character that falls within the following
range in the Unicode `General Category` property:
- `GENERAL_CATEGORY_FORMAT` [Cf]
- `GENERAL_CATEGORY_UNASSIGNED` [Cn]
- `GENERAL_CATEGORY_PRIVATE_USE` [Co]
- `GENERAL_CATEGORY_SURROGATE` [Cs]
- `GENERAL_CATEGORY_LOWERCASE_LETTER` [Ll]
- `GENERAL_CATEGORY_MODIFIER_LETTER` [Lm]
- `GENERAL_CATEGORY_OTHER_LETTER` [Lo]
- `GENERAL_CATEGORY_TITLECASE_LETTER` [Lt]
- `GENERAL_CATEGORY_UPPERCASE_LETTER` [Lu]
- `GENERAL_CATEGORY_SPACING_MARK` [Mc]
- `GENERAL_CATEGORY_ENCLOSING_MARK` [Me]
- `GENERAL_CATEGORY_NON_SPACING_MARK` [Mn]
### Old-model post-base Halant reordering ###
In old-model (Indic1) script tags, Uniscribe treats some
scripts differently when reordering the first post-base "Halant". This
Halant-reordering is done in Indic1 scripts in order to prepare the
syllable for Indic1's different post-base GSUB substitution rules.
For example, the old-model Indic syllable
Pre-baseC Halant BaseC Halant Post-baseC
would be reordered to
Pre-baseC Halant BaseC Post-baseC Halant
before features are applied.
In Malayalam, Uniscribe always reorders the first post-base "Halant" in
a syllable to the position immediately after the syllable's last consonant.
#### Kannada final double Halants ####
In old-model Kannada (``) runs, Uniscribe is known to reorder
the first post-base "Halant" only when there is not already a "Halant"
after the last consonant.
For example, the old-model Indic syllable
Pre-baseC Halant BaseC Halant Post-baseC Halant
would _not_ be reordered.
This behavior is an exception to the general Indic1 post-base "Halant"
reordering operation. It is believed to be script-specific and has
only been observed for Kannada text runs. However, there may still be
undiscovered sequences in other Indic1-script text which trigger the
same behavior; implementers targeting full compatibility should
exercise caution.
If the standard post-base "Halant" reordering were performed, then the
likely result of the GSUB feature-application phase would be a
sequence of the form "BaseC,belowbaseC,Halant" which, in turn, might
trigger mark-attachment issues for correctly positioning the final
"Halant".
This Uniscribe behavior is not documented, however; therefore the only
recommended workaround for maintaining compatibility is to define a
special-case exception for avoiding the creation of final double
"Halant"s in `` text.
### Halants and left matras ###
When reordering left-side matras, when a "Halant" occurs immediately
after a left-side matra, Uniscribe does not move the "Halant" with the matra.
Generally, marks (including "Halant") are tagged for reordering with
the same positioning tag as the closest non-mark character that the
mark has affinity with.
In post-base position, where a yet-to-be-reordered left-side matra
would be found, the closest non-mark character with affinity for the
mark might be a post-base consonant. Uniscribe appears to make a check
ensuring that the "Halant" after a left-side matra is not tagged for
reordering with the matra.
This check is required for shaping Sinhala, because the `U+0DDA`
multi-part matra decomposes into the sequence "`U+0DD9`,Halant". The
decomposed "Halant" should remain where it is, serving as the right-side
matra component.
### Explicit half-forms followed by matras ###
As a general rule, Uniscribe and other shapers insert a dotted-circle
character before a non-spacing mark character (such as a matra in
Indic2-model scripts) when that non-spacing mark character is not
matched with a base character in a permitted syllable. In such
circumstances, the dotted-circle visually serves to communicate to
readers that a base character has not been found, and also
functionally serves as a surrogate base on which the mark character
can be positioned.
However, Uniscribe is known not to insert a dotted-circle before a
matra character when it is preceded by two sequential
explicit-half-form sequences (meaning two consecutive occurrences of
"_Consonant_,Halant,ZWJ") in Indic2 runs.
Therefore, the sequence:
`_Consonant_,Halant,ZWJ,_matra_`
would be transformed to:
`_Consonant_,Halant,ZWJ,Dotted-Circle,_matra_`
but the sequence:
`_Consonant_,Halant,ZWJ,_Consonant_,Halant,ZWJ,_matra_`
would _not_ be transformed with a dotted-circle insertion.
This exception is regarded as a likely bug.
================================================
FILE: opentype-shaping-arabic-general.md
================================================
# Arabic-style shaping in OpenType #
This document details the general shaping procedure shared by Arabic, N'Ko,
Syriac, and Mongolian.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Joining properties](#joining-properties)
- [Mark classification](#mark-classification)
- [Character tables](#character-tables)
- [The general Arabic-based shaping model](#the-general-arabic-based-shaping-model)
- [Stage 1: Transient reordering of modifier combining marks](#stage-1-transient-reordering-of-modifier-combining-marks)
- [Stage 2: Compound character composition and decomposition](#stage-2-compound-character-composition-and-decomposition)
- [Stage 3: Computing letter joining states](#stage-3-computing-letter-joining-states)
- [Stage 4: Applying the `stch` feature](#stage-4-applying-the-stch-feature)
- [Stage 5: Applying the language-form substitution features from GSUB](#stage-5-applying-the-language-form-substitution-features-from-gsub)
- [Stage 6: Applying the typographic-form substitution features from GSUB](#stage-6-applying-the-typographic-form-substitution-features-from-gsub)
- [Stage 7: Applying the positioning features from GPOS](#stage-7-applying-the-positioning-features-from-gpos)
## General information ##
Several scripts can be supported by the general OpenType shaping model
used for Arabic. These writing systems observe similar rules and
conventions, even if they are not historically related to
Arabic. Therefore, OpenType defines many of the same GSUB and GPOS
features as supported for the corresponding script tags. These scripts include:
- [Arabic](opentype-shaping-arabic.md)
- [N'Ko](opentype-shaping-nko.md)
- [Syriac](opentype-shaping-syriac.md)
- [Mongolian](opentype-shaping-mongolian.md)
The information found below is intended to serve as a general guide;
script-specific information can be found in the linked document for
each script.
Each of these writing systems uses a joining script that uses
inter-word spaces. Therefore, each codepoint in a text run may be
substituted with one of several contextual forms corresponding to
what, if any, characters appear before and after the codepoint. Most,
but not all, letter sequences join; shaping engines must track which
positions trigger joining behavior for each letter.
Arabic, N'Ko, and Syriac are written (and, therefore, rendered) from right to
left. Mongolian is written vertically, from top to bottom. Shaping
engines must track the directionality of the text run when scripts of
different direction are mixed.
## Terminology ##
OpenType shaping uses a standard set of terms for elements of the
supported scripts. The terms used colloquially in any particular language
may vary, however, potentially causing confusion.
**Base** glyph or character is the standard term for a
character that is capable of taking a diacritical mark.
**Kashida** (or **tatweel**) is the term for a glyph inserted into a
sequence for the purpose of elongating the baseline stroke of a
letter. Unicode documents use the term "tatweel" most frequently,
while OpenType documents use the term "kashida" most
frequently. Kashidas are typically inserted in order to justify lines
of text.
## Glyph classification ##
In joining (or cursive) scripts, proper shaping of
text runs involves identifying the joining behavior of each character,
then combining that information with any preceding or subsequent
characters to determine the contextually correct form for display.
### Joining properties ###
Characters are assigned a `JOINING_TYPE` property in the
Unicode standard that indicates how they join to adjacent
characters. There are six possible values:
- `JOINING_TYPE_LEFT` indicates that a character joins with
the subsequent character, but does not join with the preceding
character.
- `JOINING_TYPE_RIGHT` indicates that a character joins with the
preceding character, but does not join with the subsequent character.
- `JOINING_TYPE_DUAL` indicates that a character joins with the
preceding character and joins with the subsequent character.
- `JOINING_TYPE_NON_JOINING` indicates that a character does not
join with the preceding or with the subsequent character.
- `JOINING_TYPE_TRANSPARENT` indicates that the character does not
join with adjacent characters _and_ that the character must be
skipped over when the shaping engine is evaluating the joining
positions in a sequence of characters. When a
`JOINING_TYPE_TRANSPARENT` character is encountered in a sequence,
the `JOINING_TYPE` of the preceding character passes
through. Diacritical marks are frequently assigned this value.
- `JOINING_TYPE_JOIN_CAUSING` indicates that the character forces
the use of joining forms with the preceding and subsequent
characters. Kashidas and the Zero Width Joiner (`U+200D`) are both
`JOIN_CAUSING` characters.
In some scripts (such as Arabic and Syriac), letters are also assigned
to a `JOINING_GROUP` that indicates which fundamental character they
behave like with regard to joining behavior. Each of the basic letters
in the script typically belongs to its own `JOINING_GROUP`, while
supplemental and accented letters are usually assigned to the
`JOINING_GROUP` that corresponds to the underlying base letter, with
no diacritics or other marks.
For example, the Persian letter "Peh" (`U+067E`) is visually
represented as the Arabic letter "Beh" (`U+0628`), but with two additional
below-base "ijam" marks. Consequently, "Peh" is assigned to the `BEH`
`JOINING_GROUP`.
Mongolian and N'Ko, notably, do not make use of joining groups. Every
letter in these scripts belongs to the _null_ or `NO_JOINING_GROUP`
group.
### Mark classification ###
The Unicode standard defines a _canonical combining class_ for each
codepoint that is used whenever a sequence needs to be sorted into
canonical order.
The marks in most scripts belong to the standard combining
classes. For example:
:::{table} Example mark-classification table
| Codepoint | Combining class | Glyph |
|:----------|:----------------|:-----------------------------------|
|`U+064B` | 27 | ً Fathatan / Open fathatan |
|`U+064C` | 28 | ٌ Dammatan / Open dammatan |
|`U+064D` | 29 | ٍ Kasratan / Open Kasratan |
|`U+064E` | 30 | َ Fatha / Small fatha |
|`U+064F` | 31 | ُ Damma / Small damma |
|`U+0650` | 32 | ِ Kasra / Small kasra |
|`U+0651` | 33 | ّ Shadda |
|`U+0652` | 34 | ْ Sukun |
|`U+0670` | 35 | ٰ Superscript Alef |
| | 220 | Other below-base combining marks |
| | 230 | Other above-base combining marks |
:::
The numeric values of these combining classes are used during Unicode
normalization. Sequences of marks are sorted by combining class,
reordering the sequence into increasing numerical order.
In addition, some Arabic and Syriac marks require special handling
when shaping Arabic text, during the mark-reordering stage. These
marks fall into two classes of _Modifier Combining Marks_ (MCM) that
may need to be repositioned closer to the base character, when they
occur in sequences of multiple marks.
The sets are:
- Below-base (class 220) MCMs
- Above-base (class 230) MCMs
These classifications are used in the [mark-transient-reordering
stage](#stage-1-transient-reordering-of-modifier-combining-marks).
Lists of the marks that belong to each MCM classes are included in the
script-specific shaping documents for Arabic and Syriac.
### Character tables ###
Character tables for all of the scripts, plus important miscellaneous
characters, are available here:
- [Arabic](character-tables/character-tables-arabic.md#arabic-character-table)
- [Syriac](character-tables/character-tables-syriac.md#syriac-character-table)
- [N'Ko](character-tables/character-tables-nko.md#nko-character-table)
- [Mongolian](character-tables/character-tables-mongolian.md#mongolian-character-table)
## The general Arabic-based shaping model ##
Processing a run of text tagged with any of the scripts supported by
the general Arabic shaping model involves seven top-level stages:
1. Transient reordering of modifier combining marks
2. Compound character composition and decomposition
3. Computing letter joining states
4. Applying the `stch` feature
5. Applying the language-form substitution features from GSUB
6. Applying the typographic-form substitution features from GSUB
7. Applying the positioning features from GPOS
### Stage 1: Transient reordering of modifier combining marks ###
> Note: the transient reordering of modifier combining marks is
> necessary only for scripts that can feature the "Shadda" mark or
> marks that belong to _Modifier Combining Marks_ (MCM) classes.
Sequences of adjacent marks must be reordered so that they appear in
the appropriate visual order before the mark-to-base and mark-to-mark
positioning features from GPOS can be correctly applied.
In particular, those marks that have strong affinity to the base
character must be placed closest to the base.
This mark-reordering operation is distinct from the standard,
cross-script mark-reordering performed during Unicode
normalization. The standard Unicode mark-reordering algorithm is based
on comparing the _Canonical_Combining_Class_ (Ccc) properties of mark
codepoints, whereas this script-specific reordering utilizes the
_Modifier_Combining_Mark_ (MCM) subclasses specified in the
character tables.
The algorithm for reordering a sequence of marks is:
- First, move any "Shadda" (combining class `33`) characters to the
beginning of the mark sequence.
- Second, move any subsequence of combining-class-`230` characters that begins
with a `230_MCM` character to the beginning of the sequence,
before all "Shadda" characters. The subsequence must be moved
as a group.
- Finally, move any subsequence of combining-class-`220` characters that begins
with a `220_MCM` character to the beginning of the sequence,
before all "Shadda" characters and before all class-`230`
characters. The subsequence must be moved as a group.
> Note: Unicode describes this mark-reordering operation, the Arabic
> Mark Transient Reordering Algorithm (AMTRA), in Technical Report 53,
> which describes it in terms that are distinct from standard,
> Ccc-based mark reordering.
>
> Specifically, AMTRA is designated as an operation performed during
> text rendering only, which therefore does not impact other
> Unicode-compliance issues such as allowable input sequences or text
> encoding.
>
> However, shaping engines may choose to perform the reordering of
> modifier combining marks in conjunction with their Unicode
> normalization functionality for increased efficiency.
### Stage 2: Compound character composition and decomposition ###
The `ccmp` feature allows a font to substitute
- mark-and-base sequences with a pre-composed glyph including both
the mark and the base (as is done in with a ligature substitution)
- individual compound glyphs with the equivalent sequence of
decomposed glyphs (such as decomposing a letter with ijam into a
separate fundamental-letter glyph followed by an ijam-only glyph,
to permit more precise positioning)
If present, these composition and decomposition substitutions must be
performed before applying any other GSUB or GPOS lookups, because
those lookups may be written to match only the `ccmp`-substituted
glyphs.
### Stage 3: Computing letter joining states ###
In order to correctly apply the initial, medial, and final form
substitutions from GSUB during stage 6, the shaping engine must
tag every letter for possible application of the appropriate feature.
> Note: not all of the rules detailed below apply to every script that
> is supported by the general Arabic shaping model.
To determine which feature is appropriate, the shaping engine must
examine each word in turn and compute each letter's joining state from
the letter's `JOINING_TYPE` and the `JOINING_TYPE` of the
preceding character (if any).
> Note: Although the supported scripts use inter-word spaces, the
> `init` feature does _not_ refer to word-initial letters only and the
> `fina` feature does _not_ refer to word-final letters only.
>
> Rather, both of these terms are defined with respect to whether or
> not the preceding and subsequent letters form joins with the current
> letter. The letters at word boundaries will, naturally, take on
> initial and final forms, but initial and final forms of letters also
> occur regularly within words, when the letter in question is
> adjacent to a letter that does not form joins.
This computation starts from the first letter of the word, temporarily
tagging the letter for `isol` substitution. If the first
letter is the only letter in the word, the `isol` tag will remain unchanged.
From here, the algorithm consumes each character in the string, one at
a time, keeping track of the JOINING_TYPE of the previous character.
If the current character is JOINING_TYPE_TRANSPARENT, move on to the next
character but preserve the currently-tracked JOINING_TYPE at its previous state.
If the preceding character's JOINING_TYPE is LEFT, DUAL, or
JOIN_CAUSING:
- In `` text, if the current character is "Alaph", tag the
current character for `med2`, then update the tag for the
preceding character:
- `isol` becomes `init`
- `fina` becomes `medi`
- `init` remains `init`
- `medi` remains `medi`
- If the current character's JOINING_TYPE is RIGHT, DUAL, or
JOIN_CAUSING, tag the current character for `fina`, then update
the tag for the preceding character:
- `isol` becomes `init`
- `fina` becomes `medi`
- `init` remains `init`
- `medi` remains `medi`
Otherwise, tag the current character for `isol`.
After testing the final character of the word, if the text is in `` and
if the last character that is not JOINING_TYPE_TRANSPARENT or
JOINING_TYPE_NON_JOINING is "Alaph", perform an additional test:
- If the preceding character is JOINING_TYPE_LEFT, tag the current character
for `fina`
- If the preceding character's JOINING_GROUP is DALATH_RISH, tag the current
character for `fin3`
- Otherwise, tag the current character for `fin2`
Once the last character of the word has been processed, proceed to the
next word and repeat the algorithm, starting at the beginning of the
next word.
> Note: Because the processing of the characters in the algorithm
> described above is deterministic, shaping engines may choose to
> implement the joining-state computation as a state machine, in a lookup
> table, or by any other means desirable.
At the end of this process, all letters should be tagged for possible
substitution by one of the `isol`, `init`, `medi`, `med2`, `fina`, `fin2`, or
`fin3` features.
### Stage 4: Applying the `stch` feature ###
The `stch` feature decomposes and stretches special marks that are
meant to extend to the full width of words to which they are
attached. It was defined for use in `` text runs for the "Syriac
Abbreviation Mark" (`U+070F`) but it can be used with similar marks in
other scripts.
To apply the `stch` feature, the shaping engine should first decompose the
`U+070F` glyph into components, which results in a beginning point glyph,
midpoint glyph, and endpoint glyph plus one (or more) extension glyphs: at
least one extension between the beginning and midpoint glyphs and at
least one extension between the midpoint and endpoint glyphs.
The shaping engine must then calculate the total length of the word to
which the mark applies. That length, minus the advance widths of the
beginning, middle, and endpoint glyphs of the mark, must be divided by
two.
The result, divided by the advance width of the extension glyph
and rounded up to the next integer, tells the shaping engine how many
copies of the extension glyph must be placed between the midpoint and
each end of the mark.
Following this procedure ensures that the same number of extensions is
used on each side of the mark so that it remains symmetrical.
Finally, the decomposed mark must be reordered as follows:
- All of the glyphs in the sequence for the mark, _except_ for
the final glyph, are repositioned as a group so that they precede
the word to which the mark is attached.
- The final glyph in the mark sequence is repositioned to the end of
the word.
### Stage 5: Applying the language-form substitution features from GSUB ###
The language-substitution phase applies mandatory substitution
features using the rules in the font's GSUB table. In preparation for
this stage, glyph sequences should be tagged for possible application
of GSUB features.
The order in which these substitutions must be performed is fixed for
all scripts implemented with the Arabic shaping model:
locl
isol
fina
fin2 (only used in Syriac)
fin3 (only used in Syriac)
medi
med2 (only used in Syriac)
init
rlig
rclt
calt
> Note: `rlig` and `calt` need to be appled to the word as a whole before
> continuing to the next feature.
See the individual script pages for further detail on each feature and
for script-specific information.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
### Stage 6: Applying the typographic-form substitution features from GSUB ###
The typographic-substitution phase applies optional substitution
features using the rules in the font's GSUB table.
The order in which these substitution must be performed is fixed for
all scripts implemented in the Arabic shaping model:
liga
dlig
cswh
mset
See the individual script pages for further detail on each feature and
for script-specific information.
### Stage 7: Applying the positioning features from GPOS ###
The positioning stage adjusts the positions of mark and base
glyphs.
The order in which these features are applied is fixed for
all scripts implemented in the Arabic shaping model:
curs
kern
mark
mkmk
See the individual script pages for further detail on each feature and
for script-specific information.
================================================
FILE: opentype-shaping-arabic.md
================================================
# Arabic script shaping in OpenType #
This document details the general shaping procedure shared by all
Arabic script styles, and defines the common pieces that style-specific
implementations share.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Joining properties](#joining-properties)
- [Mark classification](#mark-classification)
- [Character tables](#character-tables)
- [The `` shaping model](#the-arab-shaping-model)
- [Stage 1: Transient reordering of modifier combining marks](#stage-1-transient-reordering-of-modifier-combining-marks)
- [Stage 2: Compound character composition and decomposition](#stage-2-compound-character-composition-and-decomposition)
- [Stage 3: Computing letter joining states](#stage-3-computing-letter-joining-states)
- [Stage 4: Applying the `stch` feature](#stage-4-applying-the-stch-feature)
- [Stage 5: Applying the language-form substitution features from GSUB](#stage-5-applying-the-language-form-substitution-features-from-gsub)
- [Stage 6: Applying the typographic-form substitution features from GSUB](#stage-6-applying-the-typographic-form-substitution-features-from-gsub)
- [Stage 7: Applying the positioning features from GPOS](#stage-7-applying-the-positioning-features-from-gpos)
## General information ##
The Arabic script is used to write multiple languages, most commonly
Arabic, Persian, Urdu, Pashto, Kurdish, and Azerbaijani.
The Arabic script encompasses multiple distinct styles, including Naskh,
Nataliq, and Kufi, that share a number of common features and rules,
but that differ considerably in their final appearance. Due to the
common features found between the styles, a shaping engine can support
all styles of Arabic with a single shaping model.
In addition, several other writing systems that observe similar rules
and conventions can be supported using the same shaping model, even if
they are not historically related to Arabic. These scripts include:
- [N'Ko](opentype-shaping-nko.md)
- [Syriac](opentype-shaping-syriac.md)
- [Mongolian](opentype-shaping-mongolian.md)
Note that each of these scripts has its own independent
script tag defined in OpenType. N'Ko uses ``, Syriac uses ``, and
Mongolian uses ``. The information found below about the ``
script shaping model can serve as a general guide; script-specific
information can be found in the linked document for each script.
Arabic is a joining script that uses inter-word spaces, so each
codepoint in a text run may be substituted with one of several
contextual forms corresponding to what, if any, characters appear
before and after the codepoint. Most, but not all, letter sequences
join; shaping engines must track which positions trigger joining
behavior for each letter.
Arabic is written (and, therefore, rendered) from right to
left. Shaping engines must track the directionality of the text run
when scripts of different direction are mixed.
## Terminology ##
OpenType shaping uses a standard set of terms for elements of the
Arabic script. The terms used colloquially in any particular language
may vary, however, potentially causing confusion.
**Base** glyph or character is the standard term for a Arabic
character that is capable of taking a diacritical mark.
Most of the base characters in Arabic are consonants, but each
language written with the Arabic script may have one or more vowel
base letters.
Vowels that are not base characters are frequently omitted from the
text run entirely. Alternatively, such a vowel may appear as a
diacritical mark called a **ḥarakah**.
**Ijam** is the standard term for an above- or below-base dot that
distinguishes one consonant from another. Ijam are not considered
diacritics; they are integral to the consonant of which they are a part.
**Shadda** and **tashdid** are both standard terms for the "consonant
doubling" diacritical mark.
**Hamza** is the standard term for the glottal stop
semi-consonant. The hamza is not regarded as a full letter in most
languages, although it can appear as a standalone letter within
words. In some sequences, the hamza attaches to an adjacent letter;
when a hamza-supporting letter is not adjacent, however, the hamza can
appear on its own.
**Kashida** (or **tatweel**) is the term for a glyph inserted into a
sequence for the purpose of elongating the baseline stroke of a
letter. Unicode documents use the term "tatweel" most frequently,
while OpenType documents use the term "kashida" most
frequently. Kashidas are typically inserted in order to justify lines
of text.
## Glyph classification ##
Because Arabic is a joining (or cursive) script, proper shaping of
text runs involves identifying the joining behavior of each character,
then combining that information with any preceding or subsequent
characters to determine the contextually correct form for display.
### Joining properties ###
Arabic characters are assigned a `JOINING_TYPE` property in the
Unicode standard that indicates how they join to adjacent
characters. There are six possible values:
- `JOINING_TYPE_LEFT` indicates that a character joins with
the subsequent character, but does not join with the preceding
character.
- `JOINING_TYPE_RIGHT` indicates that a character joins with the
preceding character, but does not join with the subsequent character.
- `JOINING_TYPE_DUAL` indicates that a character joins with the
preceding character and joins with the subsequent character.
- `JOINING_TYPE_NON_JOINING` indicates that a character does not
join with the preceding or with the subsequent character.
- `JOINING_TYPE_TRANSPARENT` indicates that the character does not
join with adjacent characters _and_ that the character must be
skipped over when the shaping engine is evaluating the joining
positions in a sequence of characters. When a
`JOINING_TYPE_TRANSPARENT` character is encountered in a sequence,
the `JOINING_TYPE` of the preceding character passes
through. Diacritical marks are frequently assigned this value.
- `JOINING_TYPE_JOIN_CAUSING` indicates that the character forces
the use of joining forms with the preceding and subsequent
characters. Kashidas and the Zero Width Joiner (`U+200D`) are both
`JOIN_CAUSING` characters.
Arabic letters are also assigned to a `JOINING_GROUP` that indicates
which fundamental character they behave like with regard to joining
behavior. Each of the basic letters in the Arabic block tends to
belong to its own `JOINING_GROUP`, while letters from the supplemental and
extended blocks are usually assigned to the `JOINING_GROUP` that
corresponds to the character's base letter, with no diacritics or ijam.
For example, the Persian letter "Peh" (`U+067E`) is visually
represented as the Arabic letter "Beh" (`U+0628`), but with two additional
below-base ijam. Consequently, "Peh" is assigned to the `BEH` `JOINING_GROUP`.
### Mark classification ###
The Unicode standard defines a _canonical combining class_ for each
codepoint that is used whenever a sequence needs to be sorted into
canonical order.
Several of the Arabic marks belong to standard combining
classes:
:::{table} Mark-classification table
| Codepoint | Combining class | Glyph |
|:----------|:----------------|:-----------------------------------|
|`U+064B` | 27 | ً Fathatan / Open fathatan |
|`U+064C` | 28 | ٌ Dammatan / Open dammatan |
|`U+064D` | 29 | ٍ Kasratan / Open Kasratan |
|`U+064E` | 30 | َ Fatha / Small fatha |
|`U+064F` | 31 | ُ Damma / Small damma |
|`U+0650` | 32 | ِ Kasra / Small kasra |
|`U+0651` | 33 | ّ Shadda |
|`U+0652` | 34 | ْ Sukun |
|`U+0670` | 35 | ٰ Superscript Alef |
| | 220 | Other below-base combining marks |
| | 230 | Other above-base combining marks |
:::
The numeric values of these combining classes are used during Unicode
normalization.
A subset of the Arabic marks require special handling when shaping
Arabic text, during the mark-reordering stage. These include two sets
of _Modifier Combining Marks_ (MCM) that may need to be repositioned
closer to the base character, when they occur in sequences of multiple
marks.
The sets are:
- Below-base (class 220) MCMs: "Hamza below" (`U+0655`), "Small low seen"
(`U+06E3`), "Large round dot below" (`U+08CF`), "Small low waw" (`U+08D3`)
- Above-base (class 230) MCMs: "Hamza above" (`U+0654`), "Mark noon ghunna"
(`U+0658`), "Small high seen" (`U+06DC`), "Small high yeh" (`U+06E7`), "Small high
noon" (`U+06E8`), "Small high Farsi yeh" (`U+08CA`), "Small high
yeh barree with two dots below" (`U+08CB`), "Small high zah"
(`U+08CD`), "Large round dot above" (`U+08CE`), "Small high waw" (`U+08F3`)
These classifications are used in the [mark-transient-reordering
stage](#stage-1-transient-reordering-of-modifier-combining-marks).
### Character tables ###
Separate character tables are provided for the Arabic, Arabic
Supplement, Arabic Extended-A, Abaric Extended-B, and Rumi Numeral
Symbols blocks, as well as for other miscellaneous characters that are
used in `` text runs:
- [Arabic character table](character-tables/character-tables-arabic.md#arabic-character-table)
- [Arabic Supplement character table](character-tables/character-tables-arabic.md#arabic-supplement-character-table)
- [Arabic Extended-A character table](character-tables/character-tables-arabic.md#arabic-extended-a-character-table)
- [Arabic Extended-B character table](character-tables/character-tables-arabic.md#arabic-extended-b-character-table)
- [Arabic Extended-C character table](character-tables/character-tables-arabic.md#arabic-extended-c-character-table)
- [Rumi Numeral Symbols character table](character-tables/character-tables-arabic.md#rumi-numeral-symbols-character-table)
- [Miscellaneous character table](character-tables/character-tables-arabic.md#miscellaneous-character-table)
Unicode also defines two blocks that implement backward compatibility
with retired file-encoding formats:
- Arabic Presentation Forms-A
- Arabic Presentation Forms-B
Unless a software application is required to support specific stores of
documents that are known to have used these older encodings, however, the
shaping engine should not be expected to handle any text runs
incorporating codepoints from these blocks.
The tables list each codepoint along with its Unicode general
category and its joining type. For letters, the table lists the
codepoint's joining group. For diacritical marks, the table lists the
codepoint's mark combining class. The codepoint's Unicode name and an example
glyph are also provided.
For example:
:::{table} Example character table
| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:--------------|:-----------|:-----------------------------|
|`U+0628` | Letter | DUAL | BEH | _null_ | ب Beh |
| | | | | |
|`U+0655` | Mark [Mn] | TRANSPARENT | _null_ | 220_MCM | ٕ Hamza Below |
:::
Codepoints with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
#### Special-function codepoints ####
Other important characters that may be encountered when shaping runs
of Arabic text include the dotted-circle placeholder (`U+25CC`), the
combining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)
and zero-width non-joiner (`U+200C`), the left-to-right text marker
(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break
space (`U+00A0`).
Each of these is of particular importance to shaping engines, because
these codepoints interact with the shaping engine, the text run, and
the active font, either to mediate non-default shaping behavior or to
relay information about the current shaping process.
The dotted-circle placeholder is frequently used when displaying a
combining mark in isolation. Real-world text documents may also use
other characters, such as hyphens or dashes, in a similar placeholder
fashion; shaping engines should cope with this situation gracefully.
Dotted-circle placeholder characters (like any Unicode codepoint) can
appear anywhere in text input sequences and should be rendered
normally. GPOS positioning lookups should attach mark glyphs to dotted
circles as they would to other non-mark characters. As visible glyphs,
dotted circles can also be involved in GSUB substitutions.
In addition to the default input-text handling process, shaping
engines may also insert dotted-circle placeholders into the text
sequence. Dotted-circle insertions are required when a non-spacing
mark or dependent sign is formed with no base character present.
This requirement covers:
- Dependent signs that are assigned their own individual Unicode
codepoints (such as most dependent-vowel marks or matras)
- Dependent signs that are formed only by specific sequences of
other codepoints (which is not common in Arabic but can occur in
other scripts)
The combining grapheme joiner (CGJ) is primarily used to alter the
order in which adjacent marks are positioned during the
mark-reordering stage, in order to adhere to the needs of a
non-default language orthography.
By default, OpenType shaping reorders sequences of adjacent marks by
sorting the sequence on the marks' Canonical_Combining_Class (Ccc)
values. The presence of a CGJ character within a sequence of marks has
the effect of splitting the sequence into two sequences of marks and,
therefore, halting any mark-reordering that would have occurred
between the marks on either side of the CGJ.
The zero-width joiner (ZWJ) is primarily used to force the usage of the
cursive connecting form of a letter even when the context of the
adjoining letters would not trigger the connecting form.
For example, to show the initial form of a letter in isolation (such
as for displaying it in a table of forms), the sequence "_Letter_,ZWJ"
would be used. To show the medial form of a letter in isolation, the
sequence "ZWJ,_Letter_,ZWJ" would be used.
The zero-width non-joiner (ZWNJ) is primarily used to prevent a
cursive connection between two adjacent characters that would, under
normal circumstances, form a join.
The ZWJ and ZWNJ characters are, by definition, non-printing control
characters and have the _Default_Ignorable_ property in the Unicode
Character Database. In standard text-display scenarios, their function
is to signal a request from the user to the shaping engine for some
particular non-default behavior. As such, they are not rendered
visually.
> Note: Naturally, there are special circumstances where a user or
> document might need to request that a ZWJ or ZWNJ be rendered
> visually, such as when illustrating the OpenType shaping process, or
> displaying Unicode tables.
Because the ZWJ and ZWNJ are non-printing control characters, they can
be ignored by any portion of a software text-handling stack not
involved in the shaping operations that the ZWJ and ZWNJ are designed
to interface with. For example, spell-checking or collation functions
will typically ignore ZWJ and ZWNJ.
Similarly, the ZWJ and ZWNJ should be ignored by the shaping engine
when matching sequences of codepoints against the backtrack and
lookahead sequences of a font's GSUB or GPOS lookups.
The right-to-left mark (RLM) and left-to-right mark (LRM) are used by
the Unicode bidirectionality algorithm (BiDi) to indicate the points
in a text run at which the writing direction changes. Generally
speaking RLM and LRM codepoints do not interact with shaping.
The no-break space is primarily used to display those codepoints that
are defined as non-spacing (such as vowel or diacritical marks and "Hamza") in an
isolated context, as an alternative to displaying them superimposed on
the dotted-circle placeholder.
## The `` shaping model ##
Processing a run of `` text involves seven top-level stages:
1. Transient reordering of modifier combining marks
2. Compound character composition and decomposition
3. Computing letter joining states
4. Applying the `stch` feature
5. Applying the language-form substitution features from GSUB
6. Applying the typographic-form substitution features from GSUB
7. Applying the positioning features from GPOS
### Stage 1: Transient reordering of modifier combining marks ###
Sequences of adjacent marks must be reordered so that they appear in
the appropriate visual order before the mark-to-base and mark-to-mark
positioning features from GPOS can be correctly applied.
In particular, those marks that have strong affinity to the base
character must be placed closest to the base.
This mark-reordering operation is distinct from the standard,
cross-script mark-reordering performed during Unicode
normalization. The standard Unicode mark-reordering algorithm is based
on comparing the _Canonical_Combining_Class_ (Ccc) properties of mark
codepoints, whereas this script-specific reordering utilizes the
_Modifier_Combining_Mark_ (MCM) subclasses specified in the
character tables.
The algorithm for reordering a sequence of marks is:
- First, move any "Shadda" (combining class `33`) characters to the
beginning of the mark sequence.
- Second, move any subsequence of combining-class-`230` characters that begins
with a `230_MCM` character to the beginning of the sequence,
before all "Shadda" characters. The subsequence must be moved
as a group.
- Finally, move any subsequence of combining-class-`220` characters that begins
with a `220_MCM` character to the beginning of the sequence,
before all "Shadda" characters and before all class-`230`
characters. The subsequence must be moved as a group.
> Note: Unicode describes this mark-reordering operation, the Arabic
> Mark Transient Reordering Algorithm (AMTRA), in Unicode
> Standard Annex 53, which describes it in terms that are distinct
> from standard, Ccc-based mark reordering.
>
> Specifically, AMTRA is designated as an operation performed during
> text rendering only, which therefore does not impact other
> Unicode-compliance issues such as allowable input sequences or text
> encoding.
>
> However, shaping engines may choose to perform the reordering of
> modifier combining marks in conjunction with their Unicode
> normalization functionality for increased efficiency.
### Stage 2: Compound character composition and decomposition ###
The `ccmp` feature allows a font to substitute
- mark-and-base sequences with a pre-composed glyph including both
the mark and the base (as is done in with a ligature substitution)
- individual compound glyphs with the equivalent sequence of
decomposed glyphs (such as decomposing a letter with ijam into a
separate fundamental-letter glyph followed by an ijam-only glyph,
to permit more precise positioning)
If present, these composition and decomposition substitutions must be
performed before applying any other GSUB or GPOS lookups, because
those lookups may be written to match only the `ccmp`-substituted
glyphs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-ccmp}
Composition and decomposition
:::
```{svg-color-toggle-button} arabic-ccmp
```
### Stage 3: Computing letter joining states ###
In order to correctly apply the initial, medial, and final form
substitutions from GSUB during stage 6, the shaping engine must
tag every letter for possible application of the appropriate feature.
> Note: The following algorithm includes rules for processing ``
> text in addition to `` text. Implementers concerned only with
> shaping `` text can omit the portions for ``-specific
> rules.
To determine which feature is appropriate, the shaping engine must
examine each word in turn and compute each letter's joining state from
the letter's `JOINING_TYPE` and the `JOINING_TYPE` of the
preceding character (if any).
> Note: Although Arabic uses inter-word spaces, the `init` feature
> does _not_ refer to word-initial letters only and the `fina` feature
> does _not_ refer to word-final letters only.
>
> Rather, both of these terms are defined with respect to whether or
> not the preceding and subsequent letters form joins with the current
> letter. The letters at word boundaries will, naturally, take on
> initial and final forms, but initial and final forms of letters also
> occur regularly within words, when the letter in question is
> adjacent to a letter than does not form joins.
This computation starts from the first letter of the word, temporarily
tagging the letter for `isol` substitution. If the first
letter is the only letter in the word, the `isol` tag will remain unchanged.
From here, the algorithm consumes each character in the string, one at
a time, keeping track of the JOINING_TYPE of the previous character.
If the current character is JOINING_TYPE_TRANSPARENT, move on to the next
character but preserve the currently-tracked JOINING_TYPE at its previous state.
If the preceding character's JOINING_TYPE is LEFT, DUAL, or
JOIN_CAUSING:
- In `` text, if the current character is "Alaph", tag the
current character for `med2`, then update the tag for the
preceding character:
- `isol` becomes `init`
- `fina` becomes `medi`
- `init` remains `init`
- `medi` remains `medi`
- If the current character's JOINING_TYPE is RIGHT, DUAL, or
JOIN_CAUSING, tag the current character for `fina`, then update
the tag for the preceding character:
- `isol` becomes `init`
- `fina` becomes `medi`
- `init` remains `init`
- `medi` remains `medi`
Otherwise, tag the current character for `isol`.
After testing the final character of the word, if the text is in `` and
if the last character that is not JOINING_TYPE_TRANSPARENT or
JOINING_TYPE_NON_JOINING is "Alaph", perform an additional test:
- If the preceding character is JOINING_TYPE_LEFT, tag the current character
for `fina`
- If the preceding character's JOINING_GROUP is DALATH_RISH, tag the current
character for `fin3`
- Otherwise, tag the current character for `fin2`
Once the last character of the word has been processed, proceed to the
next word and repeat the algorithm, starting at the beginning of the
next word.
> Note: Because the processing of the characters in the algorithm
> described above is deterministic, shaping engines may choose to
> implement the joining-state computation as a state machine, in a lookup
> table, or by any other means desirable.
At the end of this process, all letters should be tagged for possible
substitution by one of the `isol`, `init`, `medi`, `med2`, `fina`, `fin2`, or
`fin3` features.
### Stage 4: Applying the `stch` feature ###
The `stch` feature decomposes and stretches special marks that are
meant to extend to the full width of words to which they are
attached. It was defined for use in `` text runs for the "Syriac
Abbreviation Mark" (`U+070F`) but it can be used with similar marks in
other scripts.
To apply the `stch` feature, the shaping engine should first decompose the
`U+070F` glyph into components, which results in a beginning point,
midpoint, and endpoint glyphs plus one (or more) extension glyphs: at
least one extension between the beginning and midpoint glyphs and at
least one extension between the midpoint and endpoint glyphs.
The shaping engine must then calculate the total length of the word to
which the mark applies. That length, minus the advance widths of the
beginning, middle, and endpoint glyphs of the mark, must be divided by
two.
The result, divided by the advance width of the extension glyph
and rounded up to the next integer, tells the shaping engine how many
copies of the extension glyph must be placed between the midpoint and
each end of the mark.
Following this procedure ensures that the same number of extensions is
used on each side of the mark so that it remains symmetrical.
Finally, the decomposed mark must be reordered as follows:
- All of the glyphs in the sequence for the mark, _except_ for
the final glyph, are repositioned as a group so that they precede
the word to which the mark is attached.
- The final glyph in the mark sequence is repositioned to the end of
the word.
### Stage 5: Applying the language-form substitution features from GSUB ###
The language-substitution phase applies mandatory substitution
features using the rules in the font's GSUB table. In preparation for
this stage, glyph sequences should be tagged for possible application
of GSUB features.
The order in which these substitutions must be performed is fixed for
all scripts implemented in the Arabic shaping model:
locl
isol
fina
fin2 (not used in )
fin3 (not used in )
medi
med2 (not used in )
init
rlig
rclt
calt
> Note: `rlig` and `calt` need to be appled to the word as a whole before
> continuing to the next feature.
#### Stage 5, step 1: locl ####
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-locl}
Localized form substitution
:::
```{svg-color-toggle-button} arabic-locl
```
#### Stage 5, step 2: isol ####
The `isol` feature substitutes the default glyph for a codepoint with
the isolated form of the letter.
> Note: It is common for a font to use the isolated form of a letter
> as the default, in which case the `isol` feature would apply no
> substitutions. However, this is only a convention, and the active
> font may use other forms as the default glyphs for any or all
> codepoints.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-isol}
Isolated form substitution
:::
```{svg-color-toggle-button} arabic-isol
```
#### Stage 5, step 3: fina ####
The `fina` feature substitutes the default glyph for a codepoint with
the terminal (or final) form of the letter.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-fina}
Final form substitution
:::
```{svg-color-toggle-button} arabic-fina
```
#### Stage 5, step 4: fin2 ####
This feature is not used in `` text.
#### Stage 5, step 5: fin3 ####
This feature is not used in `` text.
#### Stage 5, step 6: medi ####
The `medi` feature substitutes the default glyph for a codepoint with
the medial form of the letter.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-medi}
Medial form substitution
:::
```{svg-color-toggle-button} arabic-medi
```
#### Stage 5, step 7: med2 ####
This feature is not used in `` text.
#### Stage 5, step 8: init ####
The `init` feature substitutes the default glyph for a codepoint with
the initial form of the letter.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-init}
Initial form substitution
:::
```{svg-color-toggle-button} arabic-init
```
#### Stage 5, step 9: rlig ####
The `rlig` feature substitutes glyph sequences with mandatory
ligatures. Substitutions made by `rlig` cannot be disabled by
application-level user interfaces.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-rlig}
Required ligature substitution
:::
```{svg-color-toggle-button} arabic-rlig
```
#### Stage 5, step 10: rclt ####
The `rclt` feature substitutes glyphs with contextual alternate
forms. In general, this involves replacing the default form of a
connecting glyph with an alternate that provides a preferable
connection to an adjacent glyph.
The `rclt` feature should be used to perform such substitutions that
are required by the orthography of the active script and
language. Substitutions made by `rclt` cannot be disabled by
application-level user interfaces.
#### Stage 5, step 11: calt ####
The `calt` feature substitutes glyphs with contextual alternate
forms. In general, this involves replacing the default form of a
connecting glyph with an alternate that provides a preferable
connection to an adjacent glyph.
The `calt` feature, in contrast to `rclt` above, performs
substitutions that are not mandatory for orthographic
correctness. However, unlike `rclt`, the substitutions made by `calt`
can be disabled by application-level user interfaces.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-calt}
Contextual alternate substitution
:::
```{svg-color-toggle-button} arabic-calt
```
### Stage 6: Applying the typographic-form substitution features from GSUB ###
The typographic-substitution phase applies optional substitution
features using the rules in the font's GSUB table.
The order in which these substitutions must be performed is fixed for
all scripts implemented in the Arabic shaping model:
liga
dlig
cswh
mset
#### Stage 6, step 1: liga ####
The `liga` feature substitutes standard, optional ligatures that are on
by default. Substitutions made by `liga` may be disabled by
application-level user interfaces.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-liga}
Standard ligature substitution
:::
```{svg-color-toggle-button} arabic-liga
```
#### Stage 6, step 2: dlig ####
The `dlig` feature substitutes additional optional ligatures that are
off by default. Substitutions made by `dlig` may be disabled by
application-level user interfaces.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-dlig}
Discretionary ligature substitution
:::
```{svg-color-toggle-button} arabic-dlig
```
#### Stage 6, step 3: cswh ####
The `cswh` feature substitutes contextual swash variants of
glyphs. For example, the active font might substitute a longer variant
of "Noon" when a certain number of subsequent glyphs do not descend
below the baseline.
#### Stage 6, step 4: mset ####
The `mset` feature performs mark positioning by substituting sequences
of bases and marks with precomposed base-and-mark glyphs.
> Note: Positioning marks with the `mark` and `mkmk` features of GPOS is
> preferred, because `mset` can interfere with the OpenType shaping
> process. For example, substitution rules contained in `mset` may not be able to
> account for necessary mark-reordering adjustments conducted in the
> next stage.
>
> Nevertheless, when the active font uses `mset` substitutions, the
> shaping engine must deal with the situation gracefully.
### Stage 7: Applying the positioning features from GPOS ###
The positioning stage adjusts the positions of mark and base
glyphs.
The order in which these features are applied is fixed for
all scripts implemented in the Arabic shaping model:
curs
dist
kern
mark
mkmk
#### Stage 7, step 1: curs ####
The `curs` feature perform cursive positioning. Each glyph has an
entry point and exit point; the `curs` feature positions glyphs so
that the entry point of the current glyph meets the exit point of the
preceding glyph.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-curs}
Cursive positioning
:::
```{svg-color-toggle-button} arabic-curs
```
#### Stage 7, step 2: dist ####
The `dist` feature adjusts glyph spacing between glyphs. Unlike `kern`,
adjustments made with `dist` do not require the application or the user
to enable any software kerning features, if such features are
optional.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-dist}
Distance adjustment
:::
```{svg-color-toggle-button} arabic-dist
```
#### Stage 7, step 3: kern ####
The `kern` feature adjusts glyph spacing between pairs of adjacent glyphs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-kern}
Kerning adjustment
:::
```{svg-color-toggle-button} arabic-kern
```
#### Stage 7, step 4: mark ####
The `mark` feature positions marks with respect to base glyphs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-mark}
Mark positioning
:::
```{svg-color-toggle-button} arabic-mark
```
#### Stage 7, step 5: mkmk ####
The `mkmk` feature positions marks with respect to preceding marks,
providing proper positioning for sequences of marks that attach to the
same base glyph.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #arabic-mkmk}
Mark-to-mark positioning
:::
```{svg-color-toggle-button} arabic-mkmk
```
================================================
FILE: opentype-shaping-bengali.md
================================================
```{include} /_global.md
```
# Bengali shaping in OpenType #
This document details the shaping procedure needed to display text
runs in the Bengali script.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Shaping classes and subclasses](#shaping-classes-and-subclasses)
- [Bengali character tables](#bengali-character-tables)
- [The `` shaping model](#the-bng2-shaping-model)
- [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)
- [Stage 2: Initial reordering](#stage-2-initial-reordering)
- [Stage 3: Applying the basic substitution features from GSUB](#stage-3-applying-the-basic-substitution-features-from-gsub)
- [Stage 4: Final reordering](#stage-4-final-reordering)
- [Stage 5: Applying all remaining substitution features from GSUB](#stage-5-applying-all-remaining-substitution-features-from-gsub)
- [Stage 6: Applying remaining positioning features from GPOS](#stage-6-applying-remaining-positioning-features-from-gpos)
- [The `` shaping model](#the-beng-shaping-model)
- [Distinctions from ``](#distinctions-from-bng2)
- [Advice for handling fonts with `` features only](#advice-for-handling-fonts-with-beng-features-only)
- [Advice for handling text runs composed in `` format](#advice-for-handling-text-runs-composed-in-beng-format)
## General information ##
The Bengali or Bangla script belongs to the Indic family, and follows
the same general patterns as the other Indic scripts. More
specifically, it belongs to the North Indic subgroup, in which
sequences of adjacent consonants are often represented as conjuncts.
The Bengali script is used to write multiple languages, most commonly
Bengali, Assamese, and Manipuri. In addition, Sanskrit may be written
in Bengali, so Bengali script runs may include glyphs from the Vedic
Extensions block of Unicode.
There are two extant Bengali script tags defined in OpenType, ``
and ``. The older script tag, ``, was deprecated in 2005.
Therefore, new fonts should be engineered to work with the ``
shaping model. However, if a font is encountered that supports only
``, the shaping engine should deal with it gracefully.
## Terminology ##
OpenType shaping uses a standard set of terms for Indic scripts. The
terms used colloquially in any particular language may vary, however,
potentially causing confusion.
**Matra** is the standard term for a dependent vowel sign. In the Bengali
language, dependent-vowel signs may also be referred to as _kar_ forms — e.g., "i-kar" or
"u-kar".
The term "matra" is also used to refer to the headline above most
Bengali letters. To avoid ambiguity, the term **headline** is
used in most Unicode and OpenType shaping documents.
**Halant** and **Virama** are both standard terms for the below-base "vowel-killer"
mark. Unicode documents use the term "virama" most frequently, while
OpenType documents use the term "halant" most frequently. In the Bengali
language, this sign is known as the _hasanta_.
**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark
indicating that the preceding vowel should be nasalized. In the Bengali
language, this mark is known as the _candrabindu_.
The term **base consonant** is also critical to Indic shaping. The
base consonant of a syllable is the consonant that carries the
syllable's vowel sound, either the inherent vowel (for an unmarked
base consonant) or a dependent vowel (with the addition of a matra).
A syllable's base consonant is generally rendered in its full form
(although it may form ligatures), while other consonants in the
syllable frequently take on secondary forms. Different GSUB
substitutions may apply to a script's **pre-base** and **post-base**
consonants. Some of these substitutions create **above-base** or
**below-base** forms. The **Reph** form of the consonant "Ra" is an
example.
Syllables may also begin with an **independent vowel** instead of a
consonant. In these syllables, the independent vowel is rendered in
full-letter form, not as a matra, and the independent vowel serves as the
syllable base, similar to a base consonant.
Where possible, using the standard terminology is preferred, as the
use of a language-specific term necessitates choosing one language
over all of the others that share a common script.
## Glyph classification ##
Shaping Bengali text depends on the shaping engine correctly
classifying each glyph in the run. As with most other scripts, the
classifications must distinguish between consonants, vowels
(independent and dependent), numerals, punctuation, and various types
of diacritical mark.
For most codepoints, the `General Category` property defined in the Unicode
standard is correct, but it is not sufficient to fully capture the
expected shaping behavior (such as glyph reordering). Therefore,
Bengali glyphs must additionally be classified by how they are treated
when shaping a run of text.
### Shaping classes and subclasses ###
The shaping classes listed in the tables that follow are defined so
that they capture the positioning rules used by Indic scripts.
For most codepoints, the _Shaping class_ is synonymous with the `Indic
Syllabic Category` defined in Unicode. However, there are some
distinctions, where the defined category does not fully capture the
behavior of the character in the shaping process.
Several of the diacritic and syllable-modifying marks behave according
to their own rules and, thus, have a special class. These include
`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some
less-common marks behave according to rules that are similar to these
common marks, and are therefore classified with the corresponding
common mark. The Vedic Extensions also include a `CANTILLATION`
class for tone marks.
Letters generally fall into the classes `CONSONANT`,
`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the
shaping engine parse and identify key positions in a syllable. For
example, Unicode categorizes dependent vowels as `Mark [Mn]`, but the
shaping engine must be able to distinguish between dependent vowels
and diacritical marks (which are categorized as `Mark [Mn]`).
Bengali uses one subclass of consonant, `CONSONANT_DEAD`. This
subclass is used only for the Bengali "Khanda Ta" (`U+09CE`). It indicates that
"Khanda Ta" should match tests for consonants, such as when [identifying
syllables](#stage-1-identifying-syllables-and-other-sequences), but that, unlike
standard consonants, it carries no inherent vowel. The lack of an
inherent vowel is important during the [initial
reordering](#stage-2-initial-reordering) stage.
Other characters, such as symbols and miscellaneous letters (for
example, letter-like symbols that only occur as standalone entities
and do not occur within syllables), need no special attention from the
shaping engine, so they are not assigned a shaping class.
Numbers are classified as `NUMBER`, even though they evoke no special
behavior from the Indic shaping rules, because there are OpenType features that
might affect how the respective glyphs are drawn, such as `tnum`,
which specifies the usage of tabular-width numerals, and `sups`, which
replaces the default glyphs with superscript variants.
Marks and dependent vowels are further labeled with a mark-placement
subclass, which indicates where the glyph will be placed with respect
to the syllable base to which it is attached. The actual position of
the glyphs is determined by the lookups found in the font's GPOS
table, however, the shaping rules for Indic scripts require that the
shaping engine be able to identify marks by their general
position.
For example, left-side dependent vowels (matras), classified
with `LEFT_POSITION`, must frequently be reordered, with the final
position determined by whether or not other letters in the syllable
have formed ligatures or combined into conjunct forms. Therefore, the
`LEFT_POSITION` subclass of the character must be tracked throughout
the shaping process.
There are four basic _mark-placement subclasses_ for dependent vowels
(matras). Each corresponds to the visual position of the matra with
respect to the base consonant or syllable base to which it is attached:
- `LEFT_POSITION` matras are positioned to the left of the syllable base.
- `RIGHT_POSITION` matras are positioned to the right of the syllable base.
- `TOP_POSITION` matras are positioned above the syllable base.
- `BOTTOM_POSITION` matras are positioned below syllable base.
These positions may also be referred to elsewhere in shaping documents as:
- _Pre-base_ matras
- _Post-base_ matras
- _Above-base_ matras
- _Below-base_ matras
respectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations
corresponds to Unicode's preferred terminology. The _Pre_, _Post_,
_Above_, and _Below_ terminology is used in the official descriptions
of OpenType GSUB and GPOS features. Shaping engines may, internally,
use whichever terminology is preferred.
In addition, dependent-vowel codepoints that are composed of multiple
components will be designated in character tables as having a compound
_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`.
However, these multi-part matras are decomposed into separate matra
components during the shaping process. After the decomposition, each
matra component will belong to exactly one of the four basic
_mark-placement subclasses_.
For most mark and dependent-vowel codepoints, the _mark-placement
subclass_ is synonymous with the `Indic Positional Category` defined
in Unicode. However, there are some distinctions, where the defined
category does not fully capture the behavior of the character in the
shaping process.
### Bengali character tables ###
Separate character tables are provided for the Bengali and Vedic
Extensions blocks as well as for other miscellaneous characters that
are used in `` text runs:
- [Bengali character table](character-tables/character-tables-bengali.md#bengali-character-table)
- [Vedic Extensions character table](character-tables/character-tables-bengali.md#vedic-extensions-character-table)
- [Miscellaneous character table](character-tables/character-tables-bengali.md#miscellaneous-character-table)
The tables list each codepoint along with its Unicode general
category, its shaping class, and its mark-placement subclass. The
codepoint's Unicode name and an example glyph are also provided.
For example:
:::{table} Example character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0981` | Mark [Mn] | BINDU | TOP_POSITION | ঁ Candrabindu |
| | | | |
|`U+0995` | Letter | CONSONANT | _null_ | ক Ka |
:::
Codepoints with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the tables use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
#### Special-function codepoints ####
Other important characters that may be encountered when shaping runs
of Bengali text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
Each of these is of particular importance to shaping engines, because
these codepoints interact with the shaping engine, the text run, and
the active font, either to mediate non-default shaping behavior or to
relay information about the current shaping process.
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
Dotted-circle placeholder characters (like any Unicode codepoint) can
appear anywhere in text input sequences and should be rendered
normally. GPOS positioning lookups should attach mark glyphs to dotted
circles as they would to other non-mark characters. As visible glyphs,
dotted circles can also be involved in GSUB substitutions.
In addition to the default input-text handling process, shaping
engines may also insert dotted-circle placeholders into the text
sequence. Dotted-circle insertions are required when a non-spacing
mark or dependent sign is formed with no base character present.
This requirement covers:
- Dependent signs that are assigned their own individual Unicode
codepoints (such as most dependent-vowel marks or matras)
- Dependent signs that are formed only by specific sequences of
other codepoints (such as "Reph")
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence.
- The sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the
formation of a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead.
- The sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce
the first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph".
- An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
even where an initial "Ra,Halant" sequence without the zero-width
joiner would otherwise produce a "Reph".
The ZWJ and ZWNJ characters are, by definition, non-printing control
characters and have the _Default_Ignorable_ property in the Unicode
Character Database. In standard text-display scenarios, their function
is to signal a request from the user to the shaping engine for some
particular non-default behavior. As such, they are not rendered
visually.
> Note: Naturally, there are special circumstances where a user or
> document might need to request that a ZWJ or ZWNJ be rendered
> visually, such as when illustrating the OpenType shaping process, or
> displaying Unicode tables.
Because the ZWJ and ZWNJ are non-printing control characters, they can
be ignored by any portion of a software text-handling stack not
involved in the shaping operations that the ZWJ and ZWNJ are designed
to interface with. For example, spell-checking or collation functions
will typically ignore ZWJ and ZWNJ.
Similarly, the ZWJ and ZWNJ should be ignored by the shaping engine
when matching sequences of codepoints against the backtrack and
lookahead sequences of a font's GSUB or GPOS lookups.
For example:
- A lookup that substitutes an alternate version of a
dependent-vowel (matra) glyph when it is preceded by "Ka,Halant,Tta"
should still be applied if the dependent-vowel codepoint is preceded
by "Ka,Halant,ZWJ,Tta" in the text run.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
In addition to general punctuation, runs of Bengali text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block.
## The `` shaping model ##
Processing a run of `` text involves six top-level stages:
1. Identifying syllables and other sequences
2. Initial reordering
3. Applying the basic substitution features from GSUB
4. Final reordering
5. Applying all remaining substitution features from GSUB
6. Applying all remaining positioning features from GPOS
As with other Indic scripts, the initial reordering stage and the
final reordering stage each involve applying a set of several
script-specific rules. The basic substitution features must be applied
to the run in a specific order. The remaining substitution features in
stage five, however, do not have a mandatory order.
Indic scripts follow many of the same shaping patterns, but they
differ in a few critical characteristics that the shaping engine must
track. These include:
- The position of the base consonant in a syllable.
- The final position of "Reph".
- Whether "Reph" must be requested explicitly or if it is formed by
a specific, implicit sequence.
- Whether the below-base forms feature is applied only to consonants
before the syllable base, only to consonants after the base
consonant, or to both.
- The ordering positions for dependent vowels
(matras). Specifically, right-side, above-base, and below-base
matras follow different rules in different scripts.
All Indic scripts position left-side matras in the same
manner, in the ordering position `POS_PREBASE_MATRA`.
With regard to these common variations, Bengali's specific shaping
characteristics include:
- `BASE_POS_LAST` = The base consonant of a syllable is the last
consonant, not counting any special final-consonant forms.
- `REPH_POS_AFTER_SUBJOINED` = "Reph" is ordered after all subjoined (i.e.,
below-base) consonant forms.
- `REPH_MODE_IMPLICIT` = "Reph" is formed by an initial "Ra,Halant" sequence.
- `BLWF_MODE_PRE_AND_POST` = The below-forms feature is applied both to
pre-base consonants and to post-base consonants.
- `MATRA_POS_TOP` = _null_ = Unlike most other Indic scripts, Bengali
does not use any above-base matras. Therefore, this shaping
characteristic does not apply.
- `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are
ordered after all post-base consonant forms.
- `MATRA_POS_BOTTOM` = `POS_AFTER_SUBJOINED` = Below-base matras are
ordered after all subjoined (i.e., below-base) consonant forms.
These characteristics determine how the shaping engine must reorder
certain glyphs, how base consonants are determined, and how "Reph"
should be encoded within a run of text.
> Note: Unlike most other Indic scripts, Bengali does not use
> above-base matras. Therefore `MATRA_POS_TOP` can be set to _null_.
### Stage 1: Identifying syllables and other sequences ###
A syllable in Bengali consists of a valid orthographic sequence
that may be followed by a "tail" of modifier signs.
> Note: The Bengali Unicode block enumerates five modifier signs,
> "Candrabindu" (`U+0981`), "Anusvara" (`U+0982`), "Visarga"
> (`U+0983`), "Avagraha" (`U+09BD`), and "Vedic Anusvara"
> (`U+09FC`). In addition, Sanskrit text written in Bengali may
> include additional signs from Vedic Extensions block.
Each syllable contains exactly one vowel sound. Valid syllables may
begin with either a consonant or an independent vowel.
If the syllable begins with a consonant, then the consonant that
provides the vowel sound is referred to as the "base" consonant. If
the syllable begins with an independent vowel, that vowel is the
syllable's only vowel sound and serves as the "base".
> Note: A consonant that is not accompanied by a dependent vowel (matra) sign
> carries the script's inherent vowel sound. This vowel sound is changed
> by a dependent vowel (matra) sign following the consonant.
From the shaping engine's perspective, the main distinction between a
syllable with a base consonant and a syllable with an
independent-vowel base is that a syllable with an independent-vowel
base is less likely to include additional consonants in special forms
and less likely to include dependent vowel signs
(matras). Therefore, in the common case, vowel-based syllables may
involve less reordering, substitution feature applications, and other
processing than consonant-based syllables.
In some languages and orthographies, vowel-based syllables are
not permitted to include additional consonants or matras, and certain
GSUB substitution features do not occur. However, there are often
known exceptions, and real-world text makes no such guarantees.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
Generally speaking, the base consonant is the final consonant of the
syllable and its vowel sound designates the end of the syllable. This
rule is synonymous with the `BASE_POS_LAST` characteristic mentioned
earlier.
Non-base consonants in a valid syllable will be separated by "Halant"
marks. Pre-base consonants will be followed by "Halant", while
post-base consonants will be preceded by "Halant".
Pre-baseC Halant BaseC Halant Post-baseC
The algorithm for correctly identifying the base consonant includes a
test to recognize these sequences and not mis-identify the base
consonant.
All consonants in Bengali can potentially occur in pre-base
position. The "Halant" marks on pre-base consonants indicate that they
carry no vowel. Instead, they affect syllable pronunciation by
combining with the base consonant (e.g., "_thr_" or "_spl_").
Three consonants in Bengali are allowed to occur after the base
consonant or syllable base: "Ya", "Ba", and "Ra". When these consonants occur after the
base consonant or syllable base, they take on special forms.
A "Ya" after the base consonant or syllable base takes on the "Yaphala" form.
> Note: some fonts may also implement the "Yaphala" form for a
> post-base "Yya" (`U+09DF`).
A "Ba" after the base consonant or syllable bases takes on the below-base "Baphala"
form. A "Ba" before the base consonant or syllable base will take on the below-base
"Baphala" form unless it is the first pre-base consonant in the syllable.
As with other Indic scripts, the consonant "Ra" receives special
treatment; in many circumstances it is replaced by one of two combining
mark-like forms.
- A "Ra,Halant" sequence at the beginning of a syllable is replaced
with an above-base mark called "Reph" (unless the "Ra" is the only
consonant in the syllable). This rule is synonymous with the
`REPH_MODE_IMPLICIT` characteristic mentioned earlier.
- A non-initial "Ra" before the base consonant or syllable base or a "Ra" after the
base consonant or syllable base takes on the below-base form "Raphala.""Reph" characters must be reordered after the
syllable-identification stage is complete.
> Note: `` text contains two Unicode codepoints for "Ra."
> `U+09B0` and `U+09F0`.
>
> `U+09B0` is used in Bengali-language, Manipuri-language, and
> Sanskrit text. `U+09F0` is used in Assamese-language text.
>
> Note: Generally speaking, OpenType fonts will implement support for
> any below-base, post-base, and pre-base-reordering consonant forms
> by including the necessary substitution rules in their `blwf`,
> `pstf`, and `pref` lookups in GSUB.
>
> Consequently, whenever shaping engines need to determine whether or
> not a given consonant can take on such a special form, the most
> appropriate test is to check if the consonant is included in the
> relevant GSUB lookup. Other implementations are possible, such as
> maintaining static tables of consonants, but checking for GSUB
> support ensures that the expected behavior is implemented in the
> active font, and is therefore the most reliable approach.
In addition to valid syllables, standalone sequences may occur, such
as when an isolated codepoint is shown in example text.
> Note: Foreign loanwords, when written in the Bengali script, may
> not adhere to the syllable-formation rules described above. In
> particular, it is not uncommon to encounter foreign loanwords that
> contain a word-final suffix of consonants.
>
> Nevertheless, such word-final suffixes will be correctly matched by
> the regular expressions listed below. These loanwords are pronounced
> different, which raises issues for potential readers, but the
> character sequences do not affect the shaping process.
Syllables should be identified by examining the run and matching
glyphs, based on their categorization, using regular expressions.
The following general-purpose Indic-shaping regular expressions can be
used to match Bengali syllables.
The regular expressions utilize the shaping classes from the tables
above. For the purpose of syllable identification, more general
classes can be used, as defined in the following table. This
simplifies the resulting expressions.
```markdown
_ra_ = The consonant "Ra"
_consonant_ = ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_
_vowel_ = `VOWEL_INDEPENDENT`
_nukta_ = `NUKTA`
_halant_ = `VIRAMA`
_zwj_ = `JOINER`
_zwnj_ = `NON_JOINER`
_matra_ = `VOWEL_DEPENDENT` | `PURE_KILLER`
_syllablemodifier_ = `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`
_vedicsign_ = `CANTILLATION`
_placeholder_ = `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`
_dottedcircle_ = `DOTTED_CIRCLE`
_repha_ = `CONSONANT_PRE_REPHA`
_consonantmedial_ = `CONSONANT_MEDIAL`
_symbol_ = `SYMBOL` | `AVAGRAHA`
_consonantwithstacker_ = `CONSONANT_WITH_STACKER`
_other_ = `OTHER` | `MODIFYING_LETTER`
```
> Note: the _ra_ identification class is mutually exclusive with
> the _consonant_ class. The union of the _consonant_ and _ra_ classes
> is used in the regular expression elements below in order to
> correctly identify "Ra" characters that do not trigger "Reph" or
> "Rakaar" shaping behavior.
>
> Note, also, that the cantillation mark "combining Ra" in the
> Devanagari Extended block does _not_ belong to the _ra_
> identification class, and that the other "combining consonant"
> cantillation marks in the Devanagari Extended block do not belong to
> the _consonant_ identification class.
> Note: The _placeholder_ identification class includes codepoints
> that are often used in place of vowels or consonants when a document
> needs to display a matra, mark, or special form in isolation or
> in another context beyond a standard syllable. Examples of
> _placeholder_ codepoints include hyphens and non-breaking
> spaces. Sequences that utilize this approach should be identified as
> "standalone" syllables.
>
> The _placeholder_ identification class also includes numerals, which
> are commonly used as word substitutes within normal text. Examples
> include ordinals (e.g., "4th").
> Note: The _other_ identification class includes codepoints that
> do not interact with adjacent characters for shaping purposes. Even
> though some of these codepoints (such as `MODIFYING_LETTER`) can
> occur within words, they evoke no behavior from the shaping
> engine and do not factor into the regular expressions that
> follow. Therefore, the shaping engine may choose to ignore them
> during syllable identification; they are listed here for completeness.
These identification classes form the bases of the following regular
expression elements:
```markdown
C = _consonant_ | _ra_
Z = _zwj_ | _zwnj_
REPH = (_ra_ _halant_) | _repha_
CN = C _zwj_? _nukta_?
FORCED_RAKAR = _zwj_ _halant_ _zwj_ _ra_
S = _symbol_ _nukta_?
MATRA_GROUP = Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?
SYLLABLE_TAIL = (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}
HALANT_GROUP = Z? _halant_ (_zwj_ _nukta_?)?
FINAL_HALANT_GROUP = HALANT_GROUP | (_halant_ _zwnj_)
MEDIAL_GROUP = _consonantmedial_?
HALANT_OR_MATRA_GROUP = FINAL_HALANT_GROUP | MATRA_GROUP*)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(MATRA_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(MATRA_GROUP){0,4}` .
Using the above elements, the following regular expressions define the
possible syllable types:
A consonant-based syllable will match the expression:
```markdown
(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(CN HALANT_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(CN HALANT_GROUP){0,4}` .
A vowel-based syllable will match the expression:
```markdown
REPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
A standalone syllable will match the expression:
```markdown
((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
> Note: Although they are labeled as "standalone syllables" here,
> many sequences that match the standalone regular expression above
> are instances where a document needs to display a matra, combining
> mark, or special form in isolation. Such sequences might not have
> any significance with regard to the definition of syllables used in
> the language or orthography of the text.
A symbol-based syllable will match the expression:
```markdown
S SYLLABLE_TAIL
```
A broken syllable will match the expression:
```markdown
REPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
The primary problem involved in shaping broken syllables is the lack
of a syllable base (either a base consonant or an independent
vowel). Without a syllable base, the shaping engine cannot perform
GPOS positioning and other contextual operations that are required
later in the shaping process.
To make up for this limitation, shaping engines should insert a
dotted-circle placeholder (`U+25CC`) character into the text stream
where the missing syllable base was expected to occur. This
placeholder allows the shaping process to proceed on a best-effort
basis at handling the broken-syllable sequence, but making guarantees
about the orthographic correctness or preferred appearance of the
final result is out of scope for this document.
Shaping engines can perform this dotted-circle insertion at any point
after the broken syllable has been recognized and before GSUB features
are applied. However, the best results will likely be attained by
performing the insertion immediately, before proceeding to
stage 2. This will enable the maximum number of GSUB and GPOS features
in the active font to be correctly applied to the text run by ensuring
that all reordering, tagging, and sorting algorithms are executed as
usual.
> Note: In software stacks where other text-handling operations, such
> as Unicode normalization and localization, are performed before the
> text run is passed to the shaping engine, there is a potential for
> the dotted-circle insertion to cause unexpected effects.
>
> For example, if a `ccmp` or `locl` feature substitutes the default
> dotted-circle placeholder glyph with a variant glyph of a different
> size or weight for the (`U+25CC`) codepoint, then any shaping engine
> which relies on another software component to handle that
> functionality must take additional care to ensure consistency.
The expressions above use state-machine syntax from the Ragel
state-machine compiler. The operators represent:
```markdown
a* = zero or more copies of a
b+ = one or more copies of b
c? = optional instance of c
d{n} = exactly n copies of d
d{,n} = zero to n copies of d
d{n,} = n or more copies of d
d{n,m} = n to m copies of d
!e = not e
^f = character-level not f
g.h = concatenation of g and h
i|j = i or j
( ) = grouping of expression elements
```
After the syllables have been identified, each of the subsequent
shaping stages occurs on a per-syllable basis.
### Stage 2: Initial reordering ###
The initial reordering stage is used to relocate glyphs from the
phonetic order in which they occur in a run of text to the
orthographic order in which they are presented visually.
> Note: Primarily, this means moving dependent-vowel (matra) glyphs,
> "Ra,Halant" glyph sequences, and other consonants that take special
> treatment in some circumstances. "Ba", "Ta", and "Ya" occasionally
> take on special forms, depending on their position in the syllable.
>
> These reordering moves are mandatory. The final-reordering stage
> may make additional moves, depending on the text and on the features
> implemented in the active font.
The syllable should be processed by tagging each glyph with its
intended position based on its ordering category. After all glyphs
have been tagged, the entire syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
The final sort order of the ordering categories should be:
POS_RA_TO_BECOME_REPH
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN
POS_ABOVEBASE_CONSONANT
POS_BEFORE_SUBJOINED
POS_BELOWBASE_CONSONANT
POS_AFTER_SUBJOINED
POS_BEFORE_POST
POS_POSTBASE_CONSONANT
POS_AFTER_POST
POS_FINAL_CONSONANT
POS_SMVD
This sort order enumerates all of the possible final positions to
which a codepoint might be reordered, across all of the Indic
scripts. It includes some ordering categories not utilized in
Bengali.
The basic positions (left to right) are "Reph"
(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants
positioned before the base consonant or syllable base
(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant
or syllable base (`POS_SYLLABLE_BASE`), above-base consonants
(`POS_ABOVEBASE_CONSONANT`), below-base consonants
(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base
consonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final
consonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic
signs (`POS_SMVD`).
In addition, several secondary positions are defined to handle various
reordering rules that deal with relative, rather than absolute,
positioning. `POS_AFTER_MAIN` means that a character must be
positioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`
and `POS_AFTER_SUBJOINED` mean that a character must be positioned
before or after any below-base consonants, respectively. Similarly,
`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be
positioned before or after any post-base consonants, respectively.
For shaping-engine implementers, the names used for the ordering
categories matter only in that they are unambiguous.
For a definition of the "base" consonant, refer to step 2.1, which
follows.
#### Stage 2, step 1: Base consonant ####
The first step is to determine the base consonant of the syllable, if
there is one, and tag it as `POS_SYLLABLE_BASE`.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base, and it should be tagged
as `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a standalone sequence or other syllable that begins with a placeholder
or dotted circle, the placeholder or dotted circle will always serve
as the syllable base, and it should be tagged as
`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a syllable that begins with a consonant, the shaping engine must
determine the base consonant by a script-specific algorithm.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
The base consonant is defined as the consonant in a consonant-based
syllable that carries the syllable's vowel sound. That vowel sound
will either be provided by the script's inherent vowel (in which case
it is not written with a separate character) or the sound will be designated
by the addition of a dependent-vowel (matra) sign.
While performing the base-consonant search, shaping engines may
also encounter special-form consonants, including below-base
consonants and post-base consonants. Each of these special-form
consonants must also be tagged (`POS_BELOWBASE_CONSONANT`,
`POS_POSTBASE_CONSONANT`, respectively).
Any pre-base-reordering consonant (such as a pre-base-reordering "Ra")
encountered during the base-consonant search must be tagged
`POS_POSTBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
The algorithm for determining the base consonant is
- If the syllable starts with "Ra,Halant" and the syllable contains
more than one consonant, exclude the starting "Ra" from the list of
consonants to be considered.
- Starting from the end of the syllable, move backwards until a consonant is found.
* If the consonant is the first consonant, stop.
* If the consonant is preceded by the sequence "Halant,ZWJ", stop.
* If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`, then move to the previous consonant.
* If the consonant has a post-base form, tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If the consonant is a pre-base-reordering "Ra", tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If none of the above conditions is true, stop.
- The consonant stopped at will be the base consonant.
> Note: The algorithm is designed to work for all Indic
> scripts. However, Bengali does not utilize pre-base-reordering "Ra".
Bengali includes one post-base consonant.
- The sequence "Halant,Ya" (`U+09CD`,`U+09AF`) triggers
the "Yaphala" form. "Yaphala" behaves like a modifier to the
pronunciation of the preceding vowel, despite the fact that it is
formed from a consonant.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-yaphala}
Yaphala composition
:::
```{svg-color-toggle-button} bengali-yaphala
```
> Note: some fonts may also implement the "Yaphala" post-base form for
> "Halant,Yya" (`U+09CD`,`U+09DF`).
Bengali includes two below-base consonant forms:
- "Halant,Ra" (after the syllable base) and "Ra,Halant" (in a
non-syllable-initial position) take on the "Raphala" form.
- "Ba,Halant" (before the syllable base) and "Halant,Ba" (after the
syllable base) take on the "Baphala" form.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-raphala}
Raphala composition
:::
```{svg-color-toggle-button} bengali-raphala
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-baphala}
Baphala composition
:::
```{svg-color-toggle-button} bengali-baphala
```
> Note: Because Bengali employs the `BLWF_MODE_PRE_AND_POST` shaping
> characteristic, consonants with below-base special forms may occur
> before or after the syllable base.
>
> During the base-consonant search, only the "Halant,_consonant_"
> pattern following the syllable base for these below-base forms will
> be encountered. Step 2.5 below ensures that the "_consonant_,Halant"
> pattern preceding the base consonant or syllable base for these below-base forms will
> also be tagged correctly.
#### Stage 2, step 2: Matra decomposition ####
Second, any two-part dependent vowels (matras) must be decomposed
into their left-side and right-side components. Bengali has two
two-part dependent vowels, "O" (`U+09CB`) and "Au" (`U+09CC`). Each
has a canonical decomposition, so this step is unambiguous.
> "O" (`U+09CB`) decomposes to "`U+09C7`,`U+09BE`"
>
> "Au" (`U+09CC`) decomposes to "`U+09C7`,`U+09D7`"
Because this decomposition is a character-level operation, the shaping
engine may choose to perform it earlier, such as during an initial
Unicode-normalization stage. However, all such decompositions must be
completed before the shaping engine begins step three, below.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-matra-decompose}
Two-part matra decomposition
:::
```{svg-color-toggle-button} bengali-matra-decompose
```
#### Stage 2, step 3: Tag matras ####
Third, all left-side dependent-vowel (matra) signs, including those that
resulted from the preceding decomposition step, must be tagged to be
moved to the beginning of the syllable, with `POS_PREBASE_MATRA`.
All right-side dependent-vowel (matra) signs are tagged
`POS_AFTER_POST`.
All below-base dependent-vowel (matra) signs are tagged
`POS_AFTER_SUBJOINED`.
For simplicity, shaping engines may choose to tag single-part matras
in an earlier text-processing step, using the information in the
_Mark-placement subclass_ column of the character tables. It is
critical at this step, however, that all decomposed matras are also
correctly tagged before proceeding to the next step.
#### Stage 2, step 4: Adjacent marks ####
Fourth, any subsequences of marks that include a "Nukta" and a
"Halant" or Vedic sign must be reordered so that the "Nukta" appears
first.
This means that the subsequence "Halant,Nukta" is reordered to
"Nukta,Halant" and that the subsequence "_Vedic_sign_,Nukta" is
reordered to "Nukta,_Vedic_sign_".
For subsequences of affected marks that are longer than two, the
reordering operation must be repeated until the "Nukta" is the first
character in the subsequence. No other marks in the subsequence
should be reordered.
This order is canonical in Unicode and is required so that
"_consonant_,Nukta" substitution rules from GSUB will be correctly
matched later in the shaping process.
> Note: Bengali includes the consonant "Yya" (`U+09DF`), which is
> canonically equivalent to the sequence "Ya,Nukta"
> (`U+09AF`,`U+09BC`). "Ya" can also take on the post-base "Yaphala"
> form when it occurs in the sequence "`SYLLABLE_BASE`,Halant,Ya".
>
> Consequently, shaping engines that encounter a "Ya,Nukta"
> sequence may wish to recompose that sequence to "Yya" earlier than
> other nukta-variant substitutions, as a safeguard
> against the decomposed "Ya" unintentionally triggering a "Yaphala"
> substitution during GSUB feature application (if the sequence in
> question happens to match the "Yaphala" substitution rule as well as
> the "Yya" substitution rule).
>
> A well-behaved font should be expected to include explicit "Yya" and
> "Yaphala" substitution rules that do not trigger unexpected results,
> but there is no guarantee that real-world fonts will be well-behaved
> in this regard.
#### Stage 2, step 5: Pre-base consonants ####
Fifth, consonants that occur before the syllable base must be
tagged. Excluding initial "Ra,Halant" sequences that will become "Reph"s:
- If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`.
- Otherwise, tag it as `POS_PREBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
Bengali includes two below-base consonant forms:
- "Halant,Ra" (after the syllable base) and "Ra,Halant" (in a
non-syllable-initial position) take on the "Raphala" form.
- "Ba,Halant" (before the syllable base) and "Halant,Ba" (after the
syllable base) take on the "Baphala" form.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-raphala-1}
Raphala composition
:::
```{svg-color-toggle-button} bengali-raphala-1
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-baphala-1}
Baphala composition
:::
```{svg-color-toggle-button} bengali-baphala-1
```
> Note: Because Bengali employs the `BLWF_MODE_PRE_AND_POST` shaping
> characteristic, consonants with below-base special forms may occur
> before or after the syllable base.
>
> During the base-consonant search in 2.1, any instances of the
> "Halant,_consonant_" pattern following the syllable base for these
> below-base forms will be encountered. The tagging in this step
> ensures that the "_consonant_,Halant" pattern preceding the syllable
> base for these below-base forms will also be tagged correctly.
#### Stage 2, step 6: Reph ####
Sixth, initial "Ra,Halant" sequences that will become "Reph"s must be tagged with
`POS_RA_TO_BECOME_REPH`.
> Note: an initial "Ra,Halant" sequence will always become a "Reph"
> unless the "Ra" is the only consonant in the syllable.
#### Stage 2, step 7: Final consonants ####
Seventh, all final consonants must be tagged. Consonants that occur
after the syllable base _and_ after a dependent vowel (matra) sign
must be tagged with `POS_FINAL_CONSONANT`.
> Note: Final consonants occur only in Sinhala and should not be
> expected in `` text runs. This step is included here to
> maintain compatibility across Indic scripts.
#### Stage 2, step 8: Mark tagging ####
Eighth, all marks must be tagged.
> Note: In this step, joiner and non-joiner characters must also be
> tagged according to the same rules given for marks, even though
> these characters are not categorized as marks in Unicode.
Marks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,
`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should
be tagged with `POS_SMVD`.
All "Nukta"s must be tagged with the same positioning tag as the
preceding consonant, independent vowel, placeholder, or dotted circle.
All remaining marks (not in the `POS_SMVD` category and not "Nukta"s)
must be tagged with the same positioning tag as the closest non-mark
character the mark has affinity with, so that they move together
during the sorting step.
There are two possible cases: those marks before the syllable base
and those marks after the syllable base. In addition, an exception is
made for "Halant" marks that follow a left-side (pre-base) matra.
1. Initially, all remaining marks should be tagged with the same
positioning tag as the closest preceding consonant.
2. For each consonant after the syllable base (such as post-base
consonants, below-base consonants, or final consonants), all
remaining marks located between that current consonant and any
previous consonant should be tagged with the same positioning tag as
the current (later) consonant.
In other words, all consonants preceding the syllable base "own" the
marks that follow them, while all consonants after the syllable base
"own" the marks that come before them. When a syllable does not have
any consonants after the syllable base, the syllable base should
"own" all the marks that follow it.
3. Finally, "Halant" marks that follow a left-side dependent vowel
(matra) should _not_ be tagged with the left-side matra's
positioning tag. Instead, the "Halant" should be tagged with the
positioning tag of the non-mark character preceding the left-side
matra. This prevents the "Halant" mark from being moved with the
left-side matra when the syllable is sorted.
#### Stage 2, step 9: Sort syllable ####
With these steps completed, the syllable can be sorted into the final
sort order as listed at the beginning of stage 2.
The glyphs in the syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
#### Stage 2, step 10: Flag sequences for possible feature applications ####
With the initial reordering complete, those glyphs in the syllable that
may have GSUB or GPOS features applied in stages 3, 5, and 6 should be
flagged for each potential feature.
This flagging is preliminary; the set of potential features varies
between different scripts and which features are supported varies
between fonts. It is also possible that the application of
one feature on a glyph sequence will perform a substitution that makes
a later feature no longer applicable to the updated sequence.
Consequently, the flagging must be completed before shaping proceeds
to the stages during which features are applied.
Some shaping features, such as `locl`, can potentially apply to any
glyphs. Therefore it is not necessary to maintain a separate flag for
these features in the bitmask (or other data structure) used to track
the flags -- although shaping engines may do so if desired.
The sequences to flag are summarized in the list below; a full
description of each feature's function and interpretation is provided
in GSUB and GPOS application stages that follow.
- `nukt` should match "_Consonant_,Nukta" sequences
- `akhn` should match "Ka,Halant,Ssa" and "Ja,Halant,Nya"
- `rphf` should match initial "Ra,Halant" sequences but _not_ match
initial "Ra,Halant,ZWJ" sequences
- `blwf` should match "Halant,Ra" and "Halant,Ba" in
post-base positions and "Ra,Halant" and
"Ba,Halant" in non-initial pre-base positions
- `half` should match "_Consonant_,Halant" in pre-base position but
_not_ match "Ra,Halant" sequences flagged for `rphf` and
_not_ match "_Consonant_,Halant,ZWNJ,_Consonant_" sequences
- `pstf` should match "Halant,Ya" in post-base position
- `vatu` should match "_Consonant_,Halant,Ra"
- `cjct` should match "_Consonant_,Halant,_Consonant_" but _not_
match "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_"
### Stage 3: Applying the basic substitution features from GSUB ###
The basic-substitution stage applies mandatory substitution features
using the rules in the font's GSUB table. In preparation for this
stage, glyph sequences should be flagged for possible application
of GSUB features in stage 2, step 10.
The order in which these substitutions must be performed is fixed for
all Indic scripts:
locl
nukt
akhn
rphf
rkrf (not used in Bengali)
pref (not used in Bengali)
blwf
abvf (not used in Bengali)
half
pstf
vatu
cjct
cfar (not used in Bengali)
#### Stage 3, step 1: locl ####
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
#### Stage 3, step 2: nukt ####
The `nukt` feature replaces "_Consonant_,Nukta" sequences with a
precomposed nukta-variant of the consonant glyph.
The context defined for a `nukt` feature is:
:::{table} `nukt` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:------------------------------|:--------------|
| _none_ | `_consonant_`(full),`_nukta_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-nukt}
Nukta composition
:::
```{svg-color-toggle-button} bengali-nukt
```
> Note: Bengali includes the consonant "Yya" (`U+09DF`), which is
> canonically equivalent to the sequence "Ya,Nukta"
> (`U+09AF`,`U+09BC`). "Ya" can also take on the post-base "Yaphala"
> form when it occurs in the sequence "`SYLLABLE_BASE`,Halant,Ya".
>
> Consequently, shaping engines that encounter a "Ya,Nukta"
> sequence may wish to recompose that sequence to "Yya" earlier than
> other nukta-variant substitutions, as a safeguard
> against the decomposed "Ya" unintentionally triggering a "Yaphala"
> substitution during GSUB feature application (if the sequence in
> question happens to match the "Yaphala" substitution rule as well as
> the "Yya" substitution rule).
>
> A well-behaved font should be expected to include explicit "Yya" and
> "Yaphala" substitution rules that do not trigger unexpected results,
> but there is no guarantee that real-world fonts will be well-behaved
> in this regard.
#### Stage 3, step 3: akhn ####
The `akhn` feature replaces two specific sequences with required ligatures.
- "Ka,Halant,Ssa" is substituted with the "KSsa" ligature.
- "Ja,Halant,Nya" is substituted with the "JNya" ligature.
These sequences can occur anywhere in a syllable. The "KSsa" and
"JNya" characters have orthographic status equivalent to full
consonants in some languages, and fonts may have `cjct` substitution
rules designed to match them in subsequences. Therefore, this
feature must be applied before all other many-to-one substitutions.
The context defined for an `akhn` feature is:
:::{table} `akhn` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:----------------------------|:--------------|
| _none_ | `AKHAND_CONSONANT_SEQUENCE` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-akhn-kssa}
KSsa ligation
:::
```{svg-color-toggle-button} bengali-akhn-kssa
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-akhn-jnya}
JNya ligation
:::
```{svg-color-toggle-button} bengali-akhn-jnya
```
#### Stage 3, step 4: rphf ####
The `rphf` feature replaces initial "Ra,Halant" sequences with the
"Reph" glyph.
- An initial "Ra,Halant,ZWJ" sequence, however, must not be flagged for
the `rphf` substitution.
The context defined for a `rphf` feature is:
:::{table} `rphf` feature context
| Backtrack | Matching sequence | Lookahead |
|:-----------------|:------------------------|:--------------|
| `SYLLABLE_START` | "Ra"(full),`_halant_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-rphf}
Reph composition with common "Ra"
:::
```{svg-color-toggle-button} bengali-rphf
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-rphf-as}
Reph composition with Assamese "Ra"
:::
```{svg-color-toggle-button} bengali-rphf-as
```
#### Stage 3, step 5: rkrf ####
> This feature is not used in Bengali.
#### Stage 3, step 6: pref ####
> This feature is not used in Bengali.
#### Stage 3, step 7: blwf ####
The `blwf` feature replaces below-base-consonant glyphs with any
special forms. Bengali includes two below-base consonant
forms:
- "Halant,Ra" (after the syllable base) and "Ra,Halant" (in a
non-syllable-initial position) take on the "Raphala" form.
- "Ba,Halant" (before the syllable base) and "Halant,Ba" (after the
syllable base) take on the "Baphala" form.
Because Bengali incorporates the `BLWF_MODE_PRE_AND_POST` shaping
characteristic, any pre-base consonants and any post-base consonants
may potentially match a `blwf` substitution; therefore, both cases must
be flagged for comparison. Note that this is not necessarily the case in other
Indic scripts that use a different `BLWF_MODE_` shaping
characteristic.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-raphala-2}
Raphala composition
:::
```{svg-color-toggle-button} bengali-raphala-2
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-baphala-2}
Baphala composition
:::
```{svg-color-toggle-button} bengali-baphala-2
```
#### Stage 3, step 8: abvf ####
> This feature is not used in Bengali.
#### Stage 3, step 9: half ####
The `half` feature replaces "_Consonant_,Halant" sequences before the
base consonant or syllable base with "half forms" of the consonant
glyphs.
In the most common case, this substitution applies to
"_Consonant_,Halant" sequences that are followed by another
_Consonant_.
In addition, a sequence matching "_Consonant_,Halant,ZWJ" must also be
flagged for potential `half` substitutions.
> Note: The presence of the "ZWJ" at the end of the sequence means
> that the sequence may match the regular-expression test in stage 1
> as the end of a syllable, even without being followed by a base
> consonant or syllable base.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
There are three exceptions to the default behavior, for which
the shaping engine must test:
- Initial "Ra,Halant" sequences, which should have been flagged for
the `rphf` feature earlier, must not be flagged for potential
`half` substitutions.
- Non-initial "Ra,Halant" and "Ba,Halant" sequences, which should
have been flagged for the `rkrf` or `blwf` features earlier, must
not be flagged for potential `half` substitutions.
- A sequence matching "_Consonant_,Halant,ZWNJ,_Consonant_" must not be
flagged for potential `half` substitutions.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-half-ka}
Half-form formation
:::
```{svg-color-toggle-button} bengali-half-ka
```
#### Stage 3, step 10: pstf ####
The `pstf` feature replaces post-base-consonant glyphs with any special forms.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-yaphala-1}
Yaphala composition
:::
```{svg-color-toggle-button} bengali-yaphala-1
```
#### Stage 3, step 11: vatu ####
The `vatu` feature replaces certain sequences with "Vattu variant"
forms.
"Vattu variants" are formed from glyphs followed by "Raphala"
(the below-base form of "Ra"); therefore, this feature must be applied after
the `blwf` feature.
The context defined for a `vatu` feature is:
:::{table} `vatu` feature context
| Backtrack | Matching sequence | Lookahead |
|:-----------------|:------------------------|:--------------|
| _none_ | `_consonant_`,"Raphala" | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-vatu}
Vattu variant ligation
:::
```{svg-color-toggle-button} bengali-vatu
```
#### Stage 3, step 12: cjct ####
The `cjct` feature replaces sequences of adjacent consonants with
conjunct ligatures. These sequences must match "_Consonant_,Halant,_Consonant_".
A sequence matching "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_" must not be flagged to form a conjunct.
> Note: The presence of the "ZWJ" in a
> "_Consonant_,Halant,ZWJ,_Consonant_" sequence should automatically
> inhibit any `cjct` feature rules from matching the sequence as valid
> input, and thus prevent the `cjct` substitution from being applied.
> Note: The presence of the "ZWNJ" in a
> "_Consonant_,Halant,ZWNJ,_Consonant_" sequence means that the
> "_Consonant_,Halant,ZWNJ" subsequence will match the
> regular-expression test in stage 1 as the end of a syllable.
>
> Because OpenType shaping features in `` are defined as
> applying only within an individual syllable, this means that the
> presence of the "ZWNJ" will automatically prevent the application of
> a `cjct` feature by triggering the identification of a syllable
> break between the two consonants.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWNJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
>
> Note, also: The presence of the "ZWJ" means that a
> "_Consonant_,Halant,ZWJ" sequence may match the regular-expression
> test in stage 1 as the end of a syllable, even without being
> followed by a base consonant or syllable base. By definition,
> however, a "_Consonant_,Halant,ZWJ" syllable identified in stage 1
> cannot also include a "_Consonant_" after the ZWJ.
The font's GSUB rules might be implemented so that `cjct`
substitutions apply to half-form consonants; therefore, this feature
must be applied after the `half` feature.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-cjct}
Conjunct ligation
:::
```{svg-color-toggle-button} bengali-cjct
```
#### Stage 3, step 13: cfar ####
> This feature is not used in Bengali.
### Stage 4: Final reordering ###
The final reordering stage repositions marks, dependent-vowel (matra)
signs, and "Reph" glyphs to the appropriate location with respect to
the base consonant or syllable base. Because multiple substitutions
may have occurred during the application of the basic-shaping features
in the preceding stage, these repositioning moves could not be
performed during the initial reordering stage.
Like the initial reordering stage, the steps involved in this stage
occur on a per-syllable basis.
#### Stage 4, step 1: Base consonant ####
The final reordering stage, like the initial reordering stage, begins
with determining the syllable base of each syllable, following the
same algorithm used in stage 2, step 1.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base. In a standalone sequence or
other syllable that begins with a placeholder or a dotted circle, the
placeholder or dotted circle will always serve as the syllable base.
In a syllable that begins with a consonant, the shaping engine must
repeat the base-consonant search algorithm used in stage 2, step 1.
The codepoint of the underlying base consonant or syllable base will
not change between the search performed in stage 2, step 1, and the
search repeated here. However, the application of GSUB shaping
features in stage 3 means that several ligation and many-to-one
substitutions may have taken place. The final glyph produced by that
process may, therefore, be a conjunct or ligature form — in most
cases, such a glyph will not have an assigned Unicode codepoint.
#### Stage 4, step 2: Pre-base matras ####
Pre-base dependent vowels (matras) that were reordered during the
initial reordering stage must be moved to their final position. This
position is defined as:
- after the last standalone "Halant" glyph that comes after the
matra's starting position and also comes before the main
consonant.
- If a zero-width joiner follows this last standalone "Halant", the
final matra position is moved to after the joiner.
This means that the matra will move to the right of all explicit
"consonant,Halant" subsequences, but will stop to the left of the base
consonant or syllable base, all conjuncts or ligatures that contain
the base consonant or syllable base, and all half forms.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-matra-position}
Pre-base matra reordering
:::
```{svg-color-toggle-button} bengali-matra-position
```
> Note: OpenType and Unicode both state that if the syllable includes
> a ZWJ immediately after the last "Halant", then the final matra
> position should be after the ZWJ.
>
> However, there are several test sequences indicating that
> Microsoft's Uniscribe shaping engine did not follow this rule (in,
> at least, Devanagari and Bengali text), and in these circumstances
> Uniscribe instead makes the final matra position before the final
> "Consonant,Halant,ZWJ".
>
> Subsequently, the HarfBuzz shaping engine has also followed the same
> pattern. If other shaping engine implementations prefer to maintain
> maximum compatibility with Uniscribe and HarfBuzz, then they should
> also follow suit.
> Note: The Microsoft script-development specifications for OpenType
> shaping also state that if a zero-width non-joiner follows the last
> standalone "Halant", the final matra position is moved to after the
> non-joiner. However, it is unnecessary to test for this condition,
> because a "Halant,ZWNJ" subsequence is, by definition, the end of a
> syllable. Consequently, a "Halant,ZWNJ" cannot be followed by a
> pre-base dependent vowel.
#### Stage 4, step 3: Reph ####
"Reph" must be moved from the beginning of the syllable to its final
position. Because Bengali incorporates the `REPH_POS_AFTER_SUBJOINED`
shaping characteristic, this final position is defined to be
immediately after the syllable base and any subjoined (below-base
consonant or below-base dependent vowel) forms.
The algorithm for finding the final "Reph" position is
- Starting at the first post-"Reph" consonant, search forward looking
for the first explicit "Halant", ending the search when the base
consonant is encountered. If such an explicit "Halant" is found,
move the "Reph" to the position immediately after this
"Halant".
* If a zero-width joiner (ZWJ) or a zero-width non-joiner (ZWNJ)
follows this "Halant", move the "Reph" to the position
immediately after the ZWJ or ZWNJ. This will be the final
"Reph" position.
* If no ZWJ or ZWNJ follows this "Halant", leave the "Reph" in
its position immediately after the "Halant". This will be the
final "Reph" position.
- If no such explicit "Halant" is found in the previous step, find
the first post-base consonant that has not formed a ligature with
the base consonant. If such a non-ligated post-base consonant is
found, move the "Reph" to the position immediately before the
non-ligated post-base consonant. This will be the final "Reph"
position.
- If no such non-ligated post-base consonant is found in the
previous step, move the "Reph" to the position immediately before
the first post-base matra, syllable modifier, or Vedic sign that
has a positioning tag after the script's "Reph" position in the
syllable sort order (as listed in [stage
2](#stage-2-initial-reordering)). This will be the final "Reph"
position.
> Note: Because Bengali incorporates the
> `REPH_POS_AFTER_SUBJOINED` shaping characteristic, this means
> any positioning tag of `POS_BEFORE_POST` or later,
> although a post-base matra, syllable modifier, or Vedic sign
> would not typically be tagged with `POS_BEFORE_POST`.
- If no other location has been located in the previous steps, move
the "Reph" to the end of the syllable.
Finally, if the final position of "Reph" occurs after a
"_matra_,Halant" subsequence, then "Reph" must be repositioned to the
left of "Halant", to allow for potential matching with `abvs` or
`psts` substitutions from GSUB.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-reph-position}
Reph final reordering
:::
```{svg-color-toggle-button} bengali-reph-position
```
#### Stage 4, step 4: Pre-base-reordering consonants ####
Any pre-base-reordering consonants must be moved to immediately before
the base consonant or syllable base.
Bengali does not use pre-base-reordering consonants, so this step will
involve no work when processing `` text. It is included here in order
to maintain compatibility with the other Indic scripts.
#### Stage 4, step 5: Initial matras ####
Any left-side dependent vowels (matras) that are at the start of a
word must be flagged for potential substitution by the `init` feature
of GSUB.
> Note: Although the specification defines the `init` feature as being
> used for word-initial positions only, the feature's origin bases
> this on the linguistic sense of "word" and that sense may not be
> precise enough to cover all of the cases encountered in a
> contemporary text run.
>
> In practice, users may expect the `init` feature to be applied when
> a sequence has a left-side dependent vowel that is preceded by a
> punctuation character, a currency symbol, an emoji, or any of
> several other categories of code point. Shaping engines may need to
> adapt their matching rules to meet users' expectations for this
> feature.
>
> The Microsoft Uniscribe shaping engine historically tested for a
> certain range of Unicode `General Category` and more recent shaping
> engines follow suit. For more information on Uniscribe
> compatibility, see the [Uniscribe-bug-compatibility
> note](/notes/uniscribe-bug-compatibility.md).
### Stage 5: Applying all remaining substitution features from GSUB ###
In this stage, the remaining substitution features from the GSUB table
are applied. In preparation for this stage, glyph sequences should be
flagged for possible application of GSUB features in stage 2,
step 10.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
init
pres
abvs
blws
psts
haln
The `init` feature replaces word-initial glyphs with special
presentation forms. Generally, these forms involve removing the
headline in-stroke from the left side of the glyph.
The context defined for an `init` feature is:
:::{table} `init` feature context
| Backtrack | Matching sequence | Lookahead |
|:-------------|:---------------------------|:--------------------|
| `WORD_START` | `_matra_`(`LEFT_POSITION`) | `_consonant_`(full) |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-init}
Application of the `init` feature
:::
```{svg-color-toggle-button} bengali-init
```
The `pres` feature replaces pre-base-consonant glyphs with special
presentations forms. This can include consonant conjuncts, half-form
consonants, and stylistic variants of left-side dependent vowels
(matras).
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-pres}
Application of the `pres` feature
:::
```{svg-color-toggle-button} bengali-pres
```
The `abvs` feature replaces above-base-consonant glyphs with special
presentation forms. This usually includes contextual variants of
above-base marks or contextually appropriate mark-and-base ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-abvs}
Application of the `abvs` feature
:::
```{svg-color-toggle-button} bengali-abvs
```
The `blws` feature replaces below-base-consonant glyphs with special
presentation forms. This usually includes replacing base consonants that
are adjacent to below-base-consonant forms like "Raphala" or
"Baphala" with contextual ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-blws}
Application of the `blws` feature
:::
```{svg-color-toggle-button} bengali-blws
```
The `psts` feature replaces post-base-consonant glyphs with special
presentation forms. This usually includes replacing right-side
dependent vowels (matras) with stylistic variants or replacing
post-base-consonant/matra pairs with contextual ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-psts}
Application of the `psts` feature
:::
```{svg-color-toggle-button} bengali-psts
```
The `haln` feature replaces syllable-final "_Consonant_,Halant" pairs with
special presentation forms. This can include stylistic variants of the
consonant where placing the "Halant" mark on its own is
typographically problematic.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-haln}
Application of the `haln` feature
:::
```{svg-color-toggle-button} bengali-haln
```
> Note: The `calt` feature, which allows for generalized application
> of contextual alternate substitutions, is usually applied at this
> point. However, `calt` is not mandatory for correct Bengali shaping
> and may be disabled in the application by user preference.
### Stage 6: Applying remaining positioning features from GPOS ###
In this stage, mark positioning, kerning, and other GPOS features are
applied.
As with the preceding stage, the order in which these features are
applied is not canonical; they should be applied in the order in which
they appear in the GPOS table in the font.
dist
abvm
blwm
> Note: The `kern` feature is usually applied at this stage, if it is
> present in the font. However, `kern` (like `calt`, above) is not
> mandatory for shaping Bengali text and may be disabled by user preference.
The `dist` feature adjusts the horizontal positioning of
glyphs. Unlike `kern`, adjustments made with `dist` do not require the
application or the user to enable any software _kerning_ features, if
such features are optional.
The `abvm` feature positions above-base marks for attachment to base
characters. In Bengali, this includes "Reph" in addition to the
diacritical marks and Vedic signs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-abvm}
Application of the `abvm` feature
:::
```{svg-color-toggle-button} bengali-abvm
```
The `blwm` feature positions below-base marks for attachment to base
characters. In Bengali, this includes below-base dependent vowels
(matras) as well as the below-base consonant forms "Raphala" and
"Baphala".
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #bengali-blwm}
Application of the `blwm` feature
:::
```{svg-color-toggle-button} bengali-blwm
```
## The `` shaping model ##
The older Bengali script tag, ``, has been deprecated. However,
shaping engines may still encounter fonts that were built to work with
`` and some users may still have documents that were written to
take advantage of `` shaping.
### Distinctions from `` ###
The most significant distinction between the shaping models is that the
sequence of "Halant" and consonant glyphs used to trigger shaping
features was altered when migrating from `` to
``.
Specifically, shaping engines were expected to reorder post-base
"Halant,_Consonant_" sequences to "_Consonant_,Halant".
As a result, a font's GSUB substitutions would be written to match
"_Consonant_,Halant" sequences in all pre-base and post-base positions.
The `` syllable
Pre-baseC Halant BaseC Halant Post-baseC
would be reordered to
Pre-baseC Halant BaseC Post-baseC Halant
before features are applied.
In `` text, as described above in this document, there is no
such reordering. The correct sequence to match for GSUB substitutions is
"_Consonant_,Halant" for pre-base consonants, but "Halant,_Consonant_"
for post-base consonants.
The old Indic shaping model also did not recognize the
`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, ``
was treated as if it followed the `BLWF_MODE_POST_ONLY`
characteristic. In other words, below-base form substitutions were
only applied to consonants after the base consonant or syllable base.
In addition, for some scripts, left-side dependent vowel marks
(matras) were not repositioned during the final reordering
stage. For `` text, the left-side matra was always positioned
at the beginning of the syllable.
### Advice for handling fonts with `` features only ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences in order to apply GSUB substitutions when it is known that
the font in use supports only the `` shaping model.
### Advice for handling text runs composed in `` format ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences for GSUB substitutions or to reorder them to
"Halant,_Consonant_" when processing text runs that are tagged with
the `` script tag and it is known that the font in use supports
only the `` shaping model.
Shaping engines may also choose to apply `blwf` substitutions to
below-base consonants occurring before the base consonant or syllable base when it is
known that the font in use supports an applicable substitution lookup.
Shaping engines may also choose to position left-side matras according
to the `` ordering scheme; however, doing so might interfere
with matching GSUB or GPOS features.
================================================
FILE: opentype-shaping-default.md
================================================
# Default script shaping in OpenType #
This document details the default shaping procedure needed to display
text runs in non-complex scripts. It may also be used as a fallback
model for unrecognized scripts.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Normalization](#normalization)
- [The default shaping model](#the-default-shaping-model)
- [Stage 1: Applying the basic substitution features from GSUB](#stage-1-applying-the-basic-substitution-features-from-gsub)
- [Stage 2: Applying typographic substitution features from GSUB](#stage-2-applying-typographic-substitution-features-from-gsub)
- [Stage 3: Applying the positioning features from GPOS](#stage-3-applying-the-positioning-features-from-gpos)
## General information ##
The default OpenType shaping model is used for scripts that are
considered _non-complex_ from the shaper's perspective. This
designation means that shaping a text run does not involve glyph
reordering, contextual joining behavior, or the substitution of
context-dependent forms for linguistic or orthographic correctness.
Text runs in non-complex scripts may, however, involve ligature
substitution, Unicode normalization, mark positioning, kerning, and
the application of other features from the active font's GSUB and GPOS
tables.
The non-complex scripts covered by this model include Latin, Cyrillic,
Greek, Armenian, Georgian, Ethiopic, Cherokee, Tifinagh, and many others.
## Terminology ##
Many of these scripts support diacritics and other **marks**. Unicode may
contain **precomposed** mark-and-base codepoints for some or all
combinations of marks and base letters in the script. For combinations
without a codepoint, the desired form can be achieved by following the
**base** letter with a **combining mark** codepoint.
The primary concern for the shaping engine is processing the text run into
the correct normalized form, so that the best glyphs from the active
font can be selected from among the available precomposed and
combining alternatives.
Fonts for non-complex scripts might not include a GSUB or GPOS table
at all.
However, GSUB and GPOS may also be used to implement a variety of
OpenType smart features, including several classes of ligature,
contextual alternate, or contextual positioning rules. Because these
features are not required in order to render the text run
orthographically correct, the features are not considered shaping
features. Nevertheless, the shaping engine may be expected to apply
these features in order to simplify the overall text-rendering
architecture of the implementation.
## Normalization ##
Unicode defines algorithms for normalizing a sequence of input
codepoints into either a canonical composed form or a canonical
decomposed form. The purpose of these algorithms and of the defined
normalization forms is to determine equivalent representations of input
sequences regardless of variations in the input sequences.
For example, a base letter with an attached mark might exist in
Unicode as a single codepoint, but an input sequence might consist of
the base letter codepoint followed by the combining mark
codepoint. Unicode normalization can be used to determine that the
"letter, mark" sequence is equivalent to the single codepoint. This
simplifies sorting, searching, string comparison, and many other common
tasks.
OpenType shaping utilizes Unicode normalization, but OpenType
shaping has a distinctly different goal: to select the best or most
appropriate representation of the input codepoint sequence that is
available in the active font. A full description of the algorithm is
available in the [normalization](opentype-shaping-normalization.md) document.
Shaping some complex scripts involves explicit composition or
decomposition steps. The default shaping model does not involve any
such steps, but it does proceed with the general assumption that text
runs have been normalized as part of input sanitization.
For convenience, shaping engines may choose to implement a single
normalization routine for all scripts, default and complex. If
normalization is done before the shaping-model–specific processing is
done, then there may be no work required in certain shaping steps
(such as the processing of `ccmp` substitutions from GSUB). However,
these steps will always be described in the relevant script's shaping
document.
## The default shaping model ##
Processing a run of text in the default shaping model involves three
top-level stages:
1. Applying the basic substitution features from GSUB
2. Applying typographic substitution features from GSUB
3. Applying the positioning features from GPOS
Together, these stages cover the application of all GSUB and GPOS
features that are required or that have been defined by OpenType as
being on by default.
For convenience, shaping engines may also choose to apply any optional
or off-by-default OpenType features that have been activated for the
text run (including those that have been
enabled by the user and those that have been enabled at the
application level). However, the order in which such features should
be applied and how they should interact with OpenType shaping features
is beyond the scope of this document.
The default shaping model does not involve syllable-identification,
word-identification, or other preprocessing of the input
sequence. Shaping engines may choose how to segment longer text runs
for processing, or may choose to rely on higher-level applications to
make segmentation decisions.
### Stage 1: Applying the basic substitution features from GSUB ###
The basic-substitution stage applies mandatory substitution features
using the rules in the font's GSUB table. In preparation for this
stage, glyph sequences should be tagged for possible application
of GSUB features.
These substitutions include those features designed to provide
linguistic and orthographic correctness.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
locl
ccmp
rlig
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
The `ccmp` feature allows a font to substitute mark-and-base sequences
with a pre-composed glyph including the mark and the base, or to
substitute a single glyph into an equivalent decomposed sequence of
glyphs.
If present, these composition and decomposition substitutions must be
performed before applying any other GSUB lookups, because
those lookups may be written to match only the `ccmp`-substituted
glyphs.
> Note: The `ccmp` feature may perform compositions or decompositions
> of glyph sequences that do not have a canonical decomposition
> defined in Unicode.
The `rlig` feature substitutes glyph sequences with mandatory
ligatures. Substitutions made by `rlig` cannot be disabled by
application-level user interfaces.
### Stage 2: Applying typographic substitution features from GSUB ###
The typographic-substitution phase applies all remaining substitution
features using the rules in the font's GSUB table. In preparation for
this stage, glyph sequences should be tagged for possible application
of GSUB features.
These substitutions include those features designed to provide
typographic consistency and correctness.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
rclt
calt
clig
liga
The `rclt` feature substitutes glyphs with contextual alternate
forms. In general, the `rclt` feature is used to perform such
substitutions that are required by the orthography of the active
script and language. Substitutions made by `rclt` cannot be disabled
by application-level user interfaces.
The `calt` feature substitutes glyphs with contextual alternate
forms. In general, the `calt` feature performs substitutions that are
not mandatory for orthographic correctness. However, unlike `rclt`,
the substitutions made by `calt` can be disabled by application-level
user interfaces.
The `clig` feature substitutes optional ligatures that are on by
default, but which are activated only in certain
contexts. Substitutions made by `clig` may be disabled by
application-level user interfaces.
The `liga` feature substitutes standard, optional ligatures that are on
by default. Substitutions made by `liga` may be disabled by
application-level user interfaces.
### Stage 3: Applying the positioning features from GPOS ###
The positioning stage adjusts the positions of mark and base
glyphs. In preparation for this stage, glyph sequences should be
tagged for possible application of GPOS features.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
curs
dist
kern
mark
mkmk
The `curs` feature perform cursive positioning. Each glyph has an
entry point and exit point; the `curs` feature positions glyphs so
that the entry point of the current glyph meets the exit point of the
preceding glyph.
The `dist` feature adjusts the horizontal positioning of
glyphs. Unlike `kern`, adjustments made with `dist` do not require the
application or the user to enable any software kerning features, if
such features are optional.
The `kern` adjusts glyph spacing between pairs of adjacent glyphs.
The `mark` feature positions marks with respect to base glyphs.
The `mkmk` feature positions marks with respect to preceding marks,
providing proper positioning for sequences of marks that attach to the
same base glyph.
================================================
FILE: opentype-shaping-devanagari.md
================================================
```{include} /_global.md
```
# Devanagari shaping in OpenType #
This document details the shaping procedure needed to display text
runs in the Devanagari script.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Shaping classes and subclasses](#shaping-classes-and-subclasses)
- [Devanagari character tables](#devanagari-character-tables)
- [The `` shaping model](#the-dev2-shaping-model)
- [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)
- [Stage 2: Initial reordering](#stage-2-initial-reordering)
- [Stage 3: Applying the basic substitution features from GSUB](#stage-3-applying-the-basic-substitution-features-from-gsub)
- [Stage 4: Final reordering](#stage-4-final-reordering)
- [Stage 5: Applying all remaining substitution features from GSUB](#stage-5-applying-all-remaining-substitution-features-from-gsub)
- [Stage 6: Applying remaining positioning features from GPOS](#stage-6-applying-remaining-positioning-features-from-gpos)
- [The `` shaping model](#the-deva-shaping-model)
- [Distinctions from ``](#distinctions-from-dev2)
- [Advice for handling fonts with `` features only](#advice-for-handling-fonts-with-deva-features-only)
- [Advice for handling text runs composed in `` format](#advice-for-handling-text-runs-composed-in-deva-format)
## General information ##
The Devanagari script belongs to the Indic family, and follows
the same general patterns as the other Indic scripts. More
specifically, it belongs to the North Indic subgroup, in which
sequences of adjacent consonants are often represented as conjuncts.
The Devanagari script is used to write multiple languages, most commonly
Hindi, Marathi, Maithili, and Nepali. In addition, Sanskrit may be written
in Devanagari, so Devanagari script runs may include glyphs from the Vedic
Extensions block of Unicode.
There are two extant Devanagari script tags defined in OpenType, ``
and ``. The older script tag, ``, was deprecated in 2005.
Therefore, new fonts should be engineered to work with the ``
shaping model. However, if a font is encountered that supports only
``, the shaping engine should deal with it gracefully.
## Terminology ##
OpenType shaping uses a standard set of terms for Indic scripts. The
terms used colloquially in any particular language may vary, however,
potentially causing confusion.
**Matra** is the standard term for a dependent vowel sign.
The term "matra" is also used to refer to the headline above most
Devanagari letters. To avoid ambiguity, the term **headline** is
used in most Unicode and OpenType shaping documents.
**Halant** and **Virama** are both standard terms for the below-base "vowel-killer"
sign. Unicode documents use the term "virama" most frequently, while
OpenType documents use the term "halant" most frequently.
**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark
indicating that the preceding vowel should be nasalized.
The term **base consonant** is also critical to Indic shaping. The
base consonant of a syllable is the consonant that carries the
syllable's vowel sound, either the inherent vowel (for an unmarked
base consonant) or a dependent vowel (with the addition of a matra).
A syllable's base consonant is generally rendered in its full form
(although it may form ligatures), while other consonants in the
syllable frequently take on secondary forms. Different GSUB
substitutions may apply to a script's **pre-base** and **post-base**
consonants. Some of these substitutions create **above-base** or
**below-base** forms. The **Reph** form of the consonant "Ra" is an
example.
Syllables may also begin with an **independent vowel** instead of a
consonant. In these syllables, the independent vowel is rendered in
full-letter form, not as a matra, and the independent vowel serves as the
syllable base, similar to a base consonant.
Where possible, using the standard terminology is preferred, as the
use of a language-specific term necessitates choosing one language
over all of the others that share a common script.
## Glyph classification ##
Shaping Devanagari text depends on the shaping engine correctly
classifying each glyph in the run. As with most other scripts, the
classifications must distinguish between consonants, vowels
(independent and dependent), numerals, punctuation, and various types
of diacritical mark.
For most codepoints, the `General Category` property defined in the Unicode
standard is correct, but it is not sufficient to fully capture the
expected shaping behavior (such as glyph reordering). Therefore,
Devanagari glyphs must additionally be classified by how they are treated
when shaping a run of text.
### Shaping classes and subclasses ###
The shaping classes listed in the tables that follow are defined so
that they capture the positioning rules used by Indic scripts.
For most codepoints, the _Shaping class_ is synonymous with the `Indic
Syllabic Category` defined in Unicode. However, there are some
distinctions, where the defined category does not fully capture the
behavior of the character in the shaping process.
Several of the diacritic and syllable-modifying marks behave according
to their own rules and, thus, have a special class. These include
`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some
less-common marks behave according to rules that are similar to these
common marks, and are therefore classified with the corresponding
common mark. The Vedic Extensions also include a `CANTILLATION`
class for tone marks.
Letters generally fall into the classes `CONSONANT`,
`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the
shaping engine parse and identify key positions in a syllable. For
example, Unicode categorizes dependent vowels as `Mark [Mn]`, but the
shaping engine must be able to distinguish between dependent vowels
and diacritical marks (which are categorized as `Mark [Mn]`).
Other characters, such as symbols and miscellaneous letters (for
example, letter-like symbols that only occur as standalone entities
and do not occur within syllables), need no special attention from the
shaping engine, so they are not assigned a shaping class.
Numbers are classified as `NUMBER`, even though they evoke no special
behavior from the Indic shaping rules, because there are OpenType features that
might affect how the respective glyphs are drawn, such as `tnum`,
which specifies the usage of tabular-width numerals, and `sups`, which
replaces the default glyphs with superscript variants.
Marks and dependent vowels are further labeled with a mark-placement
subclass, which indicates where the glyph will be placed with respect
to the base character to which it is attached. The actual position of
the glyphs is determined by the lookups found in the font's GPOS
table, however, the shaping rules for Indic scripts require that the
shaping engine be able to identify marks by their general
position.
For example, left-side dependent vowels (matras), classified
with `LEFT_POSITION`, must frequently be reordered, with the final
position determined by whether or not other letters in the syllable
have formed ligatures or combined into conjunct forms. Therefore, the
`LEFT_POSITION` subclass of the character must be tracked throughout
the shaping process.
There are four basic _mark-placement subclasses_ for dependent vowels
(matras). Each corresponds to the visual position of the matra with
respect to the syllable base to which it is attached:
- `LEFT_POSITION` matras are positioned to the left of the syllable base.
- `RIGHT_POSITION` matras are positioned to the right of the syllable base.
- `TOP_POSITION` matras are positioned above the syllable base.
- `BOTTOM_POSITION` matras are positioned below syllable base.
These positions may also be referred to elsewhere in shaping documents as:
- _Pre-base_ matras
- _Post-base_ matras
- _Above-base_ matras
- _Below-base_ matras
respectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations
corresponds to Unicode's preferred terminology. The _Pre_, _Post_,
_Above_, and _Below_ terminology is used in the official descriptions
of OpenType GSUB and GPOS features. Shaping engines may, internally,
use whichever terminology is preferred.
In addition, dependent-vowel codepoints that are composed of multiple
components will be designated in character tables as having a compound
_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`.
However, these multi-part matras are decomposed into separate matra
components during the shaping process. After the decomposition, each
matra component will belong to exactly one of the four basic
_mark-placement subclasses_.
For most mark and dependent-vowel codepoints, the _mark-placement
subclass_ is synonymous with the `Indic Positional Category` defined
in Unicode. However, there are some distinctions, where the defined
category does not fully capture the behavior of the character in the
shaping process.
### Devanagari character tables ###
Separate character tables are provided for the Devanagari, Devanagari
Extended, and Vedic Extensions block as well as for other
miscellaneous characters that are used in `` text runs:
- [Devanagari character table](character-tables/character-tables-devanagari.md#devanagari-character-table)
- [Devanagari Extended character table](character-tables/character-tables-devanagari.md#devanagari-extended-character-table)
- [Vedic Extensions character table](character-tables/character-tables-devanagari.md#vedic-extensions-character-table)
- [Miscellaneous character table](character-tables/character-tables-devanagari.md#miscellaneous-character-table)
The tables list each codepoint along with its Unicode general
category, its shaping class, and its mark-placement subclass. The
codepoint's Unicode name and an example glyph are also provided.
For example:
:::{table} Example character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0901` | Mark [Mn] | BINDU | TOP_POSITION | ँ Candrabindu |
| | | | |
|`U+0915` | Letter | CONSONANT | _null_ | क Ka |
:::
Codepoints with no assigned meaning are designated as _unassigned_ in
the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the tables use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
#### Special-function codepoints ####
Other important characters that may be encountered when shaping runs
of Devanagari text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
Each of these is of particular importance to shaping engines, because
these codepoints interact with the shaping engine, the text run, and
the active font, either to mediate non-default shaping behavior or to
relay information about the current shaping process.
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
Dotted-circle placeholder characters (like any Unicode codepoint) can
appear anywhere in text input sequences and should be rendered
normally. GPOS positioning lookups should attach mark glyphs to dotted
circles as they would to other non-mark characters. As visible glyphs,
dotted circles can also be involved in GSUB substitutions.
In addition to the default input-text handling process, shaping
engines may also insert dotted-circle placeholders into the text
sequence. Dotted-circle insertions are required when a non-spacing
mark or dependent sign is formed with no base character present.
This requirement covers:
- Dependent signs that are assigned their own individual Unicode
codepoints (such as most dependent-vowel marks or matras)
- Dependent signs that are formed only by specific sequences of
other codepoints (such as "Reph")
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence.
- The sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the
formation of a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead.
- The sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce
the first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph".
- An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
even where an initial "Ra,Halant" sequence without the zero-width
joiner would otherwise produce a "Reph".
The ZWJ and ZWNJ characters are, by definition, non-printing control
characters and have the _Default_Ignorable_ property in the Unicode
Character Database. In standard text-display scenarios, their function
is to signal a request from the user to the shaping engine for some
particular non-default behavior. As such, they are not rendered
visually.
> Note: Naturally, there are special circumstances where a user or
> document might need to request that a ZWJ or ZWNJ be rendered
> visually, such as when illustrating the OpenType shaping process, or
> displaying Unicode tables.
Because the ZWJ and ZWNJ are non-printing control characters, they can
be ignored by any portion of a software text-handling stack not
involved in the shaping operations that the ZWJ and ZWNJ are designed
to interface with. For example, spell-checking or collation functions
will typically ignore ZWJ and ZWNJ.
Similarly, the ZWJ and ZWNJ should be ignored by the shaping engine
when matching sequences of codepoints against the backtrack and
lookahead sequences of a font's GSUB or GPOS lookups.
For example:
- A lookup that substitutes an alternate version of a
dependent-vowel (matra) glyph when it is preceded by "Ka,Halant,Tta"
should still be applied if the dependent-vowel codepoint is preceded
by "Ka,Halant,ZWJ,Tta" in the text run.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
## The `` shaping model ##
Processing a run of `` text involves six top-level stages:
1. Identifying syllables and other sequences
2. Initial reordering
3. Applying the basic substitution features from GSUB
4. Final reordering
5. Applying all remaining substitution features from GSUB
6. Applying all remaining positioning features from GPOS
As with other Indic scripts, the initial reordering stage and the
final reordering stage each involve applying a set of several
script-specific rules. The basic substitution features must be applied
to the run in a specific order. The remaining substitution features in
stage five, however, do not have a mandatory order.
Indic scripts follow many of the same shaping patterns, but they
differ in a few critical characteristics that the shaping engine must
track. These include:
- The position of the base consonant in a syllable.
- The final position of "Reph".
- Whether "Reph" must be requested explicitly or if it is formed by
a specific, implicit sequence.
- Whether the below-base forms feature is applied only to consonants
before the syllable base, only to consonants after the base
consonant, or to both.
- The ordering positions for dependent vowels
(matras). Specifically, right-side, above-base, and below-base
matras follow different rules in different scripts.
All Indic scripts position left-side matras in the same
manner, in the ordering position `POS_PREBASE_MATRA`.
With regard to these common variations, Devanagari's specific shaping
characteristics include:
- `BASE_POS_LAST` = The base consonant of a syllable is the last
consonant, not counting any special final-consonant forms.
- `REPH_POS_BEFORE_POST` = "Reph" is ordered before all post-base consonant forms.
- `REPH_MODE_IMPLICIT` = "Reph" is formed by an initial "Ra,Halant" sequence.
- `BLWF_MODE_PRE_AND_POST` = The below-forms feature is applied both to
pre-base consonants and to post-base consonants.
- `MATRA_POS_TOP` = `POS_AFTER_SUBJOINED` = Above-base matras are
ordered after subjoined (i.e., below-base) consonant forms.
- `MATRA_POS_RIGHT` = `POS_AFTER_SUBJOINED` = Right-side matras are
ordered after subjoined (i.e., below-base) consonant forms.
- `MATRA_POS_BOTTOM` = `POS_AFTER_SUBJOINED` = Below-base matras are
ordered after all subjoined (i.e., below-base) consonant forms.
These characteristics determine how the shaping engine must reorder
certain glyphs, how base consonants are determined, and how "Reph"
should be encoded within a run of text.
### Stage 1: Identifying syllables and other sequences ###
A syllable in Devanagari consists of a valid orthographic sequence
that may be followed by a "tail" of modifier signs.
> Note: The Devanagari Unicode block enumerates nine modifier signs,
> "Inverted Candrabindu" (`U+0900`), "Candrabindu" (`U+0901`),
> "Anusvara" (`U+0902`), "Visarga" (`U+0903`), "Avagraha" (`U+093D`),
> "Udatta" (`U+0951`), "Anudatta" (`U+0952`), "Grave Accent"
> (`U+0953`) and "Acute Accent" (`U+0954`). In addition, Sanskrit text
> written in Devanagari may include additional signs from Vedic
> Extensions block.
Each syllable contains exactly one vowel sound. Valid syllables may
begin with either a consonant or an independent vowel.
If the syllable begins with a consonant, then the consonant that
provides the vowel sound is referred to as the "base" consonant. If
the syllable begins with an independent vowel, that independent vowel
is the syllable's only vowel sound and serves as the "base".
> Note: A consonant that is not accompanied by a dependent vowel (matra) sign
> carries the script's inherent vowel sound. This vowel sound is changed
> by a dependent vowel (matra) sign following the consonant.
From the shaping engine's perspective, the main distinction between a
syllable with a base consonant and a syllable with an
independent-vowel base is that a syllable with an independent-vowel
base is less likely to include additional consonants in special forms
and less likely to include dependent vowel signs
(matras). Therefore, in the common case, vowel-based syllables may
involve less reordering, substitution feature applications, and other
processing than consonant-based syllables.
In some languages and orthographies, vowel-based syllables are
not permitted to include additional consonants or matras, and certain
GSUB substitution features do not occur. However, there are often
known exceptions, and real-world text makes no such guarantees.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
Generally speaking, the base consonant is the final consonant of the
syllable and its vowel sound designates the end of the syllable. This
rule is synonymous with the `BASE_POS_LAST` characteristic mentioned
earlier.
Valid consonant-based syllables may include one or more additional
consonants that precede the base consonant. Each of these
other, pre-base consonants will be followed by the "Halant" mark, which
indicates that they carry no vowel. They affect pronunciation by
combining with the base consonant (e.g., "_str_", "_pl_") but they
do not add a vowel sound.
As with other Indic scripts, the consonant "Ra" receives special
treatment; in many circumstances it is replaced by one of two combining
mark-like forms.
- A "Ra,Halant" sequence at the beginning of a syllable
is replaced with an above-base mark called "Reph" (unless the "Ra"
is the only consonant in the syllable).
This rule is synonymous with the `REPH_MODE_IMPLICIT`
characteristic mentioned earlier.
- "Halant,Ra" sequences that occur elsewhere in the syllable may take on the
below-base form "Rakaar" .
In addition, "Rra,Halant" sequences that precede the base consonant or syllable base
may take on a form known as the "eyelash Ra" .
> Note: In `` text runs, this substitution is canonically
> implemented as a [half form](#stage-3-step-9-half). See the [``
> shaping](#the-deva-shaping-model) section for a discussion of the
> "eyelash Ra" implementation that was used in the `` model.
"Reph" and "Rakaar" characters must be reordered after the
syllable-identification stage is complete.
> Note: Generally speaking, OpenType fonts will implement support for
> any below-base, post-base, and pre-base-reordering consonant forms
> by including the necessary substitution rules in their `blwf`,
> `pstf`, and `pref` lookups in GSUB.
>
> Consequently, whenever shaping engines need to determine whether or
> not a given consonant can take on such a special form, the most
> appropriate test is to check if the consonant is included in the
> relevant GSUB lookup. Other implementations are possible, such as
> maintaining static tables of consonants, but checking for GSUB
> support ensures that the expected behavior is implemented in the
> active font, and is therefore the most reliable approach.
In addition to valid syllables, standalone sequences may occur, such
as when an isolated codepoint is shown in example text.
> Note: Foreign loanwords, when written in the Devanagari script, may
> not adhere to the syllable-formation rules described above. In
> particular, it is not uncommon to encounter foreign loanwords that
> contain a word-final suffix of consonants.
>
> Nevertheless, such word-final suffixes will be correctly matched by
> the regular expressions listed below. These loanwords are pronounced
> different, which raises issues for potential readers, but the
> character sequences do not affect the shaping process.
Syllables should be identified by examining the run and matching
glyphs, based on their categorization, using regular expressions.
The following general-purpose Indic-shaping regular expressions can be
used to match Devanagari syllables.
The regular expressions utilize the shaping classes from the tables
above. For the purpose of syllable identification, more general
classes can be used, as defined in the following table. This
simplifies the resulting expressions.
```markdown
_ra_ = The consonant "Ra"
_consonant_ = ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_
_vowel_ = `VOWEL_INDEPENDENT`
_nukta_ = `NUKTA`
_halant_ = `VIRAMA`
_zwj_ = `JOINER`
_zwnj_ = `NON_JOINER`
_matra_ = `VOWEL_DEPENDENT` | `PURE_KILLER`
_syllablemodifier_ = `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`
_vedicsign_ = `CANTILLATION`
_placeholder_ = `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`
_dottedcircle_ = `DOTTED_CIRCLE`
_repha_ = `CONSONANT_PRE_REPHA`
_consonantmedial_ = `CONSONANT_MEDIAL`
_symbol_ = `SYMBOL` | `AVAGRAHA`
_consonantwithstacker_ = `CONSONANT_WITH_STACKER`
_other_ = `OTHER` | `MODIFYING_LETTER`
```
> Note: the _ra_ identification class is mutually exclusive with
> the _consonant_ class. The union of the _consonant_ and _ra_ classes
> is used in the regular expression elements below in order to
> correctly identify "Ra" characters that do not trigger "Reph" or
> "Rakaar" shaping behavior.
>
> Note, also, that the cantillation mark "combining Ra" in the
> Devanagari Extended block does _not_ belong to the _ra_
> identification class, and that the other "combining consonant"
> cantillation marks in the Devanagari Extended block do not belong to
> the _consonant_ identification class.
> Note: The _placeholder_ identification class includes codepoints
> that are often used in place of vowels or consonants when a document
> needs to display a matra, mark, or special form in isolation or
> in another context beyond a standard syllable. Examples of
> _placeholder_ codepoints include hyphens and non-breaking
> spaces. Sequences that utilize this approach should be identified as
> "standalone" syllables.
>
> The _placeholder_ identification class also includes numerals, which
> are commonly used as word substitutes within normal text. Examples
> include ordinals (e.g., "4th").
> Note: The _other_ identification class includes codepoints that
> do not interact with adjacent characters for shaping purposes. Even
> though some of these codepoints (such as `MODIFYING_LETTER`) can
> occur within words, they evoke no behavior from the shaping
> engine and do not factor into the regular expressions that
> follow. Therefore, the shaping engine may choose to ignore them
> during syllable identification; they are listed here for completeness.
These identification classes form the bases of the following regular
expression elements:
```markdown
C = _consonant_ | _ra_
Z = _zwj_ | _zwnj_
REPH = (_ra_ _halant_) | _repha_
CN = C _zwj_? _nukta_?
FORCED_RAKAR = _zwj_ _halant_ _zwj_ _ra_
S = _symbol_ _nukta_?
MATRA_GROUP = Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?
SYLLABLE_TAIL = (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}
HALANT_GROUP = Z? _halant_ (_zwj_ _nukta_?)?
FINAL_HALANT_GROUP = HALANT_GROUP | (_halant_ _zwnj_)
MEDIAL_GROUP = _consonantmedial_?
HALANT_OR_MATRA_GROUP = FINAL_HALANT_GROUP | MATRA_GROUP*)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(MATRA_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(MATRA_GROUP){0,4}` .
Using the above elements, the following regular expressions define the
possible syllable types:
A consonant-based syllable will match the expression:
```markdown
(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(CN HALANT_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(CN HALANT_GROUP){0,4}` .
A vowel-based syllable will match the expression:
```markdown
REPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
A standalone syllable will match the expression:
```markdown
((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
> Note: Although they are labeled as "standalone syllables" here,
> many sequences that match the standalone regular expression above
> are instances where a document needs to display a matra, combining
> mark, or special form in isolation. Such sequences might not have
> any significance with regard to the definition of syllables used in
> the language or orthography of the text.
A symbol-based syllable will match the expression:
```markdown
S SYLLABLE_TAIL
```
A broken syllable will match the expression:
```markdown
REPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
The primary problem involved in shaping broken syllables is the lack
of a syllable base (either a base consonant or an independent
vowel). Without a syllable base, the shaping engine cannot perform
GPOS positioning and other contextual operations that are required
later in the shaping process.
To make up for this limitation, shaping engines should insert a
dotted-circle placeholder (`U+25CC`) character into the text stream
where the missing syllable base was expected to occur. This
placeholder allows the shaping process to proceed on a best-effort
basis at handling the broken-syllable sequence, but making guarantees
about the orthographic correctness or preferred appearance of the
final result is out of scope for this document.
Shaping engines can perform this dotted-circle insertion at any point
after the broken syllable has been recognized and before GSUB features
are applied. However, the best results will likely be attained by
performing the insertion immediately, before proceeding to
stage 2. This will enable the maximum number of GSUB and GPOS features
in the active font to be correctly applied to the text run by ensuring
that all reordering, tagging, and sorting algorithms are executed as
usual.
> Note: In software stacks where other text-handling operations, such
> as Unicode normalization and localization, are performed before the
> text run is passed to the shaping engine, there is a potential for
> the dotted-circle insertion to cause unexpected effects.
>
> For example, if a `ccmp` or `locl` feature substitutes the default
> dotted-circle placeholder glyph with a variant glyph of a different
> size or weight for the (`U+25CC`) codepoint, then any shaping engine
> which relies on another software component to handle that
> functionality must take additional care to ensure consistency.
The expressions above use state-machine syntax from the Ragel
state-machine compiler. The operators represent:
```markdown
a* = zero or more copies of a
b+ = one or more copies of b
c? = optional instance of c
d{n} = exactly n copies of d
d{,n} = zero to n copies of d
d{n,} = n or more copies of d
d{n,m} = n to m copies of d
!e = not e
^f = character-level not f
g.h = concatenation of g and h
i|j = i or j
( ) = grouping of expression elements
```
After the syllables have been identified, each of the subsequent
shaping stages occurs on a per-syllable basis.
### Stage 2: Initial reordering ###
The initial reordering stage is used to relocate glyphs from the
phonetic order in which they occur in a run of text to the
orthographic order in which they are presented visually.
> Note: Primarily, this means moving dependent-vowel (matra) glyphs,
> "Ra,Halant" glyph sequences, and other consonants that take special
> treatment in some circumstances. "Ra" and "Rra" may
> take on special forms, depending on their position in the syllable.
>
> These reordering moves are mandatory. The final-reordering stage
> may make additional moves, depending on the text and on the features
> implemented in the active font.
The syllable should be processed by tagging each glyph with its
intended position based on its ordering category. After all glyphs
have been tagged, the entire syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
The final sort order of the ordering categories should be:
POS_RA_TO_BECOME_REPH
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN
POS_ABOVEBASE_CONSONANT
POS_BEFORE_SUBJOINED
POS_BELOWBASE_CONSONANT
POS_AFTER_SUBJOINED
POS_BEFORE_POST
POS_POSTBASE_CONSONANT
POS_AFTER_POST
POS_FINAL_CONSONANT
POS_SMVD
This sort order enumerates all of the possible final positions to
which a codepoint might be reordered, across all of the Indic
scripts. It includes some ordering categories not utilized in
Devanagari.
The basic positions (left to right) are "Reph"
(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants
positioned before the base consonant or syllable base
(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant
or syllable base (`POS_SYLLABLE_BASE`), above-base consonants
(`POS_ABOVEBASE_CONSONANT`), below-base consonants
(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base
consonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final
consonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic
signs (`POS_SMVD`).
In addition, several secondary positions are defined to handle various
reordering rules that deal with relative, rather than absolute,
positioning. `POS_AFTER_MAIN` means that a character must be
positioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`
and `POS_AFTER_SUBJOINED` mean that a character must be positioned
before or after any below-base consonants, respectively. Similarly,
`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be
positioned before or after any post-base consonants, respectively.
For shaping-engine implementers, the names used for the ordering
categories matter only in that they are unambiguous.
For a definition of the "base" consonant, refer to step 2.1, which
follows.
#### Stage 2, step 1: Base consonant ####
The first step is to determine the base consonant of the syllable, if
there is one, and tag it as `POS_SYLLABLE_BASE`.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base, and it should be tagged
as `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a standalone sequence or other syllable that begins with a placeholder
or dotted circle, the placeholder or dotted circle will always serve
as the syllable base, and it should be tagged as
`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a syllable that begins with a consonant, the shaping engine must
determine the base consonant by a script-specific algorithm.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
The base consonant is defined as the consonant in a consonant-based
syllable that carries the syllable's vowel sound. That vowel sound
will either be provided by the script's inherent vowel (in which case
it is not written with a separate character) or the sound will be designated
by the addition of a dependent-vowel (matra) sign.
While performing the base-consonant search, shaping engines may
also encounter special-form consonants, including below-base
consonants and post-base consonants. Each of these special-form
consonants must also be tagged (`POS_BELOWBASE_CONSONANT`,
`POS_POSTBASE_CONSONANT`, respectively).
Any pre-base-reordering consonant (such as a pre-base-reordering "Ra")
encountered during the base-consonant search must be tagged
`POS_POSTBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
The algorithm for determining the base consonant is
- If the syllable starts with "Ra,Halant" and the syllable contains
more than one consonant, exclude the starting "Ra" from the list of
consonants to be considered.
- Starting from the end of the syllable, move backwards until a consonant is found.
* If the consonant is the first consonant, stop.
* If the consonant is preceded by the sequence "Halant,ZWJ", stop.
* If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`, then move to the previous consonant.
* If the consonant has a post-base form, tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If the consonant is a pre-base-reordering "Ra", tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If none of the above conditions is true, stop.
- The consonant stopped at will be the base consonant.
> Note: The algorithm is designed to work for all Indic
> scripts. However, Devanagari does not utilize pre-base-reordering "Ra".
Devanagari includes one below-base consonant form.
- "Halant,Ra" (occurring after the syllable base) and "Ra,Halant"
(before the syllable base, but in a non-syllable-initial
position) will take on the "Rakaar" form.
> Note: the sequence "Rra,Halant" (occurring before the base
> consonant) will take on the "eyelash Ra" special form. However, this
> special form is not a below-base form. Instead, it is canonically
> defined as belonging to the half-form substitutions, so it is
> addressed by the `half` feature in stage 3, step 9, and is not
> addressed in this step.
> Note: Because Devanagari employs the `BLWF_MODE_PRE_AND_POST` shaping
> characteristic, consonants with below-base special forms may occur
> before or after the syllable base.
>
> During the base-consonant search, only the "Halant,_consonant_"
> pattern following the syllable base for these below-base forms will
> be encountered. Step 2.5 below ensures that the "_consonant_,Halant"
> pattern preceding the syllable base for these below-base forms will
> also be tagged correctly.
#### Stage 2, step 2: Matra decomposition ####
Second, any two-part dependent vowels (matras) must be decomposed
into their left-side and right-side components.
Devanagari does not have any two-part dependent vowels; this step is
listed here because it is part of the general processing scheme for
shaping Indic scripts.
Because this decomposition is a character-level operation, the shaping
engine may choose to perform it earlier, such as during an initial
Unicode-normalization stage. However, all such decompositions must be
completed before the shaping engine begins step three, below.
#### Stage 2, step 3: Tag matras ####
Third, all left-side dependent-vowel (matra) signs must be tagged to be
moved to the beginning of the syllable, with `POS_PREBASE_MATRA`.
Above-base, right-side, and below-base dependent-vowel (matra) signs
must be tagged with `POS_AFTER_SUBJOINED`.
#### Stage 2, step 4: Adjacent marks ####
Fourth, any subsequences of marks that include a "Nukta" and a
"Halant" or Vedic sign must be reordered so that the "Nukta" appears
first.
This means that the subsequence "Halant,Nukta" is reordered to
"Nukta,Halant" and that the subsequence "_Vedic_sign_,Nukta" is
reordered to "Nukta,_Vedic_sign_".
For subsequences of affected marks that are longer than two, the
reordering operation must be repeated until the "Nukta" is the first
character in the subsequence. No other marks in the subsequence
should be reordered.
This order is canonical in Unicode and is required so that
"_consonant_,Nukta" substitution rules from GSUB will be correctly
matched later in the shaping process.
#### Stage 2, step 5: Pre-base consonants ####
Fifth, consonants that occur before the syllable base must be tagged
with `POS_PREBASE_CONSONANT`. Excluding initial "Ra,Halant" sequences
that will become "Reph"s:
- If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`.
- Otherwise, tag it as `POS_PREBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
Devanagari includes one below-base consonant form.
- "Halant,Ra" (occurring after the syllable base) and "Ra,Halant"
(before the syllable base, but in a non-syllable-initial
position) will take on the "Rakaar" form.
> Note: the sequence "Rra,Halant" (occurring before the base
> consonant) will take on the "eyelash Ra" special form. However, this
> special form is not a below-base form. Instead, it is canonically
> defined as belonging to the half-form substitutions, so it is
> addressed by the `half` feature in stage 3, step 9, and is not
> addressed in this step.
> Note: Because Devanagari employs the `BLWF_MODE_PRE_AND_POST` shaping
> characteristic, consonants with below-base special forms may occur
> before or after the syllable base.
>
> During the base-consonant search in 2.1, any instances of the
> "Halant,_consonant_" pattern following the syllable base for these
> below-base forms will be encountered. The tagging in this step
> ensures that the "_consonant_,Halant" pattern preceding the syllable
> base for these below-base forms will also be tagged correctly.
#### Stage 2, step 6: Reph ####
Sixth, initial "Ra,Halant" sequences that will become "Reph"s must be tagged with
`POS_RA_TO_BECOME_REPH`.
> Note: an initial "Ra,Halant" sequence will always become a "Reph"
> unless the "Ra" is the only consonant in the syllable.
#### Stage 2, step 7: Final consonants ####
Seventh, all final consonants must be tagged. Consonants that occur
after the syllable base _and_ after a dependent vowel (matra) sign
must be tagged with `POS_FINAL_CONSONANT`.
> Note: Final consonants occur only in Sinhala and should not be
> expected in `` text runs. This step is included here to
> maintain compatibility across Indic scripts.
#### Stage 2, step 8: Mark tagging ####
Eighth, all marks must be tagged.
> Note: In this step, joiner and non-joiner characters must also be
> tagged according to the same rules given for marks, even though
> these characters are not categorized as marks in Unicode.
Marks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,
`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should
be tagged with `POS_SMVD`.
All "Nukta"s must be tagged with the same positioning tag as the
preceding consonant, independent vowel, placeholder, or dotted circle.
All remaining marks (not in the `POS_SMVD` category and not "Nukta"s)
must be tagged with the same positioning tag as the closest non-mark
character the mark has affinity with, so that they move together
during the sorting step.
There are two possible cases: those marks before the syllable base
and those marks after the syllable base. In addition, an exception is
made for "Halant" marks that follow a left-side (pre-base) matra.
1. Initially, all remaining marks should be tagged with the same
positioning tag as the closest preceding consonant.
2. For each consonant after the syllable base (such as post-base
consonants, below-base consonants, or final consonants), all
remaining marks located between that current consonant and any
previous consonant should be tagged with the same positioning tag as
the current (later) consonant.
In other words, all consonants preceding the syllable base "own" the
marks that follow them, while all consonants after the syllable base
"own" the marks that come before them. When a syllable does not have
any consonants after the syllable base, the syllable base should
"own" all the marks that follow it.
3. Finally, "Halant" marks that follow a left-side dependent vowel
(matra) should _not_ be tagged with the left-side matra's
positioning tag. Instead, the "Halant" should be tagged with the
positioning tag of the non-mark character preceding the left-side
matra. This prevents the "Halant" mark from being moved with the
left-side matra when the syllable is sorted.
#### Stage 2, step 9: Sort syllable ####
With these steps completed, the syllable can be sorted into the final
sort order as listed at the beginning of stage 2.
The glyphs in the syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
#### Stage 2, step 10: Flag sequences for possible feature applications ####
With the initial reordering complete, those glyphs in the syllable that
may have GSUB or GPOS features applied in stages 3, 5, and 6 should be
flagged for each potential feature.
This flagging is preliminary; the set of potential features varies
between different scripts and which features are supported varies
between fonts. It is also possible that the application of
one feature on a glyph sequence will perform a substitution that makes
a later feature no longer applicable to the updated sequence.
Consequently, the flagging must be completed before shaping proceeds
to the stages during which features are applied.
Some shaping features, such as `locl`, can potentially apply to any
glyphs. Therefore it is not necessary to maintain a separate flag for
these features in the bitmask (or other data structure) used to track
the flags -- although shaping engines may do so if desired.
The sequences to flag are summarized in the list below; a full
description of each feature's function and interpretation is provided
in GSUB and GPOS application stages that follow.
- `nukt` should match "_Consonant_,Nukta" sequences
- `akhn` should match "Ka,Halant,Ssa" and "Ja,Halant,Nya"
- `rphf` should match initial "Ra,Halant" sequences but _not_ match
initial "Ra,Halant,ZWJ" sequences
- `rkrf` should match "_Consonant_,Halant,Ra" sequences
- `blwf` should match "Halant,Ra" in post-base positions and
"Ra,Halant" in non-initial pre-base positions
- `half` should match "_Consonant_,Halant" in pre-base position but
_not_ match "Ra,Halant" sequences flagged for `rphf` and
_not_ match "_Consonant_,Halant,ZWNJ,_Consonant_" sequences
- `vatu` should match "_Consonant_,Halant,Ra" sequences
- `cjct` should match "_Consonant_,Halant,_Consonant_" but _not_
match "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_"
### Stage 3: Applying the basic substitution features from GSUB ###
The basic-substitution stage applies mandatory substitution features
using the rules in the font's GSUB table. In preparation for this
stage, glyph sequences should be flagged for possible application
of GSUB features in stage 2, step 10.
The order in which these substitutions must be performed is fixed for
all Indic scripts:
locl
nukt
akhn
rphf
rkrf
pref (not used in Devanagari)
blwf
abvf (not used in Devanagari)
half
pstf
vatu
cjct
cfar (not used in Devanagari)
#### Stage 3, step 1: locl ####
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
#### Stage 3, step 2: nukt ####
The `nukt` feature replaces "_Consonant_,Nukta" sequences with a
precomposed nukta-variant of the consonant glyph.
- The context defined for a `nukt` feature is:
:::{table} `nukt` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:------------------------------|:--------------|
| _none_ | `_consonant_`(full),`_nukta_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-nukt}
Nukta composition
:::
```{svg-color-toggle-button} devanagari-nukt
```
#### Stage 3, step 3: akhn ####
The `akhn` feature replaces two specific sequences with required ligatures.
- "Ka,Halant,Ssa" is substituted with the "KSsa" ligature.
- "Ja,Halant,Nya" is substituted with the "JNya" ligature.
These sequences can occur anywhere in a syllable. The "KSsa" and
"JNya" characters have orthographic status equivalent to full
consonants in some languages, and fonts may have `cjct` substitution
rules designed to match them in subsequences. Therefore, this
feature must be applied before all other many-to-one substitutions.
- The context defined for an `akhn` feature is:
:::{table} `akhn` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:----------------------------|:--------------|
| _none_ | `AKHAND_CONSONANT_SEQUENCE` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-akhn-kssa}
KSsa ligation
:::
```{svg-color-toggle-button} devanagari-akhn-kssa
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-akhn-jnya}
JNya ligation
:::
```{svg-color-toggle-button} devanagari-akhn-jnya
```
#### Stage 3, step 4: rphf ####
The `rphf` feature replaces initial "Ra,Halant" sequences with the
"Reph" glyph.
- An initial "Ra,Halant,ZWJ" sequence, however, must not be flagged for
the `rphf` substitution.
- The context defined for a `rphf` feature is:
:::{table} `rphf` feature context
| Backtrack | Matching sequence | Lookahead |
|:-----------------|:------------------------|:--------------|
| `SYLLABLE_START` | "Ra"(full),`_halant_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-rphf}
Reph composition
:::
```{svg-color-toggle-button} devanagari-rphf
```
#### Stage 3, step 5: rkrf ####
The `rkrf` feature replaces "_Consonant_,Halant,Ra" sequences with the
"Rakaar"-ligature form of the consonant glyph.
- The context defined for a `rkrf` feature is:
:::{table} `rkrf` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------------|:----------------------|:--------------|
| `_consonant_`(full) | `_halant_`,"Ra"(full) | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-rkrf}
Rakaar composition
:::
```{svg-color-toggle-button} devanagari-rkrf
```
#### Stage 3, step 6: pref ####
> This feature is not used in Devanagari.
#### Stage 3, step 7: blwf ####
The `blwf` feature replaces below-base-consonant glyphs with any
special forms. Devanagari includes one below-base consonant
form:
- "Halant,Ra" (occurring after the syllable base) or "Ra,Halant"
(before the syllable base, but in a non-syllable-initial position) will
take on the "Rakaar" form.
If the active font contains ligatures for the consonant adjacent to
the "Halant" (i.e., "_Consonant_,Halant,Ra"), then that ligature is
normally applied with the `rkrf` feature in step 3.5. The `blwf`
feature allows the "Ra" to be substituted with a standalone "Rakaar"
mark, to work with all consonants that do not have a `rkrf` ligature
in the font.
Because Devanagari incorporates the `BLWF_MODE_PRE_AND_POST` shaping
characteristic, any pre-base consonants and any post-base consonants
may potentially match a `blwf` substitution; therefore, both cases must
be flagged for comparison. Note that this is not necessarily the case in other
Indic scripts that use a different `BLWF_MODE_` shaping
characteristic.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-blwf}
Below-base form
:::
```{svg-color-toggle-button} devanagari-blwf
```
#### Stage 3, step 8: abvf ####
> This feature is not used in Devanagari.
#### Stage 3, step 9: half ####
The `half` feature replaces "_Consonant_,Halant" sequences before the
base consonant or syllable base with "half forms" of the consonant
glyphs.
In the most common case, this substitution applies to
"_Consonant_,Halant" sequences that are followed by another
"_Consonant_".
In addition, a sequence matching "_Consonant_,Halant,ZWJ" must also be
flagged for potential `half` substitutions.
> Note: The presence of the "ZWJ" at the end of the sequence means
> that the sequence may match the regular-expression test in stage 1
> as the end of a syllable, even without being followed by a base
> consonant or syllable base.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
There are three exceptions to the default behavior, for which
the shaping engine must test:
- Initial "Ra,Halant" sequences, which should have been flagged for
the `rphf` feature earlier, must not be flagged for potential
`half` substitutions.
- Non-initial "Ra,Halant" sequences, which should have been flagged
for the `rkrf` or `blwf` features earlier, must not be flagged for
potential `half` substitutions.
- A sequence matching "_Consonant_,Halant,ZWNJ,_Consonant_" must not be
flagged for potential `half` substitutions.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-half}
Half-form formation
:::
```{svg-color-toggle-button} devanagari-half
```
In addition, the sequence "Rra,Halant" (occurring before the base
consonant or syllable base) will take on the "eyelash Ra" form. Because this
substitution is defined as the canonical half form of "Rra" in ``, the
shaping engine does not need to implement any special handling to
support it.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-eyelash-ra}
Eyelash Ra formation
:::
```{svg-color-toggle-button} devanagari-eyelash-ra
```
#### Stage 3, step 10: pstf ####
> This feature is not used in Devanagari.
#### Stage 3, step 11: vatu ####
The `vatu` feature replaces certain sequences with "Vattu variant"
forms.
"Vattu variants" are formed from glyphs followed by "Rakaar"
(the below-base form of "Ra"); therefore, this feature must be applied after
the `blwf` feature.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-vatu}
Vattu ligation
:::
```{svg-color-toggle-button} devanagari-vatu
```
#### Stage 3, step 12: cjct ####
The `cjct` feature replaces sequences of adjacent consonants with
conjunct ligatures. These sequences must match "_Consonant_,Halant,_Consonant_".
A sequence matching "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_" must not be flagged to form a conjunct.
> Note: The presence of the "ZWJ" in a
> "_Consonant_,Halant,ZWJ,_Consonant_" sequence should automatically
> inhibit any `cjct` feature rules from matching the sequence as valid
> input, and thus prevent the `cjct` substitution from being applied.
> Note: The presence of the "ZWNJ" in a
> "_Consonant_,Halant,ZWNJ,_Consonant_" sequence means that the
> "_Consonant_,Halant,ZWNJ" subsequence will match the
> regular-expression test in stage 1 as the end of a syllable.
>
> Because OpenType shaping features in `` are defined as
> applying only within an individual syllable, this means that the
> presence of the "ZWNJ" will automatically prevent the application of
> a `cjct` feature by triggering the identification of a syllable
> break between the two consonants.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWNJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
>
> Note, also: The presence of the "ZWJ" means that a
> "_Consonant_,Halant,ZWJ" sequence may match the regular-expression
> test in stage 1 as the end of a syllable, even without being
> followed by a base consonant or syllable base. By definition,
> however, a "_Consonant_,Halant,ZWJ" syllable identified in stage 1
> cannot also include a "_Consonant_" after the ZWJ.
The font's GSUB rules might be implemented so that `cjct`
substitutions apply to half-form consonants; therefore, this feature
must be applied after the `half` feature.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-cjct}
Conjunct ligation
:::
```{svg-color-toggle-button} devanagari-cjct
```
#### Stage 3, step 13: cfar ####
> This feature is not used in Devanagari.
### Stage 4: Final reordering ###
The final reordering stage repositions marks, dependent-vowel (matra)
signs, and "Reph" glyphs to the appropriate location with respect to
the base consonant or syllable base. Because multiple substitutions
may have occurred during the application of the basic-shaping features
in the preceding stage, these repositioning moves could not be
performed during the initial reordering stage.
Like the initial reordering stage, the steps involved in this stage
occur on a per-syllable basis.
#### Stage 4, step 1: Base consonant ####
The final reordering stage, like the initial reordering stage, begins
with determining the syllable base of each syllable, following the
same algorithm used in stage 2, step 1.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base. In a standalone sequence or
other syllable that begins with a placeholder or a dotted circle, the
placeholder or dotted circle will always serve as the syllable base.
In a syllable that begins with a consonant, the shaping engine must
repeat the base-consonant search algorithm used in stage 2, step 1.
The codepoint of the underlying base consonant or syllable base will
not change between the search performed in stage 2, step 1, and the
search repeated here. However, the application of GSUB shaping
features in stage 3 means that several ligation and many-to-one
substitutions may have taken place. The final glyph produced by that
process may, therefore, be a conjunct or ligature form — in most
cases, such a glyph will not have an assigned Unicode codepoint.
#### Stage 4, step 2: Pre-base matras ####
Pre-base dependent vowels (matras) that were reordered during the
initial reordering stage must be moved to their final position. This
position is defined as:
- after the last standalone "Halant" glyph that comes after the
matra's starting position and also comes before the main
consonant.
- If a zero-width joiner follows this last standalone "Halant", the
final matra position is moved to after the joiner.
This means that the matra will move to the right of all explicit
"consonant,Halant" subsequences, but will stop to the left of the base
consonant or syllable base, all conjuncts or ligatures that contain
the base consonant or syllable base, and all half forms.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-matra-position}
Pre-base matra positioning
:::
```{svg-color-toggle-button} devanagari-matra-position
```
> Note: OpenType and Unicode both state that if the syllable includes
> a ZWJ immediately after the last "Halant", then the final matra
> position should be after the ZWJ.
>
> However, there are several test sequences indicating that
> Microsoft's Uniscribe shaping engine did not follow this rule (in,
> at least, Devanagari and Bengali text), and in these circumstances
> Uniscribe instead makes the final matra position before the final
> "Consonant,Halant,ZWJ".
>
> Subsequently, the HarfBuzz shaping engine has also followed the same
> pattern. If other shaping engine implementations prefer to maintain
> maximum compatibility with Uniscribe and HarfBuzz, then they should
> also follow suit.
> Note: The Microsoft script-development specifications for OpenType
> shaping also state that if a zero-width non-joiner follows the last
> standalone "Halant", the final matra position is moved to after the
> non-joiner. However, it is unnecessary to test for this condition,
> because a "Halant,ZWNJ" subsequence is, by definition, the end of a
> syllable. Consequently, a "Halant,ZWNJ" cannot be followed by a
> pre-base dependent vowel.
#### Stage 4, step 3: Reph ####
"Reph" must be moved from the beginning of the syllable to its final
position. Because Devanagari incorporates the `REPH_POS_BEFORE_POST`
shaping characteristic, this final position is defined to be
immediately before any independent post-base consonant forms (meaning
the first post-base consonant that has not formed a ligature with the
syllable base).
The algorithm for finding the final "Reph" position is
- Starting at the first post-"Reph" consonant, search forward looking
for the first explicit "Halant", ending the search when the base
consonant is encountered. If such an explicit "Halant" is found,
move the "Reph" to the position immediately after this
"Halant".
* If a zero-width joiner (ZWJ) or a zero-width non-joiner (ZWNJ)
follows this "Halant", move the "Reph" to the position
immediately after the ZWJ or ZWNJ. This will be the final
"Reph" position.
* If no ZWJ or ZWNJ follows this "Halant", leave the "Reph" in
its position immediately after the "Halant". This will be the
final "Reph" position.
- If no such explicit "Halant" is found in the previous step, find
the first post-base consonant that has not formed a ligature with
the base consonant. If such a non-ligated post-base consonant is
found, move the "Reph" to the position immediately before the
non-ligated post-base consonant. This will be the final "Reph"
position.
- If no such non-ligated post-base consonant is found in the
previous step, move the "Reph" to the position immediately before
the first post-base matra, syllable modifier, or Vedic sign that
has a positioning tag after the script's "Reph" position in the
syllable sort order (as listed in [stage
2](#stage-2-initial-reordering)). This will be the final "Reph"
position.
> Note: Because Devanagari incorporates the
> `REPH_POS_BEFORE_POST` shaping characteristic, this means
> any positioning tag of `POS_POSTBASE_CONSONANT` or later,
> although a post-base matra, syllable modifier, or Vedic sign
> would not typically be tagged with `POS_POSTBASE_CONSONANT`.
- If no other location has been located in the previous steps, move
the "Reph" to the end of the syllable.
Finally, if the final position of "Reph" occurs after a
"_matra_,Halant" subsequence, then "Reph" must be repositioned to the
left of "Halant", to allow for potential matching with `abvs` or
`psts` substitutions from GSUB.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-reph-position}
Reph positioning
:::
```{svg-color-toggle-button} devanagari-reph-position
```
#### Stage 4, step 4: Pre-base-reordering consonants ####
Any pre-base-reordering consonants must be moved to immediately before
the base consonant or syllable base.
#### Stage 4, step 5: Initial matras ####
Any left-side dependent vowels (matras) that are at the start of a
word must be flagged for potential substitution by the `init` feature
of GSUB.
Devanagari does not use the `init` feature, so this step will
involve no work when processing `` text. It is included here in
order to maintain compatibility with the other Indic scripts.
### Stage 5: Applying all remaining substitution features from GSUB ###
In this stage, the remaining substitution features from the GSUB table
are applied. In preparation for this stage, glyph sequences should be
flagged for possible application of GSUB features in stage 2,
step 10.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
init (not used in Devanagari)
pres
abvs
blws
psts
haln
The `init` feature is not used in Devanagari.
The `pres` feature replaces pre-base-consonant glyphs with special
presentations forms. This can include consonant conjuncts, half-form
consonants, and stylistic variants of left-side dependent vowels
(matras).
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-pres}
Pre-base substitution
:::
```{svg-color-toggle-button} devanagari-pres
```
The `abvs` feature replaces above-base-consonant glyphs with special
presentation forms. This usually includes contextual variants of
above-base marks or contextually appropriate mark-and-base ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-abvs}
Above-base substitution
:::
```{svg-color-toggle-button} devanagari-abvs
```
The `blws` feature replaces below-base-consonant glyphs with special
presentation forms. This usually includes replacing base consonants or syllable bases that
are adjacent to the below-base-consonant form "Rakaar" with contextual
ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-blws}
Below-base substitution
:::
```{svg-color-toggle-button} devanagari-blws
```
The `psts` feature replaces post-base-consonant glyphs with special
presentation forms. This usually includes replacing right-side
dependent vowels (matras) with stylistic variants or replacing
post-base-consonant/matra pairs with contextual ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-psts}
Post-base substitution
:::
```{svg-color-toggle-button} devanagari-psts
```
The `haln` feature replaces syllable-final "_Consonant_,Halant" pairs with
special presentation forms. This can include stylistic variants of the
consonant where placing the "Halant" mark on its own is
typographically problematic.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-haln}
Halant substitution
:::
```{svg-color-toggle-button} devanagari-haln
```
> Note: The `calt` feature, which allows for generalized application
> of contextual alternate substitutions, is usually applied at this
> point. However, `calt` is not mandatory for correct Devanagari shaping
> and may be disabled in the application by user preference.
### Stage 6: Applying remaining positioning features from GPOS ###
In this stage, mark positioning, kerning, and other GPOS features are
applied.
As with the preceding stage, the order in which these features are
applied is not canonical; they should be applied in the order in which
they appear in the GPOS table in the font.
dist
abvm
blwm
> Note: The `kern` feature is usually applied at this stage, if it is
> present in the font. However, `kern` (like `calt`, above) is not
> mandatory for shaping Devanagari text and may be disabled by user preference.
The `dist` feature adjusts the horizontal positioning of
glyphs. Unlike `kern`, adjustments made with `dist` do not require the
application or the user to enable any software _kerning_ features, if
such features are optional.
The `abvm` feature positions above-base marks for attachment to base
characters. In Devanagari, this includes "Reph" in addition to
above-base dependent vowels (matras), diacritical marks, and Vedic signs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-abvm}
Above-base mark positioning
:::
```{svg-color-toggle-button} devanagari-abvm
```
The `blwm` feature positions below-base marks for attachment to base
characters. In Devanagari, this includes below-base dependent vowels
(matras) and diacritical marks as well as the below-base consonant form "Rakaar".
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #devanagari-blwm}
Below-base mark positioning
:::
```{svg-color-toggle-button} devanagari-blwm
```
## The `` shaping model ##
The older Devanagari script tag, ``, has been deprecated. However,
shaping engines may still encounter fonts that were built to work with
`` and some users may still have documents that were written to
take advantage of `` shaping.
### Distinctions from `` ###
The most significant distinction between the shaping models is that the
sequence of "Halant" and consonant glyphs used to trigger shaping
features was altered when migrating from `` to
``.
Specifically, shaping engines were expected to reorder post-base
"Halant,_Consonant_" sequences to "_Consonant_,Halant".
As a result, a font's GSUB substitutions would be written to match
"_Consonant_,Halant" sequences in all pre-base and post-base positions.
The `` syllable
Pre-baseC Halant BaseC Halant Post-baseC
would be reordered to
Pre-baseC Halant BaseC Post-baseC Halant
before features are applied.
In `` text, as described above in this document, there is no
such reordering. The correct sequence to match for GSUB substitutions is
"_Consonant_,Halant" for pre-base consonants, but "Halant,_Consonant_"
for post-base consonants.
The old Indic shaping model also did not recognize the
`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, ``
was treated as if it followed the `BLWF_MODE_POST_ONLY`
characteristic — with a single exception made for non-syllable-initial
"Ra,Halant".
In other words, a non-syllable-initial "Ra,Halant" sequence would
trigger a below-base form substitution, but all other below-base form
substitutions were applied only to consonants after the base
consonant or syllable base.
In addition, for some scripts, left-side dependent vowel marks
(matras) were not repositioned during the final reordering
stage. For `` text, the left-side matra was always positioned
at the beginning of the syllable.
Finally, in `` text, the "eyelash Ra" form was encoded as the
sequence "Ra,Halant,ZWJ".
In ``, the required encoding for "eyelash Ra" is now
"Rra,Halant", and the substitution is implemented using the `half`
feature of GSUB.
### Advice for handling fonts with `` features only ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences in order to apply GSUB substitutions when it is known that
the font in use supports only the `` shaping model.
### Advice for handling text runs composed in `` format ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences for GSUB substitutions or to reorder them to
"Halant,_Consonant_" when processing text runs that are tagged with
the `` script tag and it is known that the font in use supports
only the `` shaping model.
Shaping engines may also choose to apply `blwf` substitutions to
below-base consonants occurring before the base consonant or syllable base when it is
known that the font in use supports an applicable substitution lookup.
Shaping engines may also choose to position left-side matras according
to the `` ordering scheme; however, doing so might interfere
with matching GSUB or GPOS features.
================================================
FILE: opentype-shaping-emoji.md
================================================
# Emoji shaping in OpenType #
This document details the default shaping procedure needed to shape
emoji sequences.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Normalization](#normalization)
- [Bidirectionality](#bidirectionality)
- [Sequence identification](#sequence-identification)
- [Presentation sequences](#presentation-sequences)
- [Modifier sequences](#modifier-sequences)
- [Regional Indicator flag sequences](#regional-indicator-flag-sequences)
- [Tag flag sequences](#tag-flag-sequences)
- [Keycap sequences](#keycap-sequences)
- [Zero-Width Joiner sequences](#zwj-sequences)
- [ZWJ hair sequences](#zwj-hair-sequences)
- [ZWJ gendered person sequences](#zwj-gendered-person-sequences)
- [ZWJ multi-person group sequences](#zwj-multi-person-group-sequences)
- [ZWJ role sequences](#zwj-role-sequences)
- [ZWJ color sequences](#zwj-color-sequences)
- [ZWJ directionality sequences](#zwj-directionality-sequences)
- [ZWJ additional sequences](#zwj-additional-sequences)
- [Other sequences and ligatures](#other-sequences-and-ligatures)
- [Feature interaction in sequences](#feature-interaction-in-sequences)
- [Emoji sets](#emoji-sets)
- [The default shaping model](#the-default-shaping-model)
## General information ##
The emoji OpenType shaping model is used for correctly displaying
sequences from the Emoji block in Unicode as well as for numerous
emoji codepoints found in other blocks.
Emoji codepoints originated from a variety of pre-Unicode standards,
including mobile-phone carriers in Japan, from typographic characters
sets such as Zapf Dingbats and Wingdings, and from various symbols in
common usage.
Emoji shaping follows the default OpenType shaping model used for
scripts that are considered _non-complex_ from the shaper's
perspective. However, emoji fonts typically use GSUB tables to
implement a variety of OpenType smart features, including several
classes of ligature, contextual alternates, or variant forms to
support emoji sequences.
In addition to standalone image glyphs, emoji shaping is also used to
display flag sequences and "keycap" sequences, both of which involve a
combination of emoji and non-emoji codepoints in order.
The default emoji glyph for a given codepoint may be substituted by
the addition of selectors, modifiers, or joiners after the emoji
codepoint.
Many of these emoji sequences carry important semantic meaning,
such as specifying gender, skin tone, object colors, and
directions. Shaping engines should therefore make a best effort to
correctly identify and display these sequences.
Fallback presentation is possible for some emoji sequences by
displaying the sequence of default emoji glyphs for the
codepoints. For other emoji sequences, however, the most appropriate
fallback approach is less clearly defined and may vary between
implementations.
Emoji glyphs may be stored in any of several color formats, or in any
of the monochrome Bézier formats typically used for standard text
codepoints. Correctly retrieving and displaying the glyph data for
the format used by the active font is outside the scope of this
document.
> Note: "shortcut codes" for emoji like `:smile:` are text mark-up
> and are _not_ handled by OpenType shaping. The set of shortcut codes
> supported by any particular application is specific to that
> application alone.
>
> Text-processing stacks typically support a set of shortcut codes
> that includes Unicode's official `Short_Name` property from the CLDR
> database, plus additional short codes, but the shortcut-code mapping
> is not otherwise linked to Unicode data.
Runs of emoji might be tagged with the `` or `` script
subtags, or with the `-em-emoji`, `-em-text`, or `-em-default` locale
extensions. However, these subtags and extensions are primarily
intended to control which presentation form is preferred by the
application, and must not be relied on for the purpose of identifying
emoji.
## Terminology ##
A codepoint is considered an **emoji** only if it has the `Emoji`
property in the Unicode Character Database (UCD). Although many
codepoints that have this property are pictographic in nature, some
codepoints that are pictographic do not have the `Emoji` property
(such as most chess, playing-card, and game-piece symbols), and some
codepoints that do have the `Emoji` property show typographic
characters rather than pictographic images.
All emoji codepoints — as well as several non-emoji codepoints — have
the `Extended_Pictographic` property. When a non-emoji codepoint has
the `Extended_Pictographic` property, this indicates that future
revisions of Unicode may incorporate the codepoint in a valid emoji
sequence, or may (for a currently-unassigned codepoint) assign an
emoji character to the codepoint.
The emoji codepoints also include two distinct sets of alphanumeric
character codepoints that are used to implement specific substitution
sequences.
The **regional indicator** set includes the 26 lower-case
Basic Latin letters ("a" to "z"), which are used to support the
predefined set of regional flags. The regional indicator set is found
within the Enclosed Alphanumeric Supplement block of Unicode.
The **tag character** set includes codepoints that correspond to the
printable characters in the ASCII set, as well as an "End" control
tag. The tag characters are used to support a more general mechanism
for local and sub-national flags that are not covered by the
predefined regional-indicator flag set. The tag characters set is
found within the Tags block of Unicode.
**Presentation style** describes whether an emoji codepoint is
shown in emoji style (for example, with a full-color bitmap or SVG
glyph) or text style (such as a monochrome, Bézier glyph). Every emoji
codepoint defaults to either emoji-style or text-style
presentation.
An emoji codepoint might be followed by a **presentation selector**.
This selector requests that either emoji-style or text-style be used
for the preceding emoji codepoint, potentially overriding that
codepoint's default. There are two presentation selectors:
- `Variation Selector 15` (VS15, `U+FE0E`) requests text
presentation style.
- `Variation Selector 16` (VS16, `U+FE0F`) requests emoji
presentation style.
:::{figure-md}

Text presentation-style selector
:::
:::{figure-md}
{title="Testing"}
Emoji presentation-style selector
:::
An emoji codepoint might also be followed by an emoji
**modifier**. This modifier requests an alternate version of the emoji
glyph. Currently, there are five emoji modifiers defined, all of which
are assigned to a skin-tone designation from the Fitzpatrick scale:
- `U+1F3FB` "Light skin tone"
- `U+1F3FC` "Medium-light skin tone"
- `U+1F3FD` "Medium skin tone"
- `U+1F3FE` "Medium-dark skin tone"
- `U+1F3FF` "Dark skin tone"
Emoji **sequences** consist of one or more emoji codepoints,
optionally followed by presentation selectors, modifiers, or other
special characters. A font can implement custom ligatures for any
sequence of emoji. However, Unicode also designates specific sequences
that should be supported. These sequences can involve three special
non-printing codepoints in addition to the selectors and modifiers
mentioned above:
- The Combining Enclosing Keycap (CEK, `U+20E3`) is used to form
**keycap** sequences corresponding to telephone keypad keys.
- The Cancel Tag (`U+E007F`) is used to form tag-based flag
sequences.
- The Zero-Width Joiner (ZWJ, `U+200D`) is used to form emoji
sequences for multi-person groups, gendered forms, hair-color
variants, and directionality.
## Normalization ##
Emoji sequences are not generally affected by Unicode or OpenType
normalization. However, Unicode does specify an order to be used when
representing ZWJ-using emoji sequences.
The correct order should be:
Base emoji codepoint
Emoji modifier OR Emoji presentation selector
Hair subsequence
Color subsequence
Gender-sign or object subsequence
Directionality indicator
Although this ordering is not designated a Unicode normalization form,
shaping engine implementers may find it a useful target if attempting
to correct invalid mis-ordered emoji ZWJ sequences.
Shaping engines should also note that the `Emoji` and
`Extended_Pictographic` properties may require tracking in any Unicode
normalization routines.
The `Emoji` property of a codepoint can be unintentionally lost when
certain string transformations are performed. For example, the
upper-case versions of the Circled Latin Letters have the `Emoji`
property, but the lower-case version of the Circled Latin do
not. Therefore, a case-transformation rule must take care not to
unintentionally break the desired output by losing the property.
The `Extended_Pictographic` property of a codepoint should be tracked
because it is set on several non-emoji codepoints that may be updated
to have the `Emoji` property in a future release of Unicode.
## Bidirectionality ##
Most emoji sequences are defined to be neutral for the purpose of
bidirectionality segmenting and handling.
However, the Regional Indicator flag sequences are defined to be
left-to-right only, overriding any levels of bidirectional embedding.
## Sequence identification ##
There are six varieties of emoji sequence defined by Unicode:
1. Presentation sequences
2. Modifier sequences
3. Regional Indicator flag sequences
4. Tag flag sequences
5. Keycap sequences
6. Zero-width joiner (ZWJ) sequences
> Note: The ZWJ sequence variety incorporates several subsets, but all
> of the ZWJ sequences are implemented using the same mechanism.
The set of sequences includes various mechanisms defined at different
times by either Unicode itself or by legacy encoding standards. In
some cases, an older mechanism (such as the Regional Indicator
mechanism used for national flags) has been superseded by a newer,
more flexible mechanism intended to permit emoji vendors to provide
support for a large set of new representations or emoji variants
without requiring Unicode to define new codepoints for every possible
permutation. Nevertheless, shaping-engine implementers should expect
to encounter any or all of the defined sequences.
This set includes the major categories of sequences that shaping
engines are likely to encounter and that can convey important
contextual information to users. Note, however, that fonts may
implement additional sequences via ligature substitution or other
existing mechanisms.
Each of the six sequence varieties can also be interpreted as a
different module of overall "emoji sequence support" for a
shaping-engine implementation. For example, support for Regional
Indicator flag sequences is distinct from support for Keycap
sequences. For convenience, in this document, the sequence varieties
are listed in an order that roughly approximates their complexity, but
this ordering is not definitive.
Sequences should be identified by examining the run and matching
characters, based on their categorization, using regular expressions.
The following general-purpose identification classes can be used to
match emoji sequences in regular expressions.
```markdown
_emoji_ = `EMOJI`
_modifier_ = "U+1F3FB" | "U+1F3FC" | "U+1F3FD" | "U+1F3FE" | "U+1F3FF"
_presentation_ = `VS15` | `VS16`
_zwj_ = `ZWJ`
_cek_ = `CEK`
_blackflag_ = "U+1F3F4"
_key_ = "#" | "*" | ["0".."9"]
_color_ = "U+2B1B" | "U+2B1C" | "U+1F7E5" | "U+1F7E6" | "U+1F7E7" | "U+1F7E8" | "U+1F7E9" | "U+1F7EA" | "U+1F7EB"
_multipersongroup_ = "U+1F91D" | "U+1F46F" | "U+1F93C" | "U+1F46B" | "U+1F46C" | "U+1F46D" | "U+1F48F" | "U+1F491" | "U+1F46A"
_gendersign_ = "U+2640" | "U+2642"
_genderperson_ = "U+1F468" | "U+1F469" | "U+1F9D1"
_hairstyle_ = "U+1F9B0" | "U+1F9B1" | "U+1F9B2" | "U+1F9B3"
_direction_ = "U+2B05" | "U+27A1"
_regionalindicator_ = ["U+1F1E6".."U+1F1FF"]
_tagchar_ = `TAG_CHARACTER`
_endtag_ = "U+E007F"
```
The expressions below use state-machine syntax from the Ragel
state-machine compiler. The operators represent:
```markdown
a* = zero or more copies of a
b+ = one or more copies of b
c? = optional instance of c
d{n} = exactly n copies of d
d{,n} = zero to n copies of d
d{n,} = n or more copies of d
d{n,m} = n to m copies of d
!e = not e
^f = character-level not f
g.h = concatenation of g and h
i|j = i or j
( ) = grouping of expression elements
```
### Presentation sequences ###
A presentation sequence is used to request a specific presentation
style ("text" or "emoji"), potentially overriding the default
presentation style defined for the codepoint by Unicode.
:::{figure-md}

Requesting emoji presentation style
:::
:::{figure-md}

Requesting text presentation style
:::
The active emoji font, however, might not contain a glyph for the
presentation style requested in the sequence. In particular, it is not
common for emoji fonts to include text-presentation glyphs for
codepoints that default to the emoji-presentation style.
In these instances, the text rendering stack should select a fallback
font that does contain the glyph requested by the presentation
sequence. Strategies for identifying appropriate fallback fonts are
beyond the scope of this document.
A standalone presentation sequence must match:
```markdown
_emoji_ _presentation_
```
Although standalone presentation sequences can occur, note that
presentation sequences also occur within longer emoji sequences.
### Modifier sequences ###
A modifier sequence is used to request an alternate glyph for an emoji
codepoint.
Currently, there are five emoji-modifier codepoints defined by
Unicode. Each corresponds to a different human skin-tone based on the
Fitzpatrick scale.
:::{figure-md}

Modifier for Fitzpatrick scale skin-tone 2
:::
:::{figure-md}

Modifier for Fitzpatrick scale skin-tone 3
:::
:::{figure-md}

Modifier for Fitzpatrick scale skin-tone 4
:::
:::{figure-md}

Modifier for Fitzpatrick scale skin-tone 5
:::
:::{figure-md}

Modifier for Fitzpatrick scale skin-tone 6
:::
A modifier sequence must match:
```markdown
_emojimodified_ = _emoji_ _modifier_
```
Fonts are expected to implement modifier sequences for emoji
codepoints that depict a single human being, and are expected not to
implement modifier sequences for other emoji codepoints.
> Note: Most emoji sequences that depict multiple human beings are
> modified using the ZWJ mechanisms described later, and not via this
> mechanism.
>
> However, there are a small number of codepoints that depict groups
> of human beings in a standalone codepoint and can be modified with a
> single modifier. They are summarized in the table at the [feature
> interaction in sequences](#feature-interaction-in-sequences)
> section.
>
> Note, also that there are emoji codepoints depicting beings that are
> ambiguous in regard to their humanity, such as `U+1F9DB`,
> "Vampire". Shaping engines should not assume that these codepoints
> are unable to support a modifier.
:::{figure-md}

Modifier sequence
:::
The fallback for a modifier sequence is the generic, unmodified
emoji followed by an emoji representing the skin-tone requested.
:::{figure-md}

Modifier sequence fallback
:::
Modifier sequences use emoji presentation style by default, and cannot
include a presentation selector. However, an implementation may choose
to display text-presentation versions of sequences if emoji
presentation style is not possible in the environment.
Although standalone modifier sequences occur, note that modifier
sequences can also occur within longer emoji sequences.
### Regional Indicator flag sequences ###
A Regional Indicator flag sequence is used to request a flag
emoji. All Regional Indicator flag sequences are two codepoints long,
using codepoints from the `REGIONAL_INDICATOR` alphabetical set.
A Regional Indicator flag sequence must match:
```markdown
_regionalindicator_ _regionalindicator_
```
In addition, the only two-codepoint sequences that are considered
valid Regional Indicator flag sequences are those that correspond to
the `unicode_region_subtag` field in the CLDR database.
The typical emoji implementation of such a sequence in an image of a
flag for the region. However, emoji fonts may choose to represent the
region through some other visual means (for example, a regional symbol
or map image). Similarly, where there is more than one possible flag
for a region, Unicode does not specify any particular visual
representation.
Some historical region subtags have been designated as deprecated (for
example, "East Germany" and "West Germany"). Emoji fonts are not
expected to support these deprecated subtags. However, if they
encountered in a text run and are supported in the active font,
shaping engines should deal with the situation gracefully, without
offering guarantees of support.
:::{figure-md}

Regional Indicator flag sequence
:::
Regional Indicator flag sequences use emoji presentation by default,
and cannot include a presentation selector. However, an
implementation may choose to display text-presentation versions of
sequences if emoji presentation style is not possible in the
environment.
:::{figure-md}

Regional Indicator flag sequence fallback
:::
Regional Indicator flag sequences only occur in standalone form.
> Note: The Regional Indicator flag sequences are defined to always be
> interpreted left-to-right (LTR) for the purpose of
> bidirectionality. This behavior differs from that of other emoji
> sequences, which are neutral in regard to bidirectionality.
>
> For example, a Regional Indicator sequence "RI_U, RI_A" should result
> in a flag for Ukraine ("UA"), even if it occurs within a run of
> right-to-left text. Reversing the sequence to result in a flag for
> Australia ("AU") is incorrect.
### Tag flag sequences ###
A Tag flag sequence is used to request a flag emoji for any flag not
defined by the Regional Indicator flag sequence mechanism.
A Tag flag sequence must match:
```markdown
_blackflag_ _tagchar_+ _endtag_
```
The codepoints in the `TAG_CHARACTER` set come from the "Tags" block
in Unicode. At present, the set of allowable tags is defined as the
range `[U+E0020..U+E007E]`, which includes tags for space, upper- and
lower-case basic Latin alphabetic letters, numerals, and several
symbols. However, Unicode also notes that the upper-case alphabetic
tags are not currently used.
Tag sequences must end with "Cancel Tag" (`U+E007F`).
> Note: Despite the official name "Cancel Tag", this codepoint
> terminates valid tag sequences, rather than negating them.
:::{figure-md}

Tag flag sequence
:::
Tag flag sequences only occur in standalone form.
### Keycap sequences ###
A Keycap sequence is used to request an emoji that depicts a
telephone-keypad button.
A Keycap sequence must match:
```markdown
_key_ _presentation_ _cek_
```
:::{figure-md}

Keycap sequence
:::
Keycap sequences only occur in standalone form.
### ZWJ sequences ###
A Zero-Width Joiner (ZWJ) sequence can be used to request specific
variants of an emoji glyph or to request the combined form of a
sequence of emoji glyphs.
Because the ZWJ codepoint itself is invisible, users will expect ZWJ
sequences to fall back gracefully as sequences of standalone emoji
glyphs that convey the original meaning. For example, a ZWJ
multi-person group sequence would be rendered as a single multi-person
emoji glyph if one is available in the active font, but would fall
back to a set of individual-person emoji glyphs.
#### ZWJ hair sequences ####
A ZWJ hair sequence is used to request a specific hairstyle version of
an emoji codepoint that depicts a single human being.
:::{figure-md}

ZWJ hairstyle sequence
:::
A ZWJ hair sequence must match:
```markdown
(_emoji_ | _emojimodified_) _zwj_ _hairstyle_
```
Currently, four hairstyle modifier codepoints are defined:
- `U+1F9B0` "Red or ginger hair"
- `U+1F9B1` "Curly hair"
- `U+1F9B2` "Bald"
- `U+1F9B3` "White hair"
The set of hairstyle sequences allowed has been chosen to enable
depictions of distinct properties not easily represented by the
defaults of the fallback glyphs. By default, the hairstyle and color
on fallback emoji is expected to be nondescript and dark.
Prior to the adoption of the ZWJ-hair-sequence mechanism, a codepoint
specifying "person with blond hair" (`U+1F471`) already existed;
therefore "blond" was not included in the set of supported hairstyle
versions.
#### ZWJ gendered person sequences ####
A ZWJ gendered person sequence is used to request a specific-gendered
version of an emoji codepoint that depicts a single human being.
Each ZWJ gendered person sequence is composed of an emoji that depicts
a human by default, followed by "ZWJ", followed by a gender symbol,
followed by "VS16".
:::{figure-md}

ZWJ gendered person sequence
:::
The fallback for a ZWJ gendered person sequence is a generic "person"
emoji followed by a gender symbol.
:::{figure-md}

ZWJ gendered person sequence fallback
:::
A ZWJ gendered person sequence must match:
```markdown
(_emoji_ | _emojimodified_) _zwj_ _gendersign_ _VS16_
```
A small number of emoji codepoints are defined to show a single human
being with a fixed gender. These codepoints cannot have their apparent
gender modified using the ZWJ gendered person mechanism.
Currently, this list of codepoints includes those in the table below:
:::{table} Single-human emoji codepoints that do not support gendered-person modifiers
| Emoji codepoint | Gender |
|:-------------------------------|:-------|
| `U+1F467` Girl | Female |
| `U+1F467` Boy | Male |
| `U+1F467` Woman | Female |
| `U+1F467` Man | Male |
| `U+1F467` Older woman | Female |
| `U+1F467` Older man | Male |
| `U+1F467` Mrs Claus | Female |
| `U+1F467` Santa Claus | Male |
| `U+1F467` Princess | Female |
| `U+1F467` Prince | Male |
| `U+1F467` Woman dancing | Female |
| `U+1F467` Man dancing | Male |
| `U+1F467` Pregnant woman | Female |
| `U+1F467` Breastfeeding | Female |
| `U+1F467` Woman with headscarf | Female |
:::
However, the list may be updated in subsequent revisions of Unicode.
In addition, emoji codepoints that depict groups of two or more human
beings are handled by other mechanisms, such as the ZWJ multi-person
group mechanism, and are documented in the corresponding section.
> Note: The ZWJ gendered person sequence is not to be confused with
> the ZWJ role sequence.
>
> In effect, both sequence types can be used to depict a human being
> performing a task or activity, and can be used to request a specific
> gender for the human being depicted.
>
> However, all of the codepoints covered by the ZWJ gendered person
> sequences are emoji that show a human being by default, whereas the
> codepoints covered by the ZWJ role sequences begin with a generic
> human-being emoji and append a symbol or object emoji.
#### ZWJ multi-person group sequences ####
A ZWJ multi-person group sequence is used to request a multi-person
emoji glyph. The fallback for a ZWJ multi-person group sequence is a
sequence of individual-person emoji glyphs.
A ZWJ multi-person group sequence must match:
```markdown
(_emoji_ | _emojimodified_) (_zwj_ (_emoji_ | _emojimodified_) _presentation_? ){1,3}
```
Only a fixed number of such multi-person group sequences is
defined. Some of the sequences make use of specific codepoints (such
as "Heavy Black Heart" or "Kiss Mark").
The currently supported configurations for multi-person group emoji
sequences are:
- Couple with heart
- Couple in kiss
- Couple holding hands
- Family
- Shaking hands
A potential source of confusion for these sequences is that some of
them appear to duplicate the content of an existing emoji codepoint,
but the existing emoji codepoint is typically not involved in forming
the corresponding ZWJ multi-person group sequence.
Specifically, there are standalone emoji codepoints for "Kiss"
(`U+1F48F`), two people holding hands (in three permutations:
`U+1F46B`, `U+1F46C`, and `U+1F46D`), "Family" (`U+1F46A`), and
"Handshake" (`U+1F91D`). The details of each of these codepoints in
relation to the corresponding conceptually-similar ZWJ multi-person
group sequence are noted below.
Each of the specific ZWJ multi-person group sequences has a precise
definition.
The "Couple with heart" sequence is composed of
"Person,ZWJ,Heavy_Black_Heart,VS16,ZWJ,Person", and must match:
```markdown
(_emoji_ | _emojimodified_) _zwj_ `U+2764` _VS16_ _zwj_ (_emoji_ | _emojimodified_)
```
:::{figure-md}

ZWJ multi-person heart sequence
:::
:::{figure-md}

ZWJ multi-person heart sequence with modifier
:::
The "Couple in kiss" sequence is composed of
"Person,ZWJ,Heavy_Black_Heart,VS16,ZWJ,Kiss_Mark,ZWJ,Person", and must match:
```markdown
(_emoji_ | _emojimodified_) _zwj_ `U+2764` _VS16_ _zwj_ `U+1F48B` _zwj_ (_emoji_ | _emojimodified_)
```
:::{figure-md}

ZWJ multi-person kiss sequence
:::
:::{figure-md}

ZWJ multi-person kiss sequence with modifier
:::
> Note: the kiss ZWJ sequence does not involve the "Kiss" codepoint
> (`U+1F48F`).
The "Couple holding hands" sequence is composed of
"Person,ZWJ,Handshake,ZWJ,Person", and must match:
```markdown
(_emoji_ | _emojimodified_) _zwj_ `U+1F91D` _zwj_ (_emoji_ | _emojimodified_)
```
:::{figure-md}

ZWJ multi-person holding-hands sequence
:::
:::{figure-md}

ZWJ multi-person holding-hands sequence with modifier
:::
> Note: the couple-holding-hands ZWJ sequence does not involve any of
> the "Man and woman holding hands" (`U+1F46B`), "Two men holding
> hands" (`U+1F46C`), or "Two women holding hands" (`U+1F46D`)
> codepoints.
The "Family" sequence is composed of two-to-four individual "Person"
subsequences, each separated by a ZWJ. Furthermore, the "Person"
subsequences must be sorted so that all adult subsequences precede all
child subsequences. A "Family" subsequence must match:
```markdown
(_emoji_ | _emojimodified_) (_zwj_ (_emoji_ | _emojimodified_) ){1,3}
```
> Note: The ZWJ "Family" sequence is defined to support modifiers on
> each individual human-codepoint component of the sequence, but these
> modified "Family" sequences are not currently included in the
> Recommended For General Interchange (RGI) emoji set, due to the
> number of permutations that would be added to the RGI set as a result.
:::{figure-md}

ZWJ multi-person family sequence "man, boy"
:::
:::{figure-md}

ZWJ multi-person family sequence "man, girl, girl"
:::
:::{figure-md}

ZWJ multi-person family sequence "man, woman, girl"
:::
:::{figure-md}

ZWJ multi-person family sequence "woman, woman, girl, boy"
:::
> Note: the family ZWJ sequence does not involve the "Family"
> codepoint (`U+1F46A`).
The "Shaking hands" sequence is composed of two "Hand" subsequences
separated by a ZWJ, and must match:
```markdown
`U+1FAF1` _modifier_ _zwj_ `U+1FAF2` _modifier_
```
:::{figure-md}

ZWJ multi-person shaking-hands sequence
:::
> Note: the ZWJ "Shaking hands" sequence does not involve the "Handshake"
> codepoint (`U+1F91D`), although the "Handshake" codepoint itself can
> be followed by a single _modifier_ codepoint that, for legacy
> reasons, serves to alter the skin tone of both of the hands depicted
> in the handshake.
>
> However, the "Handshake" codepoint _is_ utilized in the multi-person
> group sequence for "Couple holding hands".
#### ZWJ role sequences ####
A ZWJ role (or profession) sequence is used to request an emoji
depicting a human being performing a task or job. Role sequences are
composed of a codepoint representing a human, followed by "ZWJ",
followed by an emoji depicting an object or symbol that references the
desired profession or role.
:::{figure-md}

ZWJ role sequence 'firefighter'
:::
Optionally, the sequence can be further updated by requesting a
skin-tone modifier appended to the `_genderperson_` element.
:::{figure-md}

ZWJ role sequence 'firefighter' with modifier
:::
In some cases, the object or symbol depicted by the standalone emoji
will not be shown in the substituted emoji resulting from the
sequence. For example, the "factory" codepoint (`U+1F3ED`) depicts a
building in its standalone emoji, but the "factory worker" sequence
depicts a human being outfitted for factory work, rather than
depicting a combination of the human being and the factory building.
In addition, some of the supported "role" codepoints do not use emoji
presentation by default; for those codepoints, the emoji will be
followed by a presentation selector.
:::{figure-md}

ZWJ role sequence 'pilot'
:::
:::{figure-md}

ZWJ role sequence 'pilot' with modifier
:::
The fallback for a ZWJ role sequence is a generic "person" emoji
followed by the emoji symbolizing the task or job.
A ZWJ role sequence must match:
```markdown
_genderperson_ _modifier_? _zwj_ _emoji_ _presentation_?
```
> Note: The ZWJ role sequence is not to be confused with the ZWJ
> gendered person sequence.
>
> In effect, both sequence types can be used to depict a human being
> performing a task or activity, and can be used to request a specific
> gender for the human being depicted.
>
> However, all of the codepoints covered by the ZWJ gendered person
> sequences are emoji that show a human being by default, whereas the
> codepoints covered by the ZWJ role sequences begin with a generic
> human-being emoji and append a symbol or object emoji.
#### ZWJ color sequences ####
A ZWJ color sequence is used to request a version of an emoji
codepoint depicting the base object in a specific color.
A ZWJ color sequence must match:
```markdown
_emoji_ _zwj_ _color_ _presentation_
```
Currently, nine codepoints are defined, each of which (in isolation)
depicts a large square filled with the color in question.
- `U+2B1B` - "Black Large Square"
- `U+2B1C` - "White Large Square"
- `U+1F7E5` - "Large Red Square"
- `U+1F7E6` - "Large Blue Square"
- `U+1F7E7` - "Large Orange Square"
- `U+1F7E8` - "Large Yellow Square"
- `U+1F7E9` - "Large Green Square"
- `U+1F7EA` - "Large Purple Square"
- `U+1F7EB ` - "Large Brown Square"
:::{figure-md}

ZWJ color sequence
:::
The fallback for a ZWJ color sequence is the default emoji followed by
the default emoji for the color codepoint (that is, the color square).
#### ZWJ directionality sequences ####
A ZWJ directionality sequence is used to request a version of an emoji
codepoint facing a specific cardinal direction.
:::{figure-md}

ZWJ directionality sequence
:::
A ZWJ directionality sequence must match:
```markdown
_emoji_ _zwj_ _direction_ _presentation_
```
#### ZWJ additional sequences ####
In addition to the above ZWJ sequence categories, there are 13
standalone, but uncategorized, ZWJ sequences defined in Unicode.
- "Heart on fire"
- "Mending heart"
- "transgender flag"
- "Rainbow flag"
- "Pirate flag"
- "Service dog"
- "Polar bear"
- "Eye in speech bubble"
- "Face exhaling"
- "Face with spiral eyes"
- "Face in clouds"
- "Mx Claus"
- "Black cat"
:::{figure-md}

ZWJ heart-on-fire sequence
:::
These sequences currently match:
```markdown
_emoji_ _presentation_? _zwj_ _emoji_ _presentation_?
```
Note that the "Black cat" sequence, although it appears on this list
of additional ZWJ sequences, has subsequently been generalized to the
[ZWJ color sequence mechanism](#zwj-color-sequences).
### Other sequences and ligatures ###
Emoji fonts may include support for additional variants and
sequences. For example, an emoji font might implement support for
"keycap"-style emoji for alphabetical characters in addition to the
numbers and symbols defined above.
Emoji fonts may also include many-to-one emoji substitutions that do
not fit into any of the above sequence varieties and, instead, behave
more like ligatures. For example, the sequence "Ice Cream, Banana"
might be substituted with a "banana split" emoji.
Any such substitutions are included by the emoji font vendor at their
own discretion, with the understanding that fallback behavior is
unpredictable.
In all such cases, the shaping engine can make a best-effort attempt
to support the sequences, but is not obligated to provide any
guarantees as to their correctness.
## Feature interaction in sequences ##
As is noted in the descriptions of ZWJ gendered-person sequences and
ZWJ multi-person group sequences, there is potential for confusion
wherever standalone emoji codepoints and emoji sequences overlap in
meaning.
This potential for confusion is compounded by the fact that the
skin-tone modifier mechanism and the ZWJ gendered person mechanism
interact differently with standalone emoji codepoints and with emoji
sequences.
In particular, for several of the standalone emoji codepoints, a
single skin-tone modifier is permitted, which is defined to modify both
of the human beings depicted in the emoji. For other standalone emoji
codepoints, only a single gender designator ZWJ gendered-person
subsequence is allowed to be appended to the codepoint, and the gender
designator is defined to modify both of the human beings depicted in
the emoji.
The permitted combinations are summarized in the following table:
:::{table} Defined interactions between skin-tone–modifiers and gender designators
| Type | Emoji | Skin-tone-modifier | Gender depicted |
|:-----------|:----------------------------------------|:-------------------|:--------------------|
| Standalone | "Handshake" `U+1F91D` | only one supported | not supported |
| Standalone | "Woman with bunny ears" `U+1F46F` | not supported | only one supported |
| Standalone | "Wrestlers" `U+1F93C` | not supported | only one supported |
| Standalone | "Man and woman holding hands" `U+1F46B` | only one supported | not supported |
| Standalone | "Two men holding hands" `U+1F46C` | only one supported | not supported |
| Standalone | "Two women holding hands" `U+1F46D` | only one supported | not supported |
| Standalone | "Kiss" `U+1F48F` | only one supported | not supported |
| Standalone | "Couple with heart" `U+1F491` | only one supported | not supported |
| Standalone | "Family" `U+1F46A` | not supported | not supported |
| Sequence | "Couple with heart" ZWJ sequence | supported | supported |
| Sequence | "Couple in kiss" ZWJ sequence | supported | supported |
| Sequence | "Couple holding hands" ZWJ sequence | supported | supported |
| Sequence | "Family" ZWJ sequence | supported | supported |
| Sequence | "Shaking hands" ZWJ sequence | required | not supported |
:::
## Emoji sets ##
Unicode defines several lists of emoji codepoints and emoji sequences
that constitute the sequences that are expected in general text.
The "Basic emoji" set includes all individual codepoints that can be
rendered with the emoji presentation style (including those codepoints
that do not default to emoji presentation).
The "Emoji keycap sequence" set includes all possible valid Keycap
sequences.
The "RGI emoji modifier sequence", "RGI emoji flag sequence", "RGI
emoji tag sequence", and "RGI emoji ZWJ sequence" sets each include
only a subset of the possible valid sequences for their respective
variety of sequence. These sets are designated as "Recommended for
General Interchange" (RGI) to denote that they are in common usage.
Finally, the "RGI emoji set" includes all of the codepoints and
sequences included in the preceding sets. Presence in the RGI emoji
set can be tracked with the `RGI_Emoji` property in the UCD. Fonts are
not required to implement the entire RGI emoji set, nor any of the
other sets.
## The default shaping model ##
Emoji should be shaped using the
[default](opentype-shaping-default.md) shaping model.
Processing a run of text in the default shaping model involves three
top-level stages:
1. Applying the basic substitution features from GSUB
2. Applying other substitution features from GSUB
3. Applying the positioning features from GPOS
Emoji sequences as described above will generally be implemented in
the active font as a GSUB lookup feature. However, there are no
definitively invalid GSUB or GPOS features that must or must _not_ be
employed for this purpose.
Consequently, shaping engines should not assume (for example) that
emoji sequences will be implemented in any specific feature of GSUB.
A font may also employ contextual features, such as using `locl`, that
affects the emoji glyph shown, or use GPOS positioning for some emoji
glyphs.
### Font substitution for presentation forms ###
Before shaping begins, the rendering engine should analyze the text
run and identify presentation forms.
A presentation sequence is used to request a specific presentation
style ("text" or "emoji") for a codepoint, potentially overriding the
default presentation style that is defined in Unicode for that
codepoint.
Because it is uncommon for a single font to include both an
emoji-presentation-style glyph and a text-presentation-style glyph for
the same codepoint, handling a presentation sequence might
require font substitution.
> Note: Strictly speaking, font substitution is not part of the
> shaping process, and the handling of missing presentation forms
> might be most easily performed during segmentation of the text
> stream into runs. However, shaping-engine implementers should be
> aware that such presentation-sequence substitutions are allowable
> and handle them gracefully.
### 1. Applying the basic substitution features from GSUB ###
The basic-substitution stage applies mandatory substitution features
using the rules in the font's GSUB table. In preparation for this
stage, glyph sequences should be tagged for possible application
of GSUB features.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
locl
ccmp
rlig
An emoji font can implement sequence support through any GSUB feature
lookup.
Basic substitution features a common choice for emoji fonts and should
be applied at this stage. In particular, GSUB features that are
enabled by default and GSUB features that cannot be disabled by
application-level user interfaces are common choices in which the
active font may implement emoji substitutions.
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
In other, non-emoji text runs, the `ccmp` feature allows a font to
substitute mark-and-base sequences with a pre-composed glyph including
the mark and the base, or to substitute a single glyph into an
equivalent decomposed sequence of glyphs.
If present, these composition and decomposition substitutions must be
performed before applying any other GSUB lookups, because
those lookups may be written to match only the `ccmp`-substituted
glyphs.
The `rlig` feature substitutes glyph sequences with mandatory
ligatures. Substitutions made by `rlig` cannot be disabled by
application-level user interfaces.
The basic substitution features play a relatively more important role
in shaping non-emoji text runs; therefore the shaping engine may
apply some of them (such as `locl`) them at an earlier stage in the
shaping process. Emoji shaping should be unaffected by this decision.
### 2. Applying typographic substitution features from GSUB ###
The typographic-substitution phase applies all remaining substitution
features using the rules in the font's GSUB table. In preparation for
this stage, glyph sequences should be tagged for possible application
of GSUB features.
These substitutions include those features designed to provide
typographic consistency and correctness.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
An emoji font can implement sequence support through any GSUB feature
lookup. This can include any other substitution feature in the GSUB
feature table.
Support for RGI emoji sequences or other emoji sequences defined as
valid in Unicode may be implemented in a feature that are enabled by
default and cannot be disabled by application-level user interfaces,
such as the `rlig` feature (for "required ligatures").
However, emoji fonts may also include support for emoji sequences in
GSUB features that can be disabled by application-level user
interfaces, such as the `liga` feature (for standard ligatures). Emoji
sequences may also be implemented in features that are disabled by
default, such as the `dlig` feature (for "discretionary ligatures").
An emoji font might also implement support for emoji sequences through
the use of multiple features. For example, RGI emoji sequences or
other emoji sequences defined as valid in Unicode may be implemented
in `rlig`, with custom sequences implemented in `liga`.
### 3. Applying the positioning features from GPOS ###
The positioning stage adjusts the positions of mark and base
glyphs. In preparation for this stage, glyph sequences should be
tagged for possible application of GPOS features.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
In general, all emoji glyphs in a given font are expected to be
approximately equal in height and width, and the usage of GPOS
positioning for emoji is uncommon.
However, some emoji glyphs might be narrower or wider than average by
the nature of the image itself (for example, certain national flags
are narrower or wider than others), and there may be situations in
which the active font alters the default position of an emoji glyph to
achieve a consistent alignment, spacing, or appearance.
Therefore, shaping engines should make no assumptions about the
presence or absence of GPOS features for emoji runs, and should apply
the features if present.
================================================
FILE: opentype-shaping-gujarati.md
================================================
```{include} /_global.md
```
# Gujarati shaping in OpenType #
This document details the shaping procedure needed to display text
runs in the Gujarati script.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Shaping classes and subclasses](#shaping-classes-and-subclasses)
- [Gujarati character tables](#gujarati-character-tables)
- [The `` shaping model](#the-gjr2-shaping-model)
- [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)
- [Stage 2: Initial reordering](#stage-2-initial-reordering)
- [Stage 3: Applying the basic substitution features from GSUB](#stage-3-applying-the-basic-substitution-features-from-gsub)
- [Stage 4: Final reordering](#stage-4-final-reordering)
- [Stage 5: Applying all remaining substitution features from GSUB](#stage-5-applying-all-remaining-substitution-features-from-gsub)
- [Stage 6: Applying remaining positioning features from GPOS](#stage-6-applying-remaining-positioning-features-from-gpos)
- [The `` shaping model](#the-gujr-shaping-model)
- [Distinctions from ``](#distinctions-from-gjr2)
- [Advice for handling fonts with `` features only](#advice-for-handling-fonts-with-gujr-features-only)
- [Advice for handling text runs composed in `` format](#advice-for-handling-text-runs-composed-in-gujr-format)
## General information ##
The Gujarati script belongs to the Indic family, and follows
the same general patterns as the other Indic scripts. More
specifically, it belongs to the North Indic subgroup, in which
sequences of adjacent consonants are often represented as conjuncts.
The Gujarati script is used to write multiple languages, most commonly
Gujarati, Kutchi, and Avestan. In addition, Sanskrit may be written
in Gujarati, so Gujarati script runs may include glyphs from the Vedic
Extensions block of Unicode.
There are two extant Gujarati script tags defined in OpenType, ``
and ``. The older script tag, ``, was deprecated in 2005.
Therefore, new fonts should be engineered to work with the ``
shaping model. However, if a font is encountered that supports only
``, the shaping engine should deal with it gracefully.
## Terminology ##
OpenType shaping uses a standard set of terms for Indic scripts. The
terms used colloquially in any particular language may vary, however,
potentially causing confusion.
**Matra** is the standard term for a dependent vowel sign.
**Halant** and **Virama** are both standard terms for the below-base "vowel-killer"
sign. Unicode documents use the term "virama" most frequently, while
OpenType documents use the term "halant" most frequently.
**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark
indicating that the preceding vowel should be nasalized.
The term **base consonant** is also critical to Indic shaping. The
base consonant of a syllable is the consonant that carries the
syllable's vowel sound, either the inherent vowel (for an unmarked
base consonant) or a dependent vowel (with the addition of a matra).
A syllable's base consonant is generally rendered in its full form
(although it may form ligatures), while other consonants in the
syllable frequently take on secondary forms. Different GSUB
substitutions may apply to a script's **pre-base** and **post-base**
consonants. Some of these substitutions create **above-base** or
**below-base** forms. The **Reph** form of the consonant "Ra" is an
example.
Syllables may also begin with an **independent vowel** instead of a
consonant. In these syllables, the independent vowel is rendered in
full-letter form, not as a matra, and the independent vowel serves as the
syllable base, similar to a base consonant.
Where possible, using the standard terminology is preferred, as the
use of a language-specific term necessitates choosing one language
over all of the others that share a common script.
## Glyph classification ##
Shaping Gujarati text depends on the shaping engine correctly
classifying each glyph in the run. As with most other scripts, the
classifications must distinguish between consonants, vowels
(independent and dependent), numerals, punctuation, and various types
of diacritical mark.
For most codepoints, the `General Category` property defined in the Unicode
standard is correct, but it is not sufficient to fully capture the
expected shaping behavior (such as glyph reordering). Therefore,
Gujarati glyphs must additionally be classified by how they are treated
when shaping a run of text.
### Shaping classes and subclasses ###
The shaping classes listed in the tables that follow are defined so
that they capture the positioning rules used by Indic scripts.
For most codepoints, the _Shaping class_ is synonymous with the `Indic
Syllabic Category` defined in Unicode. However, there are some
distinctions, where the defined category does not fully capture the
behavior of the character in the shaping process.
Several of the diacritic and syllable-modifying marks behave according
to their own rules and, thus, have a special class. These include
`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some
less-common marks behave according to rules that are similar to these
common marks, and are therefore classified with the corresponding
common mark. The Vedic Extensions also include a `CANTILLATION`
class for tone marks.
Letters generally fall into the classes `CONSONANT`,
`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the
shaping engine parse and identify key positions in a syllable. For
example, Unicode categorizes dependent vowels as `Mark [Mn]`, but the
shaping engine must be able to distinguish between dependent vowels
and diacritical marks (which are categorized as `Mark [Mn]`).
Other characters, such as symbols and miscellaneous letters (for
example, letter-like symbols that only occur as standalone entities
and do not occur within syllables), need no special attention from the
shaping engine, so they are not assigned a shaping class.
Numbers are classified as `NUMBER`, even though they evoke no special
behavior from the Indic shaping rules, because there are OpenType features that
might affect how the respective glyphs are drawn, such as `tnum`,
which specifies the usage of tabular-width numerals, and `sups`, which
replaces the default glyphs with superscript variants.
Marks and dependent vowels are further labeled with a mark-placement
subclass, which indicates where the glyph will be placed with respect
to the base character to which it is attached. The actual position of
the glyphs is determined by the lookups found in the font's GPOS
table, however, the shaping rules for Indic scripts require that the
shaping engine be able to identify marks by their general
position.
For example, left-side dependent vowels (matras), classified
with `LEFT_POSITION`, must frequently be reordered, with the final
position determined by whether or not other letters in the syllable
have formed ligatures or combined into conjunct forms. Therefore, the
`LEFT_POSITION` subclass of the character must be tracked throughout
the shaping process.
There are four basic _mark-placement subclasses_ for dependent vowels
(matras). Each corresponds to the visual position of the matra with
respect to the syllable base to which it is attached:
- `LEFT_POSITION` matras are positioned to the left of the syllable base.
- `RIGHT_POSITION` matras are positioned to the right of the syllable base.
- `TOP_POSITION` matras are positioned above the syllable base.
- `BOTTOM_POSITION` matras are positioned below syllable base.
These positions may also be referred to elsewhere in shaping documents as:
- _Pre-base_ matras
- _Post-base_ matras
- _Above-base_ matras
- _Below-base_ matras
respectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations
corresponds to Unicode's preferred terminology. The _Pre_, _Post_,
_Above_, and _Below_ terminology is used in the official descriptions
of OpenType GSUB and GPOS features. Shaping engines may, internally,
use whichever terminology is preferred.
In addition, dependent-vowel codepoints that are composed of multiple
components will be designated in character tables as having a compound
_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`.
However, these multi-part matras are decomposed into separate matra
components during the shaping process. After the decomposition, each
matra component will belong to exactly one of the four basic
_mark-placement subclasses_.
For most mark and dependent-vowel codepoints, the _mark-placement
subclass_ is synonymous with the `Indic Positional Category` defined
in Unicode. However, there are some distinctions, where the defined
category does not fully capture the behavior of the character in the
shaping process.
### Gujarati character tables ###
Separate character tables are provided for the Gujarati and Vedic
Extensions block as well as for other miscellaneous characters that
are used in `` text runs:
- [Gujarati character table](character-tables/character-tables-gujarati.md#gujarati-character-table)
- [Vedic Extensions character table](character-tables/character-tables-gujarati.md#vedic-extensions-character-table)
- [Miscellaneous character table](character-tables/character-tables-gujarati.md#miscellaneous-character-table)
The tables list each codepoint along with its Unicode general
category, its shaping class, and its mark-placement subclass. The
codepoint's Unicode name and an example glyph are also provided.
For example:
:::{table} example character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0A81` | Mark [Mn] | BINDU | TOP_POSITION | ઁ Candrabindu |
| | | | |
|`U+0A95` | Letter | CONSONANT | _null_ | ક Ka |
:::
Codepoints with no assigned meaning are designated as _unassigned_ in
the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the tables use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
#### Special-function codepoints ####
Other important characters that may be encountered when shaping runs
of Gujarati text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
Each of these is of particular importance to shaping engines, because
these codepoints interact with the shaping engine, the text run, and
the active font, either to mediate non-default shaping behavior or to
relay information about the current shaping process.
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
Dotted-circle placeholder characters (like any Unicode codepoint) can
appear anywhere in text input sequences and should be rendered
normally. GPOS positioning lookups should attach mark glyphs to dotted
circles as they would to other non-mark characters. As visible glyphs,
dotted circles can also be involved in GSUB substitutions.
In addition to the default input-text handling process, shaping
engines may also insert dotted-circle placeholders into the text
sequence. Dotted-circle insertions are required when a non-spacing
mark or dependent sign is formed with no base character present.
This requirement covers:
- Dependent signs that are assigned their own individual Unicode
codepoints (such as most dependent-vowel marks or matras)
- Dependent signs that are formed only by specific sequences of
other codepoints (such as "Reph")
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence.
- The sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the
formation of a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead.
- The sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce
the first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph".
- An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
even where an initial "Ra,Halant" sequence without the zero-width
joiner would otherwise produce a "Reph".
The ZWJ and ZWNJ characters are, by definition, non-printing control
characters and have the _Default_Ignorable_ property in the Unicode
Character Database. In standard text-display scenarios, their function
is to signal a request from the user to the shaping engine for some
particular non-default behavior. As such, they are not rendered
visually.
> Note: Naturally, there are special circumstances where a user or
> document might need to request that a ZWJ or ZWNJ be rendered
> visually, such as when illustrating the OpenType shaping process, or
> displaying Unicode tables.
Because the ZWJ and ZWNJ are non-printing control characters, they can
be ignored by any portion of a software text-handling stack not
involved in the shaping operations that the ZWJ and ZWNJ are designed
to interface with. For example, spell-checking or collation functions
will typically ignore ZWJ and ZWNJ.
Similarly, the ZWJ and ZWNJ should be ignored by the shaping engine
when matching sequences of codepoints against the backtrack and
lookahead sequences of a font's GSUB or GPOS lookups.
For example:
- A lookup that substitutes an alternate version of a
dependent-vowel (matra) glyph when it is preceded by "Ka,Halant,Tta"
should still be applied if the dependent-vowel codepoint is preceded
by "Ka,Halant,ZWJ,Tta" in the text run.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
## The `` shaping model ##
Processing a run of `` text involves six top-level stages:
1. Identifying syllables and other sequences
2. Initial reordering
3. Applying the basic substitution features from GSUB
4. Final reordering
5. Applying all remaining substitution features from GSUB
6. Applying all remaining positioning features from GPOS
As with other Indic scripts, the initial reordering stage and the
final reordering stage each involve applying a set of several
script-specific rules. The basic substitution features must be applied
to the run in a specific order. The remaining substitution features in
stage five, however, do not have a mandatory order.
Indic scripts follow many of the same shaping patterns, but they
differ in a few critical characteristics that the shaping engine must
track. These include:
- The position of the base consonant in a syllable.
- The final position of "Reph".
- Whether "Reph" must be requested explicitly or if it is formed by
a specific, implicit sequence.
- Whether the below-base forms feature is applied only to consonants
before the syllable base, only to consonants after the base
consonant, or to both.
- The ordering positions for dependent vowels
(matras). Specifically, right-side, above-base, and below-base
matras follow different rules in different scripts.
All Indic scripts position left-side matras in the same
manner, in the ordering position `POS_PREBASE_MATRA`.
With regard to these common variations, Gujarati's specific shaping
characteristics include:
- `BASE_POS_LAST` = The base consonant of a syllable is the last
consonant, not counting any special final-consonant forms.
- `REPH_POS_BEFORE_POST` = "Reph" is ordered before all post-base consonant forms.
- `REPH_MODE_IMPLICIT` = "Reph" is formed by an initial "Ra,Halant" sequence.
- `BLWF_MODE_PRE_AND_POST` = The below-forms feature is applied both to
pre-base consonants and to post-base consonants.
- `MATRA_POS_TOP` = `POS_AFTER_SUBJOINED` = Above-base matras are
ordered after subjoined (i.e., below-base) consonant forms.
- `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are
ordered after all post-base consonant forms.
- `MATRA_POS_BOTTOM` = `POS_AFTER_POST` = Below-base matras are
ordered after all post-base consonant forms.
These characteristics determine how the shaping engine must reorder
certain glyphs, how base consonants are determined, and how "Reph"
should be encoded within a run of text.
### Stage 1: Identifying syllables and other sequences ###
A syllable in Gujarati consists of a valid orthographic sequence
that may be followed by a "tail" of modifier signs.
> Note: The Gujarati Unicode block enumerates nine modifier signs,
> "Candrabindu" (`U+0A81`), "Anusvara" (`U+0A82`), "Visarga"
> (`U+0A83`), "Sukun" (`U+0AFA`), "Shadda" (`U+0AFB`), "Maddah"
> (`U+0AFC`), "Three-Dot Nukta Above" (`U+0AFD`), "Circle Nukta Above"
> (`U+0AFE`), and "Two-Circle Nukta Above" (`U+0AFF`). In addition,
> Sanskrit text written in Gujarati may include additional signs from
> Vedic Extensions block.
Each syllable contains exactly one vowel sound. Valid syllables may
begin with either a consonant or an independent vowel.
If the syllable begins with a consonant, then the consonant that
provides the vowel sound is referred to as the "base" consonant. If
the syllable begins with an independent vowel, that vowel is the
syllable's only vowel sound and, by definition, there is no "base"
consonant.
> Note: A consonant that is not accompanied by a dependent vowel (matra) sign
> carries the script's inherent vowel sound. This vowel sound is changed
> by a dependent vowel (matra) sign following the consonant.
From the shaping engine's perspective, the main distinction between a
syllable with a base consonant and a syllable with an
independent-vowel base is that a syllable with an independent-vowel
base is less likely to include additional consonants in special forms
and less likely to include dependent vowel signs
(matras). Therefore, in the common case, vowel-based syllables may
involve less reordering, substitution feature applications, and other
processing than consonant-based syllables.
In some languages and orthographies, vowel-based syllables are
not permitted to include additional consonants or matras, and certain
GSUB substitution features do not occur. However, there are often
known exceptions, and real-world text makes no such guarantees.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
Generally speaking, the base consonant is the final consonant of the
syllable and its vowel sound designates the end of the syllable. This
rule is synonymous with the `BASE_POS_LAST` characteristic mentioned
earlier.
Valid consonant-based syllables may include one or more additional
consonants that precede the base consonant or syllable base. Each of these
other, pre-base consonants will be followed by the "Halant" mark, which
indicates that they carry no vowel. They affect pronunciation by
combining with the base consonant (e.g., "_str_", "_pl_") but they
do not add a vowel sound.
As with other Indic scripts, the consonant "Ra" receives special
treatment; in many circumstances it is replaced by one of two combining
mark-like forms.
- A "Ra,Halant" sequence at the beginning of a syllable
is replaced with an above-base mark called "Reph" (unless the "Ra"
is the only consonant in the syllable).
This rule is synonymous with the `REPH_MODE_IMPLICIT`
characteristic mentioned earlier.
- "Halant,Ra" sequences that occur elsewhere in the syllable may take on the
below-base form "Rakaar"."Reph" and "Rakaar" characters must be reordered after the
syllable-identification stage is complete.
> Note: Generally speaking, OpenType fonts will implement support for
> any below-base, post-base, and pre-base-reordering consonant forms
> by including the necessary substitution rules in their `blwf`,
> `pstf`, and `pref` lookups in GSUB.
>
> Consequently, whenever shaping engines need to determine whether or
> not a given consonant can take on such a special form, the most
> appropriate test is to check if the consonant is included in the
> relevant GSUB lookup. Other implementations are possible, such as
> maintaining static tables of consonants, but checking for GSUB
> support ensures that the expected behavior is implemented in the
> active font, and is therefore the most reliable approach.
In addition to valid syllables, standalone sequences may occur, such
as when an isolated codepoint is shown in example text.
> Note: Foreign loanwords, when written in the Gujarati script, may
> not adhere to the syllable-formation rules described above. In
> particular, it is not uncommon to encounter foreign loanwords that
> contain a word-final suffix of consonants.
>
> Nevertheless, such word-final suffixes will be correctly matched by
> the regular expressions listed below. These loanwords are pronounced
> different, which raises issues for potential readers, but the
> character sequences do not affect the shaping process.
Syllables should be identified by examining the run and matching
glyphs, based on their categorization, using regular expressions.
The following general-purpose Indic-shaping regular expressions can be
used to match Gujarati syllables.
The regular expressions utilize the shaping classes from the tables
above. For the purpose of syllable identification, more general
classes can be used, as defined in the following table. This
simplifies the resulting expressions.
```markdown
_ra_ = The consonant "Ra"
_consonant_ = ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_
_vowel_ = `VOWEL_INDEPENDENT`
_nukta_ = `NUKTA`
_halant_ = `VIRAMA`
_zwj_ = `JOINER`
_zwnj_ = `NON_JOINER`
_matra_ = `VOWEL_DEPENDENT` | `PURE_KILLER`
_syllablemodifier_ = `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`
_vedicsign_ = `CANTILLATION`
_placeholder_ = `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`
_dottedcircle_ = `DOTTED_CIRCLE`
_repha_ = `CONSONANT_PRE_REPHA`
_consonantmedial_ = `CONSONANT_MEDIAL`
_symbol_ = `SYMBOL` | `AVAGRAHA`
_consonantwithstacker_ = `CONSONANT_WITH_STACKER`
_other_ = `OTHER` | `MODIFYING_LETTER`
```
> Note: the _ra_ identification class is mutually exclusive with
> the _consonant_ class. The union of the _consonant_ and _ra_ classes
> is used in the regular expression elements below in order to
> correctly identify "Ra" characters that do not trigger "Reph" or
> "Rakaar" shaping behavior.
>
> Note, also, that the cantillation mark "combining Ra" in the
> Devanagari Extended block does _not_ belong to the _ra_
> identification class, and that the other "combining consonant"
> cantillation marks in the Devanagari Extended block do not belong to
> the _consonant_ identification class.
> Note: The _placeholder_ identification class includes codepoints
> that are often used in place of vowels or consonants when a document
> needs to display a matra, mark, or special form in isolation or
> in another context beyond a standard syllable. Examples of
> _placeholder_ codepoints include hyphens and non-breaking
> spaces. Sequences that utilize this approach should be identified as
> "standalone" syllables.
>
> The _placeholder_ identification class also includes numerals, which
> are commonly used as word substitutes within normal text. Examples
> include ordinals (e.g., "4th").
> Note: The _other_ identification class includes codepoints that
> do not interact with adjacent characters for shaping purposes. Even
> though some of these codepoints (such as `MODIFYING_LETTER`) can
> occur within words, they evoke no behavior from the shaping
> engine and do not factor into the regular expressions that
> follow. Therefore, the shaping engine may choose to ignore them
> during syllable identification; they are listed here for completeness.
These identification classes form the bases of the following regular
expression elements:
```markdown
C = _consonant_ | _ra_
Z = _zwj_ | _zwnj_
REPH = (_ra_ _halant_) | _repha_
CN = C _zwj_? _nukta_?
FORCED_RAKAR = _zwj_ _halant_ _zwj_ _ra_
S = _symbol_ _nukta_?
MATRA_GROUP = Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?
SYLLABLE_TAIL = (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}
HALANT_GROUP = Z? _halant_ (_zwj_ _nukta_?)?
FINAL_HALANT_GROUP = HALANT_GROUP | (_halant_ _zwnj_)
MEDIAL_GROUP = _consonantmedial_?
HALANT_OR_MATRA_GROUP = FINAL_HALANT_GROUP | MATRA_GROUP*)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(MATRA_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(MATRA_GROUP){0,4}` .
Using the above elements, the following regular expressions define the
possible syllable types:
A consonant-based syllable will match the expression:
```markdown
(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(CN HALANT_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(CN HALANT_GROUP){0,4}` .
A vowel-based syllable will match the expression:
```markdown
REPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
A standalone syllable will match the expression:
```markdown
((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
> Note: Although they are labeled as "standalone syllables" here,
> many sequences that match the standalone regular expression above
> are instances where a document needs to display a matra, combining
> mark, or special form in isolation. Such sequences might not have
> any significance with regard to the definition of syllables used in
> the language or orthography of the text.
A symbol-based syllable will match the expression:
```markdown
S SYLLABLE_TAIL
```
A broken syllable will match the expression:
```markdown
REPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
The primary problem involved in shaping broken syllables is the lack
of a syllable base (either a base consonant or an independent
vowel). Without a syllable base, the shaping engine cannot perform
GPOS positioning and other contextual operations that are required
later in the shaping process.
To make up for this limitation, shaping engines should insert a
dotted-circle placeholder (`U+25CC`) character into the text stream
where the missing syllable base was expected to occur. This
placeholder allows the shaping process to proceed on a best-effort
basis at handling the broken-syllable sequence, but making guarantees
about the orthographic correctness or preferred appearance of the
final result is out of scope for this document.
Shaping engines can perform this dotted-circle insertion at any point
after the broken syllable has been recognized and before GSUB features
are applied. However, the best results will likely be attained by
performing the insertion immediately, before proceeding to
stage 2. This will enable the maximum number of GSUB and GPOS features
in the active font to be correctly applied to the text run by ensuring
that all reordering, tagging, and sorting algorithms are executed as
usual.
> Note: In software stacks where other text-handling operations, such
> as Unicode normalization and localization, are performed before the
> text run is passed to the shaping engine, there is a potential for
> the dotted-circle insertion to cause unexpected effects.
>
> For example, if a `ccmp` or `locl` feature substitutes the default
> dotted-circle placeholder glyph with a variant glyph of a different
> size or weight for the (`U+25CC`) codepoint, then any shaping engine
> which relies on another software component to handle that
> functionality must take additional care to ensure consistency.
The expressions above use state-machine syntax from the Ragel
state-machine compiler. The operators represent:
```markdown
a* = zero or more copies of a
b+ = one or more copies of b
c? = optional instance of c
d{n} = exactly n copies of d
d{,n} = zero to n copies of d
d{n,} = n or more copies of d
d{n,m} = n to m copies of d
!e = not e
^f = character-level not f
g.h = concatenation of g and h
i|j = i or j
( ) = grouping of expression elements
```
After the syllables have been identified, each of the subsequent
shaping stages occurs on a per-syllable basis.
### Stage 2: Initial reordering ###
The initial reordering stage is used to relocate glyphs from the
phonetic order in which they occur in a run of text to the
orthographic order in which they are presented visually.
> Note: Primarily, this means moving dependent-vowel (matra) glyphs,
> "Ra,Halant" glyph sequences, and other consonants that take special
> treatment in some circumstances. "Ra" may take on one of two special
> forms, depending on its position in the syllable.
>
> These reordering moves are mandatory. The final-reordering stage
> may make additional moves, depending on the text and on the features
> implemented in the active font.
The syllable should be processed by tagging each glyph with its
intended position based on its ordering category. After all glyphs
have been tagged, the entire syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
The final sort order of the ordering categories should be:
POS_RA_TO_BECOME_REPH
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN
POS_ABOVEBASE_CONSONANT
POS_BEFORE_SUBJOINED
POS_BELOWBASE_CONSONANT
POS_AFTER_SUBJOINED
POS_BEFORE_POST
POS_POSTBASE_CONSONANT
POS_AFTER_POST
POS_FINAL_CONSONANT
POS_SMVD
This sort order enumerates all of the possible final positions to
which a codepoint might be reordered, across all of the Indic
scripts. It includes some ordering categories not utilized in
Gujarati.
The basic positions (left to right) are "Reph"
(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants
positioned before the base consonant or syllable base
(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant
or syllable base (`POS_SYLLABLE_BASE`), above-base consonants
(`POS_ABOVEBASE_CONSONANT`), below-base consonants
(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base
consonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final
consonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic
signs (`POS_SMVD`).
In addition, several secondary positions are defined to handle various
reordering rules that deal with relative, rather than absolute,
positioning. `POS_AFTER_MAIN` means that a character must be
positioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`
and `POS_AFTER_SUBJOINED` mean that a character must be positioned
before or after any below-base consonants, respectively. Similarly,
`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be
positioned before or after any post-base consonants, respectively.
For shaping-engine implementers, the names used for the ordering
categories matter only in that they are unambiguous.
For a definition of the "base" consonant, refer to stage 2, step 1, which
follows.
#### Stage 2, step 1: Base consonant ####
The first step is to determine the base consonant of the syllable, if
there is one, and tag it as `POS_SYLLABLE_BASE`.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base, and it should be tagged
as `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a standalone sequence or other syllable that begins with a placeholder
or dotted circle, the placeholder or dotted circle will always serve
as the syllable base, and it should be tagged as
`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a syllable that begins with a consonant, the shaping engine must
determine the base consonant by a script-specific algorithm.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
The base consonant is defined as the consonant in a consonant-based
syllable that carries the syllable's vowel sound. That vowel sound
will either be provided by the script's inherent vowel (in which case
it is not written with a separate character) or the sound will be designated
by the addition of a dependent-vowel (matra) sign.
Vowel-based syllables, standalone-sequences, and broken text runs will
not have base consonants.
While performing the base-consonant search, shaping engines may
also encounter special-form consonants, including below-base
consonants and post-base consonants. Each of these special-form
consonants must also be tagged (`POS_BELOWBASE_CONSONANT`,
`POS_POSTBASE_CONSONANT`, respectively).
Any pre-base-reordering consonant (such as a pre-base-reordering "Ra")
encountered during the base-consonant search must be tagged
`POS_POSTBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
The algorithm for determining the base consonant is
- If the syllable starts with "Ra,Halant" and the syllable contains
more than one consonant, exclude the starting "Ra" from the list of
consonants to be considered.
- Starting from the end of the syllable, move backwards until a consonant is found.
* If the consonant is the first consonant, stop.
* If the consonant is preceded by the sequence "Halant,ZWJ", stop.
* If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`, then move to the previous consonant.
* If the consonant has a post-base form, tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If the consonant is a pre-base-reordering "Ra", tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If none of the above conditions is true, stop.
- The consonant stopped at will be the base consonant.
> Note: The algorithm is designed to work for all Indic
> scripts. However, Gujarati does not utilize pre-base-reordering "Ra".
Gujarati includes one below-base consonant form.
- "Halant,Ra" (occurring after the syllable base) and "Ra,Halant"
(before the syllable base, but in a non-syllable-initial
position) will take on the "Rakaar" form.
> Note: Because Gujarati employs the `BLWF_MODE_PRE_AND_POST` shaping
> characteristic, consonants with below-base special forms may occur
> before or after the syllable base.
>
> During the base-consonant search, only the "Halant,_consonant_"
> pattern following the syllable base for these below-base forms will
> be encountered. Stage 2, step 5 below ensures that the "_consonant_,Halant"
> pattern preceding the syllable base for these below-base forms will
> also be tagged correctly.
#### Stage 2, step 2: Matra decomposition ####
Second, any multi-part dependent vowels (matras) must be decomposed
into their components. Gujarati has one multi-part dependent vowel,
"Candra O" (`U+0AC9`).
> "Candra O" (`U+0AC9`) decomposes to "`U+0AC5`,`U+0ABE`"
> Note: "Candra O" is categorized in Unicode as being a top-and-right
> matra, a combination that would normally decompose into one
> TOP_POSITION mark and one RIGHT_POSITION mark. In "Candra O",
> however, the `U+0AC5` component is intended to be positioned over the
> `U+0ABE` component, not above the base.
>
> Consequently, the two decomposed components should both be tagged
> for the `POS_AFTER_POST` sorting position, and neither will need to
> be reordered.
>
> In addition, the decomposition is not canonical in
> Unicode. so performing the decomposition may trigger unknown
> behavior from other components of the software stack. Consequently,
> shaping engines may choose to skip it.
Because this decomposition is a character-level operation, the shaping
engine may choose to perform it earlier, such as during an initial
Unicode-normalization stage. However, all such decompositions must be
completed before the shaping engine begins step three, below.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-matra-decompose}
Two-part matra decomposition
:::
```{svg-color-toggle-button} gujarati-matra-decompose
```
#### Stage 2, step 3: Tag matras ####
Third, all left-side dependent-vowel (matra) signs must be tagged to be
moved to the beginning of the syllable, with `POS_PREBASE_MATRA`.
Above-base dependent-vowel (matra) signs must be tagged with `POS_AFTER_SUBJOINED`.
Right-side dependent-vowel (matra) signs must be tagged with `POS_AFTER_POST`.
Below-base dependent-vowel (matra) signs must be tagged with `POS_AFTER_POST`.
#### Stage 2, step 4: Adjacent marks ####
Fourth, any subsequences of marks that include a "Nukta" and a
"Halant" or Vedic sign must be reordered so that the "Nukta" appears
first.
This means that the subsequence "Halant,Nukta" is reordered to
"Nukta,Halant" and that the subsequence "_Vedic_sign_,Nukta" is
reordered to "Nukta,_Vedic_sign".
For subsequences of affected marks that are longer than two, the
reordering operation must be repeated until the "Nukta" is the first
character in the subsequence. No other marks in the subsequence
should be reordered.
This order is canonical in Unicode and is required so that
"_consonant_,Nukta" substitution rules from GSUB will be correctly
matched later in the shaping process.
#### Stage 2, step 5: Pre-base consonants ####
Fifth, consonants that occur before the syllable base must be tagged
with `POS_PREBASE_CONSONANT`. Excluding initial "Ra,Halant" sequences
that will become "Reph"s:
- If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`.
- Otherwise, tag it as `POS_PREBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
Gujarati includes one below-base consonant form.
- "Halant,Ra" (occurring after the syllable base) and "Ra,Halant"
(before the syllable base, but in a non-syllable-initial
position) will take on the "Rakaar" form.
> Note: Because Gujarati employs the `BLWF_MODE_PRE_AND_POST` shaping
> characteristic, consonants with below-base special forms may occur
> before or after the syllable base.
>
> During the base-consonant search in stage 2, step 1, any instances of the
> "Halant,_consonant_" pattern following the syllable base for these
> below-base forms will be encountered. The tagging in this step
> ensures that the "_consonant_,Halant" pattern preceding the syllable
> base for these below-base forms will also be tagged correctly.
#### Stage 2, step 6: Reph ####
Sixth, initial "Ra,Halant" sequences that will become "Reph"s must be tagged with
`POS_RA_TO_BECOME_REPH`.
> Note: an initial "Ra,Halant" sequence will always become a "Reph"
> unless the "Ra" is the only consonant in the syllable.
#### Stage 2, step 7: Final consonants ####
Seventh, all final consonants must be tagged. Consonants that occur
after the base consonant or syllable base _and_ after a dependent vowel (matra) sign
must be tagged with `POS_FINAL_CONSONANT`.
> Note: Final consonants occur only in Sinhala and should not be
> expected in `` text runs. This step is included here to
> maintain compatibility across Indic scripts.
#### Stage 2, step 8: Mark tagging ####
Eighth, all marks must be tagged.
> Note: In this step, joiner and non-joiner characters must also be
> tagged according to the same rules given for marks, even though
> these characters are not categorized as marks in Unicode.
Marks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,
`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should
be tagged with `POS_SMVD`.
All "Nukta"s must be tagged with the same positioning tag as the
preceding consonant, independent vowel, placeholder, or dotted circle.
All remaining marks (not in the `POS_SMVD` category and not "Nukta"s)
must be tagged with the same positioning tag as the closest non-mark
character the mark has affinity with, so that they move together
during the sorting step.
There are two possible cases: those marks before the syllable base
and those marks after the syllable base. In addition, an exception is
made for "Halant" marks that follow a left-side (pre-base) matra.
1. Initially, all remaining marks should be tagged with the same
positioning tag as the closest preceding consonant.
2. For each consonant after the syllable base (such as post-base
consonants, below-base consonants, or final consonants), all
remaining marks located between that current consonant and any
previous consonant should be tagged with the same positioning tag as
the current (later) consonant.
In other words, all consonants preceding the syllable base "own" the
marks that follow them, while all consonants after the syllable base
"own" the marks that come before them. When a syllable does not have
any consonants after the syllable base, the syllable base should
"own" all the marks that follow it.
3. Finally, "Halant" marks that follow a left-side dependent vowel
(matra) should _not_ be tagged with the left-side matra's
positioning tag. Instead, the "Halant" should be tagged with the
positioning tag of the non-mark character preceding the left-side
matra. This prevents the "Halant" mark from being moved with the
left-side matra when the syllable is sorted.
#### Stage 2, step 9: Sort syllable ####
With these steps completed, the syllable can be sorted into the final
sort order as listed at the beginning of stage 2.
The glyphs in the syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
#### Stage 2, step 10: Flag sequences for possible feature applications ####
With the initial reordering complete, those glyphs in the syllable that
may have GSUB or GPOS features applied in stages 3, 5, and 6 should be
flagged for each potential feature.
This flagging is preliminary; the set of potential features varies
between different scripts and which features are supported varies
between fonts. It is also possible that the application of
one feature on a glyph sequence will perform a substitution that makes
a later feature no longer applicable to the updated sequence.
Consequently, the flagging must be completed before shaping proceeds
to the stages during which features are applied.
Some shaping features, such as `locl`, can potentially apply to any
glyphs. Therefore it is not necessary to maintain a separate flag for
these features in the bitmask (or other data structure) used to track
the flags -- although shaping engines may do so if desired.
The sequences to flag are summarized in the list below; a full
description of each feature's function and interpretation is provided
in GSUB and GPOS application stages that follow.
- `nukt` should match "_Consonant_,Nukta" sequences
- `akhn` should match "Ka,Halant,Ssa" and "Ja,Halant,Nya"
- `rphf` should match initial "Ra,Halant" sequences but _not_ match
initial "Ra,Halant,ZWJ" sequences
- `rkrf` should match "_Consonant_,Halant,Ra" sequences
- `blwf` should match "Halant,Ra" in post-base positions and
"Ra,Halant" in non-initial pre-base positions
- `half` should match "_Consonant_,Halant" in pre-base position but
_not_ match "Ra,Halant" sequences flagged for `rphf` and
_not_ match "_Consonant_,Halant,ZWNJ,_Consonant_" sequences
- `vatu` should match "_Consonant_,Halant,Ra" sequences
- `cjct` should match "_Consonant_,Halant,_Consonant_" but _not_
match "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_"
### Stage 3: Applying the basic substitution features from GSUB ###
The basic-substitution stage applies mandatory substitution features
using the rules in the font's GSUB table. In preparation for this
stage, glyph sequences should be flagged for possible application
of GSUB features in stage 2, step 10.
The order in which these substitutions must be performed is fixed for
all Indic scripts:
locl
nukt
akhn
rphf
rkrf
pref (not used in Gujarati)
blwf
abvf (not used in Gujarati)
half
pstf
vatu
cjct
cfar (not used in Gujarati)
#### Stage 3, step 1: locl ####
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
#### Stage 3, step 2: nukt ####
The `nukt` feature replaces "_Consonant_,Nukta" sequences with a
precomposed nukta-variant of the consonant glyph.
- The context defined for a `nukt` feature is:
:::{table} `nukt` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:------------------------------|:--------------|
| _none_ | `_consonant_`(full),`_nukta_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-nukt}
nukt feature application
:::
```{svg-color-toggle-button} gujarati-nukt
```
#### Stage 3, step 3: akhn ####
The `akhn` feature replaces two specific sequences with required ligatures.
- "Ka,Halant,Ssa" is substituted with the "KSsa" ligature.
- "Ja,Halant,Nya" is substituted with the "JNya" ligature.
These sequences can occur anywhere in a syllable. The "KSsa" and
"JNya" characters have orthographic status equivalent to full
consonants in some languages, and fonts may have `cjct` substitution
rules designed to match them in subsequences. Therefore, this
feature must be applied before all other many-to-one substitutions.
- The context defined for an `akhn` feature is:
:::{table} `akhn` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:----------------------------|:--------------|
| _none_ | `AKHAND_CONSONANT_SEQUENCE` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-akhn-kssa}
akhn KSsa formation
:::
```{svg-color-toggle-button} gujarati-akhn-kssa
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-akhn-jnya}
akhn JNya formation
:::
```{svg-color-toggle-button} gujarati-akhn-jnya
```
#### Stage 3, step 4: rphf ####
The `rphf` feature replaces initial "Ra,Halant" sequences with the
"Reph" glyph.
- An initial "Ra,Halant,ZWJ" sequence, however, must not be flagged for
the `rphf` substitution.
- The context defined for a `rphf` feature is:
:::{table} `rphf` feature context
| Backtrack | Matching sequence | Lookahead |
|:-----------------|:------------------------|:--------------|
| `SYLLABLE_START` | "Ra"(full),`_halant_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-rphf}
Reph formation
:::
```{svg-color-toggle-button} gujarati-rphf
```
#### Stage 3, step 5: rkrf ####
The `rkrf` feature replaces "_Consonant_,Halant,Ra" sequences with the
"Rakaar"-ligature form of the consonant glyph.
- The context defined for a `rkrf` feature is:
:::{table} `rkrf` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------------|:----------------------|:--------------|
| `_consonant_`(full) | `_halant_`,"Ra"(full) | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-rkrf}
Rakaar ligation
:::
```{svg-color-toggle-button} gujarati-rkrf
```
#### Stage 3, step 6: pref ####
> This feature is not used in Gujarati.
#### Stage 3, step 7: blwf ####
The `blwf` feature replaces below-base-consonant glyphs with any
special forms. Gujarati includes one below-base consonant
form:
- "Halant,Ra" (occurring after the syllable base) or "Ra,Halant"
(before the syllable base, but in a non-syllable-initial position) will
take on the "Rakaar" form.
If the active font contains ligatures for the consonant adjacent to
the "Halant" (i.e., "_Consonant_,Halant,Ra"), then that ligature is
normally applied with the `rkrf` feature in stage 3, step 5. The `blwf`
feature allows the "Ra" to be substituted with a standalone "Rakaar"
mark, to work with all consonants that do not have a `rkrf` ligature
in the font.
Because Gujarati incorporates the `BLWF_MODE_PRE_AND_POST` shaping
characteristic, any pre-base consonants and any post-base consonants
may potentially match a `blwf` substitution; therefore, both cases must
be flagged for comparison. Note that this is not necessarily the case in other
Indic scripts that use a different `BLWF_MODE_` shaping
characteristic.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-blwf}
blwf feature application
:::
```{svg-color-toggle-button} gujarati-blwf
```
#### Stage 3, step 8: abvf ####
> This feature is not used in Gujarati.
#### Stage 3, step 9: half ####
The `half` feature replaces "_Consonant_,Halant" sequences before the
base consonant or syllable base with "half forms" of the consonant
glyphs.
In the most common case, this substitution applies to
"_Consonant_,Halant" sequences that are followed by another
"_Consonant_".
In addition, a sequence matching "_Consonant_,Halant,ZWJ" must also be
flagged for potential `half` substitutions.
> Note: The presence of the "ZWJ" at the end of the sequence means
> that the sequence may match the regular-expression test in stage 1
> as the end of a syllable, even without being followed by a base
> consonant or syllable base.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
There are three exceptions to the default behavior, for which
the shaping engine must test:
- Initial "Ra,Halant" sequences, which should have been flagged for
the `rphf` feature earlier, must not be flagged for potential
`half` substitutions.
- Non-initial "Ra,Halant" sequences, which should have been flagged
for the `rkrf` or `blwf` features earlier, must not be flagged for
potential `half` substitutions.
- A sequence matching "_Consonant_,Halant,ZWNJ,_Consonant_" must not be
flagged for potential `half` substitutions.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-half}
half-form feature application
:::
```{svg-color-toggle-button} gujarati-half
```
#### Stage 3, step 10: pstf ####
> This feature is not used in Gujarati.
#### Stage 3, step 11: vatu ####
The `vatu` feature replaces certain sequences with "Vattu variant"
forms.
"Vattu variants" are formed from glyphs followed by "Rakaar"
(the below-base form of "Ra"); therefore, this feature must be applied after
the `blwf` feature.
> Note: for Gujarati, the `vatu` feature performs the same set of
> substitutions as the `rkrf` feature. The `rkrf` feature is
> preferred; if a given font implements `rkrf`, it does not
> necessarily have to implement `vatu`. Nevertheless, shaping engines
> must support and process both features.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-vatu}
vatu feature application
:::
```{svg-color-toggle-button} gujarati-vatu
```
#### Stage 3, step 12: cjct ####
The `cjct` feature replaces sequences of adjacent consonants with
conjunct ligatures. These sequences must match "_Consonant_,Halant,_Consonant_".
A sequence matching "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_" must not be flagged to form a conjunct.
> Note: The presence of the "ZWJ" in a
> "_Consonant_,Halant,ZWJ,_Consonant_" sequence should automatically
> inhibit any `cjct` feature rules from matching the sequence as valid
> input, and thus prevent the `cjct` substitution from being applied.
> Note: The presence of the "ZWNJ" in a
> "_Consonant_,Halant,ZWNJ,_Consonant_" sequence means that the
> "_Consonant_,Halant,ZWNJ" subsequence will match the
> regular-expression test in stage 1 as the end of a syllable.
>
> Because OpenType shaping features in `` are defined as
> applying only within an individual syllable, this means that the
> presence of the "ZWNJ" will automatically prevent the application of
> a `cjct` feature by triggering the identification of a syllable
> break between the two consonants.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWNJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
>
> Note, also: The presence of the "ZWJ" means that a
> "_Consonant_,Halant,ZWJ" sequence may match the regular-expression
> test in stage 1 as the end of a syllable, even without being
> followed by a base consonant or syllable base. By definition,
> however, a "_Consonant_,Halant,ZWJ" syllable identified in stage 1
> cannot also include a "_Consonant_" after the ZWJ.
The font's GSUB rules might be implemented so that `cjct`
substitutions apply to half-form consonants; therefore, this feature
must be applied after the `half` feature.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-cjct}
cjct feature application
:::
```{svg-color-toggle-button} gujarati-cjct
```
#### Stage 3, step 13: cfar ####
> This feature is not used in Gujarati.
### Stage 4: Final reordering ###
The final reordering stage repositions marks, dependent-vowel (matra)
signs, and "Reph" glyphs to the appropriate location with respect to
the base consonant or syllable base. Because multiple substitutions
may have occurred during the application of the basic-shaping features
in the preceding stage, these repositioning moves could not be
performed during the initial reordering stage.
Like the initial reordering stage, the steps involved in this stage
occur on a per-syllable basis.
#### Stage 4, step 1: Base consonant ####
The final reordering stage, like the initial reordering stage, begins
with determining the syllable base of each syllable, following the
same algorithm used in stage 2, step 1.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base. In a standalone sequence or
other syllable that begins with a placeholder or a dotted circle, the
placeholder or dotted circle will always serve as the syllable base.
In a syllable that begins with a consonant, the shaping engine must
repeat the base-consonant search algorithm used in stage 2, step 1.
The codepoint of the underlying base consonant or syllable base will
not change between the search performed in stage 2, step 1, and the
search repeated here. However, the application of GSUB shaping
features in stage 3 means that several ligation and many-to-one
substitutions may have taken place. The final glyph produced by that
process may, therefore, be a conjunct or ligature form — in most
cases, such a glyph will not have an assigned Unicode codepoint.
#### Stage 4, step 2: Pre-base matras ####
Pre-base dependent vowels (matras) that were reordered during the
initial reordering stage must be moved to their final position. This
position is defined as:
- after the last standalone "Halant" glyph that comes after the
matra's starting position and also comes before the main
consonant.
- If a zero-width joiner follows this last standalone "Halant", the
final matra position is moved to after the joiner.
This means that the matra will move to the right of all explicit
"consonant,Halant" subsequences, but will stop to the left of the base
consonant or syllable base, all conjuncts or ligatures that contain
the base consonant or syllable base, and all half forms.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-matra-position}
Pre-base matra positioning
:::
```{svg-color-toggle-button} gujarati-matra-position
```
> Note: OpenType and Unicode both state that if the syllable includes
> a ZWJ immediately after the last "Halant", then the final matra
> position should be after the ZWJ.
>
> However, there are several test sequences indicating that
> Microsoft's Uniscribe shaping engine did not follow this rule (in,
> at least, Devanagari and Bengali text), and in these circumstances
> Uniscribe instead makes the final matra position before the final
> "Consonant,Halant,ZWJ".
>
> Subsequently, the HarfBuzz shaping engine has also followed the same
> pattern. If other shaping engine implementations prefer to maintain
> maximum compatibility with Uniscribe and HarfBuzz, then they should
> also follow suit.
> Note: The Microsoft script-development specifications for OpenType
> shaping also state that if a zero-width non-joiner follows the last
> standalone "Halant", the final matra position is moved to after the
> non-joiner. However, it is unnecessary to test for this condition,
> because a "Halant,ZWNJ" subsequence is, by definition, the end of a
> syllable. Consequently, a "Halant,ZWNJ" cannot be followed by a
> pre-base dependent vowel.
#### Stage 4, step 3: Reph ####
"Reph" must be moved from the beginning of the syllable to its final
position. Because Gujarati incorporates the `REPH_POS_BEFORE_POST`
shaping characteristic, this final position is immediately before any
independent post-base consonant forms (meaning the first post-base
consonant that has not formed a ligature with the syllable base).
The algorithm for finding the final "Reph" position is
- Starting at the first post-"Reph" consonant, search forward looking
for the first explicit "Halant", ending the search when the base
consonant is encountered. If such an explicit "Halant" is found,
move the "Reph" to the position immediately after this
"Halant".
* If a zero-width joiner (ZWJ) or a zero-width non-joiner (ZWNJ)
follows this "Halant", move the "Reph" to the position
immediately after the ZWJ or ZWNJ. This will be the final
"Reph" position.
* If no ZWJ or ZWNJ follows this "Halant", leave the "Reph" in
its position immediately after the "Halant". This will be the
final "Reph" position.
- If no such explicit "Halant" is found in the previous step, find
the first post-base consonant that has not formed a ligature with
the base consonant. If such a non-ligated post-base consonant is
found, move the "Reph" to the position immediately before the
non-ligated post-base consonant. This will be the final "Reph"
position.
- If no such non-ligated post-base consonant is found in the
previous step, move the "Reph" to the position immediately before
the first post-base matra, syllable modifier, or Vedic sign that
has a positioning tag after the script's "Reph" position in the
syllable sort order (as listed in [stage
2](#stage-2-initial-reordering)). This will be the final "Reph"
position.
> Note: Because Gujarati incorporates the
> `REPH_POS_BEFORE_POST` shaping characteristic, this means
> any positioning tag of `POS_POSTBASE_CONSONANT` or later,
> although a post-base matra, syllable modifier, or Vedic sign
> would not typically be tagged with `POS_POSTBASE_CONSONANT`.
- If no other location has been located in the previous steps, move
the "Reph" to the end of the syllable.
Finally, if the final position of "Reph" occurs after a
"_matra_,Halant" subsequence, then "Reph" must be repositioned to the
left of "Halant", to allow for potential matching with `abvs` or
`psts` substitutions from GSUB.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-reph-position}
Reph positioning
:::
```{svg-color-toggle-button} gujarati-reph-position
```
#### Stage 4, step 4: Pre-base-reordering consonants ####
Any pre-base-reordering consonants must be moved to immediately before
the base consonant or syllable base.
Gujarati does not use pre-base-reordering consonants, so this step will
involve no work when processing `` text. It is included here in order
to maintain compatibility with the other Indic scripts.
#### Stage 4, step 5: Initial matras ####
Any left-side dependent vowels (matras) that are at the start of a
word must be flagged for potential substitution by the `init` feature
of GSUB.
Gujarati does not use the `init` feature, so this step will
involve no work when processing `` text. It is included here in
order to maintain compatibility with the other Indic scripts.
### Stage 5: Applying all remaining substitution features from GSUB ###
In this stage, the remaining substitution features from the GSUB table
are applied. In preparation for this stage, glyph sequences should be
flagged for possible application of GSUB features in stage 2,
step 10.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
init (not used in Gujarati)
pres
abvs
blws
psts
haln
The `init` feature is not used in Gujarati.
The `pres` feature replaces pre-base-consonant glyphs with special
presentations forms. This can include consonant conjuncts, half-form
consonants, and stylistic variants of left-side dependent vowels
(matras).
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-pres}
pres feature application
:::
```{svg-color-toggle-button} gujarati-pres
```
The `abvs` feature replaces above-base-consonant glyphs with special
presentation forms. This usually includes contextual variants of
above-base marks or contextually appropriate mark-and-base ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-abvs}
abvs feature application
:::
```{svg-color-toggle-button} gujarati-abvs
```
The `blws` feature replaces below-base-consonant glyphs with special
presentation forms. This usually includes replacing base consonants or
syllable bases that
are adjacent to the below-base-consonant form "Rakaar" with contextual
ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-blws}
blws feature application
:::
```{svg-color-toggle-button} gujarati-blws
```
The `psts` feature replaces post-base-consonant glyphs with special
presentation forms. This usually includes replacing right-side
dependent vowels (matras) with stylistic variants or replacing
post-base-consonant/matra pairs with contextual ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-psts}
psts feature application
:::
```{svg-color-toggle-button} gujarati-psts
```
The `haln` feature replaces syllable-final "_Consonant_,Halant" pairs with
special presentation forms. This can include stylistic variants of the
consonant where placing the "Halant" mark on its own is
typographically problematic.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-haln}
haln feature application
:::
```{svg-color-toggle-button} gujarati-haln
```
> Note: The `calt` feature, which allows for generalized application
> of contextual alternate substitutions, is usually applied at this
> point. However, `calt` is not mandatory for correct Gujarati shaping
> and may be disabled in the application by user preference.
### Stage 6: Applying remaining positioning features from GPOS ###
In this stage, mark positioning, kerning, and other GPOS features are
applied.
As with the preceding stage, the order in which these features are
applied is not canonical; they should be applied in the order in which
they appear in the GPOS table in the font.
dist
abvm
blwm
> Note: The `kern` feature is usually applied at this stage, if it is
> present in the font. However, `kern` (like `calt`, above) is not
> mandatory for shaping Gujarati text and may be disabled by user preference.
The `dist` feature adjusts the horizontal positioning of
glyphs. Unlike `kern`, adjustments made with `dist` do not require the
application or the user to enable any software _kerning_ features, if
such features are optional.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-dist}
Distance feature application
:::
```{svg-color-toggle-button} gujarati-dist
```
The `abvm` feature positions above-base marks for attachment to base
characters. In Gujarati, this includes "Reph" in addition to
above-base dependent vowels (matras), diacritical marks, and Vedic signs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-abvm}
Above-base mark positioning
:::
```{svg-color-toggle-button} gujarati-abvm
```
The `blwm` feature positions below-base marks for attachment to base
characters. In Gujarati, this includes below-base dependent vowels
(matras) and diacritical marks as well as the below-base consonant form "Rakaar".
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gujarati-blwm}
Below-base mark positioning
:::
```{svg-color-toggle-button} gujarati-blwm
```
## The `` shaping model ##
The older Gujarati script tag, ``, has been deprecated. However,
shaping engines may still encounter fonts that were built to work with
`` and some users may still have documents that were written to
take advantage of `` shaping.
### Distinctions from `` ###
The most significant distinction between the shaping models is that the
sequence of "Halant" and consonant glyphs used to trigger shaping
features) was altered when migrating from `` to
``.
Specifically, shaping engines were expected to reorder post-base
"Halant,_Consonant_" sequences to "_Consonant_,Halant".
As a result, a font's GSUB substitutions would be written to match
"_Consonant_,Halant" sequences in all pre-base and post-base positions.
The `` syllable
Pre-baseC Halant BaseC Halant Post-baseC
would be reordered to
Pre-baseC Halant BaseC Post-baseC Halant
before features are applied.
In `` text, as described above in this document, there is no
such reordering. The correct sequence to match for GSUB substitutions is
"_Consonant_,Halant" for pre-base consonants, but "Halant,_Consonant_"
for post-base consonants.
The old Indic shaping model also did not recognize the
`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, ``
was treated as if it followed the `BLWF_MODE_POST_ONLY`
characteristic. In other words, below-base form substitutions were
only applied to consonants after the base consonant or syllable base.
In addition, for some scripts, left-side dependent vowel marks
(matras) were not repositioned during the final reordering
stage. For `` text, the left-side matra was always positioned
at the beginning of the syllable.
### Advice for handling fonts with `` features only ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences in order to apply GSUB substitutions when it is known that
the font in use supports only the `` shaping model.
### Advice for handling text runs composed in `` format ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences for GSUB substitutions or to reorder them to
"Halant,_Consonant_" when processing text runs that are tagged with
the `` script tag and it is known that the font in use supports
only the `` shaping model.
Shaping engines may also choose to apply `blwf` substitutions to
below-base consonants occurring before the base consonant or syllable base when it is
known that the font in use supports an applicable substitution lookup.
Shaping engines may also choose to position left-side matras according
to the `` ordering scheme; however, doing so might interfere
with matching GSUB or GPOS features.
================================================
FILE: opentype-shaping-gurmukhi.md
================================================
```{include} /_global.md
```
# Gurmukhi shaping in OpenType #
This document details the shaping procedure needed to display text
runs in the Gurmukhi script.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Shaping classes and subclasses](#shaping-classes-and-subclasses)
- [Gurmukhi character tables](#gurmukhi-character-tables)
- [The `` shaping model](#the-gur2-shaping-model)
- [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)
- [Stage 2: Initial reordering](#stage-2-initial-reordering)
- [Stage 3: Applying the basic substitution features from GSUB](#stage-3-applying-the-basic-substitution-features-from-gsub)
- [Stage 4: Final reordering](#stage-4-final-reordering)
- [Stage 5: Applying all remaining substitution features from GSUB](#stage-5-applying-all-remaining-substitution-features-from-gsub)
- [Stage 6: Applying remaining positioning features from GPOS](#stage-6-applying-remaining-positioning-features-from-gpos)
- [The `` shaping model](#the-guru-shaping-model)
- [Distinctions from ``](#distinctions-from-gur2)
- [Advice for handling fonts with `` features only](#advice-for-handling-fonts-with-guru-features-only)
- [Advice for handling text runs composed in `` format](#advice-for-handling-text-runs-composed-in-guru-format)
## General information ##
The Gurmukhi script belongs to the Indic family, and follows
the same general patterns as the other Indic scripts. More
specifically, it belongs to the North Indic subgroup, in which
sequences of adjacent consonants are often represented as conjuncts.
The Gurmukhi script is used to write multiple languages, most commonly
Punjabi, Sant Bhasha, and Sindhi. In addition, Sanskrit may be written
in Gurmukhi, so Gurmukhi script runs may include glyphs from the Vedic
Extensions block of Unicode.
There are two extant Gurmukhi script tags defined in OpenType, ``
and ``. The older script tag, ``, was deprecated in 2005.
Therefore, new fonts should be engineered to work with the ``
shaping model. However, if a font is encountered that supports only
``, the shaping engine should deal with it gracefully.
## Terminology ##
OpenType shaping uses a standard set of terms for Indic scripts. The
terms used colloquially in any particular language may vary, however,
potentially causing confusion.
**Matra** is the standard term for a dependent vowel sign.
The term "matra" is also used to refer to the headline above most
Gurmukhi letters. To avoid ambiguity, the term **headline** is
used in most Unicode and OpenType shaping documents.
**Halant** and **Virama** are both standard terms for the below-base "vowel-killer"
sign. Unicode documents use the term "virama" most frequently, while
OpenType documents use the term "halant" most frequently.
**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark
indicating that the preceding vowel should be nasalized. In the Punjabi
language, this mark is known as the _adak bindi_.
The term **base consonant** is also critical to Indic shaping. The
base consonant of a syllable is the consonant that carries the
syllable's vowel sound, either the inherent vowel (for an unmarked
base consonant) or a dependent vowel (with the addition of a matra).
A syllable's base consonant is generally rendered in its full form
(although it may form ligatures), while other consonants in the
syllable frequently take on secondary forms. Different GSUB
substitutions may apply to a script's **pre-base** and **post-base**
consonants. Some of these substitutions create **above-base** or
**below-base** forms. The **Reph** form of the consonant "Ra" is an
example.
Syllables may also begin with an **independent vowel** instead of a
consonant. In these syllables, the independent vowel is rendered in
full-letter form, not as a matra, and the independent vowel serves as the
syllable base, similar to a base consonant.
Where possible, using the standard terminology is preferred, as the
use of a language-specific term necessitates choosing one language
over all of the others that share a common script.
## Glyph classification ##
Shaping Gurmukhi text depends on the shaping engine correctly
classifying each glyph in the run. As with most other scripts, the
classifications must distinguish between consonants, vowels
(independent and dependent), numerals, punctuation, and various types
of diacritical mark.
For most codepoints, the `General Category` property defined in the Unicode
standard is correct, but it is not sufficient to fully capture the
expected shaping behavior (such as glyph reordering). Therefore,
Gurmukhi glyphs must additionally be classified by how they are treated
when shaping a run of text.
### Shaping classes and subclasses ###
The shaping classes listed in the tables that follow are defined so
that they capture the positioning rules used by Indic scripts.
For most codepoints, the _Shaping class_ is synonymous with the `Indic
Syllabic Category` defined in Unicode. However, there are some
distinctions, where the defined category does not fully capture the
behavior of the character in the shaping process.
Several of the diacritic and syllable-modifying marks behave according
to their own rules and, thus, have a special class. These include
`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some
less-common marks behave according to rules that are similar to these
common marks, and are therefore classified with the corresponding
common mark. The Vedic Extensions also include a `CANTILLATION`
class for tone marks.
Letters generally fall into the classes `CONSONANT`,
`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the
shaping engine parse and identify key positions in a syllable. For
example, Unicode categorizes dependent vowels as `Mark [Mn]`, but the
shaping engine must be able to distinguish between dependent vowels
and diacritical marks (which are categorized as `Mark [Mn]`).
Gurmukhi uses one subclass of consonant, `CONSONANT_MEDIAL`.
> Note: Unicode includes a second subclass of consonant,
> `CONSONANT_PLACEHOLDER`, for two special vowel-carrier letters,
> "Iri" (`U+0A72`) and "Ura" (`U+0A73`). For shaping purposes, however,
> both "Iri" and "Ura" are classified as `CONSONANT`.
The `CONSONANT_MEDIAL` subclass is used for "Yakash" (`U+0A75`), a
consonant used in Sikh religious texts that is believed to be derived
from the character "Ya" (`U+0A2F`). "Yakash" is positioned in a mark-like,
below-base form, but it must pass tests for consonants when
identifying syllables.
Gurmukhi differs from many other Indic scripts in that independent
vowels are represented by the standard dependent-vowel marks (matras)
attached to a special vowel-carrier character. However, because each
independent vowel has been assigned its own codepoint by Unicode, the
standard `VOWEL_INDEPENDENT` and `VOWEL_DEPENDENT` classifications
function normally.
The vowel carrier "Aira", with no dependent-vowel mark, represents the
independent form of the inherent vowel, "A" (`U+0A05`). In a sense,
this character serves a double function.
The other two vowel carriers, "Iri" (`U+0A72`) and "Ura" (`U+0A73`)
do not normally occur on their own in Gurmukhi syllables, but they may
appear as standalone entities, much like marks and other symbols do
when they are referenced or displayed as examples. To support this use
case, the "Iri" and "Ura" characters have the status of consonants for
shaping purposes.
Other characters, such as symbols and miscellaneous letters (for
example, letter-like symbols that only occur as standalone entities
and do not occur within syllables), need no special attention from the
shaping engine, so they are not assigned a shaping class.
Numbers are classified as `NUMBER`, even though they evoke no special
behavior from the Indic shaping rules, because there are OpenType features that
might affect how the respective glyphs are drawn, such as `tnum`,
which specifies the usage of tabular-width numerals, and `sups`, which
replaces the default glyphs with superscript variants.
Marks and dependent vowels are further labeled with a mark-placement
subclass, which indicates where the glyph will be placed with respect
to the base character to which it is attached. The actual position of
the glyphs is determined by the lookups found in the font's GPOS
table, however, the shaping rules for Indic scripts require that the
shaping engine be able to identify marks by their general
position.
For example, left-side dependent vowels (matras), classified
with `LEFT_POSITION`, must frequently be reordered, with the final
position determined by whether or not other letters in the syllable
have formed ligatures or combined into conjunct forms. Therefore, the
`LEFT_POSITION` subclass of the character must be tracked throughout
the shaping process.
There are four basic _mark-placement subclasses_ for dependent vowels
(matras). Each corresponds to the visual position of the matra with
respect to the syllable base to which it is attached:
- `LEFT_POSITION` matras are positioned to the left of the syllable base.
- `RIGHT_POSITION` matras are positioned to the right of the syllable base.
- `TOP_POSITION` matras are positioned above the syllable base.
- `BOTTOM_POSITION` matras are positioned below syllable base.
These positions may also be referred to elsewhere in shaping documents as:
- _Pre-base_ matras
- _Post-base_ matras
- _Above-base_ matras
- _Below-base_ matras
respectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations
corresponds to Unicode's preferred terminology. The _Pre_, _Post_,
_Above_, and _Below_ terminology is used in the official descriptions
of OpenType GSUB and GPOS features. Shaping engines may, internally,
use whichever terminology is preferred.
In addition, dependent-vowel codepoints that are composed of multiple
components will be designated in character tables as having a compound
_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`.
However, these multi-part matras are decomposed into separate matra
components during the shaping process. After the decomposition, each
matra component will belong to exactly one of the four basic
_mark-placement subclasses_.
For most mark and dependent-vowel codepoints, the _mark-placement
subclass_ is synonymous with the `Indic Positional Category` defined
in Unicode. However, there are some distinctions, where the defined
category does not fully capture the behavior of the character in the
shaping process.
### Gurmukhi character tables ###
Separate character tables are provided for the Gurmukhi and Vedic
Extensions blocks as well as for other miscellaneous characters that
are used in `` text runs:
- [Gurmukhi character table](character-tables/character-tables-gurmukhi.md#gurmukhi-character-table)
- [Vedic Extensions character table](character-tables/character-tables-gurmukhi.md#vedic-extensions-character-table)
- [Miscellaneous character table](character-tables/character-tables-gurmukhi.md#miscellaneous-character-table)
The tables list each codepoint along with its Unicode general
category, its shaping class, and its mark-placement subclass. The
codepoint's Unicode name and an example glyph are also provided.
For example:
:::{table} example character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0A01` | Mark [Mn] | BINDU | TOP_POSITION | ਁ Adak Bindi |
| | | | |
|`U+0A15` | Letter | CONSONANT | _null_ | ਕ Ka |
:::
Codepoints with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the tables use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
#### Special-function codepoints ####
Other important characters that may be encountered when shaping runs
of Gurmukhi text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
Each of these is of particular importance to shaping engines, because
these codepoints interact with the shaping engine, the text run, and
the active font, either to mediate non-default shaping behavior or to
relay information about the current shaping process.
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
Dotted-circle placeholder characters (like any Unicode codepoint) can
appear anywhere in text input sequences and should be rendered
normally. GPOS positioning lookups should attach mark glyphs to dotted
circles as they would to other non-mark characters. As visible glyphs,
dotted circles can also be involved in GSUB substitutions.
In addition to the default input-text handling process, shaping
engines may also insert dotted-circle placeholders into the text
sequence. Dotted-circle insertions are required when a non-spacing
mark or dependent sign is formed with no base character present.
This requirement covers:
- Dependent signs that are assigned their own individual Unicode
codepoints (such as most dependent-vowel marks or matras)
- Dependent signs that are formed only by specific sequences of
other codepoints (such as "Reph")
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence.
- The sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the
formation of a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead.
- The sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce
the first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph".
- An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
even where an initial "Ra,Halant" sequence without the zero-width
joiner would otherwise produce a "Reph".
> Note: "Reph" substitutions are rare in Gurmukhi text. `` fonts may
> not implement the "Reph" substitution in GSUB at all. Nevertheless,
> shaping engines must test for it in order to provide the
> functionality if it is implemented.
The ZWJ and ZWNJ characters are, by definition, non-printing control
characters and have the _Default_Ignorable_ property in the Unicode
Character Database. In standard text-display scenarios, their function
is to signal a request from the user to the shaping engine for some
particular non-default behavior. As such, they are not rendered
visually.
> Note: Naturally, there are special circumstances where a user or
> document might need to request that a ZWJ or ZWNJ be rendered
> visually, such as when illustrating the OpenType shaping process, or
> displaying Unicode tables.
Because the ZWJ and ZWNJ are non-printing control characters, they can
be ignored by any portion of a software text-handling stack not
involved in the shaping operations that the ZWJ and ZWNJ are designed
to interface with. For example, spell-checking or collation functions
will typically ignore ZWJ and ZWNJ.
Similarly, the ZWJ and ZWNJ should be ignored by the shaping engine
when matching sequences of codepoints against the backtrack and
lookahead sequences of a font's GSUB or GPOS lookups.
For example:
- A lookup that substitutes an alternate version of a
dependent-vowel (matra) glyph when it is preceded by "Ka,Halant,Tta"
should still be applied if the dependent-vowel codepoint is preceded
by "Ka,Halant,ZWJ,Tta" in the text run.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
In addition to general punctuation, runs of Gurmukhi text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block.
## The `` shaping model ##
Processing a run of `` text involves six top-level stages:
1. Identifying syllables and other sequences
2. Initial reordering
3. Applying the basic substitution features from GSUB
4. Final reordering
5. Applying all remaining substitution features from GSUB
6. Applying all remaining positioning features from GPOS
As with other Indic scripts, the initial reordering stage and the
final reordering stage each involve applying a set of several
script-specific rules. The basic substitution features must be applied
to the run in a specific order. The remaining substitution features in
stage five, however, do not have a mandatory order.
Indic scripts follow many of the same shaping patterns, but they
differ in a few critical characteristics that the shaping engine must
track. These include:
- The position of the base consonant in a syllable.
- The final position of "Reph".
- Whether "Reph" must be requested explicitly or if it is formed by
a specific, implicit sequence.
- Whether the below-base forms feature is applied only to consonants
before the syllable base, only to consonants after the base
consonant, or to both.
- The ordering positions for dependent vowels
(matras). Specifically, right-side, above-base, and below-base
matras follow different rules in different scripts.
All Indic scripts position left-side matras in the same
manner, in the ordering position `POS_PREBASE_MATRA`.
With regard to these common variations, Gurmukhi's specific shaping
characteristics include:
- `BASE_POS_LAST` = The base consonant of a syllable is the last
consonant, not counting any special final-consonant forms.
- `REPH_POS_BEFORE_SUBJOINED` = "Reph" is ordered before all subjoined (i.e.,
below-base) consonant forms.
- `REPH_MODE_IMPLICIT` = "Reph" is formed by an initial "Ra,Halant" sequence.
- `BLWF_MODE_PRE_AND_POST` = The below-forms feature is applied both to
pre-base consonants and to post-base consonants.
- `MATRA_POS_TOP` = `POS_AFTER_POST` = above-base matras are
ordered after all post-base consonant forms.
- `MATRA_POS_RIGHT` = `POS_AFTER_POST` = Right-side matras are
ordered after all post-base consonant forms.
- `MATRA_POS_BOTTOM` = `POS_AFTER_POST` = Below-base matras are
ordered after all post-base consonant forms.
These characteristics determine how the shaping engine must reorder
certain glyphs, how base consonants are determined, and how "Reph"
should be encoded within a run of text.
### Stage 1: Identifying syllables and other sequences ###
A syllable in Gurmukhi consists of a valid orthographic sequence
that may be followed by a "tail" of modifier signs.
> Note: The Gurmukhi Unicode block enumerates six modifier signs,
> "Adak Bindi" (`U+0A01`), "Bindi" (`U+0A02`), "Visarga"
> (`U+0A03`), "Udaat" (`U+0A51`), "Tippi" (`U+0A70`), and "Addak"
> (`U+0A71`). In addition, Sanskrit text written in Gurmukhi may
> include additional signs from Vedic Extensions block.
Each syllable contains exactly one vowel sound. Valid syllables may
begin with either a consonant or an independent vowel.
If the syllable begins with a consonant, then the consonant that
provides the vowel sound is referred to as the "base" consonant. If
the syllable begins with an independent vowel, that independent vowel
is the syllable's only vowel sound and serves as the "base".
> Note: A consonant that is not accompanied by a dependent vowel (matra) sign
> carries the script's inherent vowel sound. This vowel sound is changed
> by a dependent vowel (matra) sign following the consonant.
From the shaping engine's perspective, the main distinction between a
syllable with a base consonant and a syllable with an
independent-vowel base is that a syllable with an independent-vowel
base is less likely to include additional consonants in special forms
and less likely to include dependent vowel signs
(matras). Therefore, in the common case, vowel-based syllables may
involve less reordering, substitution feature applications, and other
processing than consonant-based syllables.
In some languages and orthographies, vowel-based syllables are
not permitted to include additional consonants or matras, and certain
GSUB substitution features do not occur. However, there are often
known exceptions, and real-world text makes no such guarantees.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
Generally speaking, the base consonant is the final consonant of the
syllable and its vowel sound designates the end of the syllable. This
rule is synonymous with the `BASE_POS_LAST` characteristic mentioned
earlier.
Valid consonant-based syllables may include one or more additional
consonants that precede the base consonant or syllable base. Each of these
other, pre-base consonants will be followed by the "Halant" mark, which
indicates that they carry no vowel. They affect pronunciation by
combining with the base consonant or syllable base (e.g., "_str_", "_pl_") but they
do not add a vowel sound.
Gurmukhi also includes special consonants that can occur after the
base consonant or syllable base. These post-base consonants will also be separated from
the base consonant or syllable base by a "Halant" mark; the algorithm for correctly
identifying the base consonant includes a test to recognize these sequences
and not mis-identify the base consonant.
As with other Indic scripts, the consonant "Ra" receives special
treatment; in many circumstances it is replaced by one of two combining
mark-like forms.
- A "Ra,Halant" sequence at the beginning of a syllable may be replaced
with an above-base mark called "Reph" (unless the "Ra" is the only
consonant in the syllable). This rule is synonymous with the
`REPH_MODE_IMPLICIT` characteristic mentioned earlier.
- A "Ra,Halant" sequence before the base consonant or syllable base or a "Halant,Ra"
sequence after the base consonant or syllable base may be replaced with a
below-base mark.
> Note: "Reph" substitutions are rare in Gurmukhi text. `` fonts may
> not implement the "Reph" substitution in GSUB at all. Nevertheless,
> shaping engines must test for it in order to provide the
> functionality if it is implemented.
"Reph" characters must be reordered after the syllable-identification
stage is complete.
> Note: Generally speaking, OpenType fonts will implement support for
> any below-base, post-base, and pre-base-reordering consonant forms
> by including the necessary substitution rules in their `blwf`,
> `pstf`, and `pref` lookups in GSUB.
>
> Consequently, whenever shaping engines need to determine whether or
> not a given consonant can take on such a special form, the most
> appropriate test is to check if the consonant is included in the
> relevant GSUB lookup. Other implementations are possible, such as
> maintaining static tables of consonants, but checking for GSUB
> support ensures that the expected behavior is implemented in the
> active font, and is therefore the most reliable approach.
In addition to valid syllables, standalone sequences may occur, such
as when an isolated codepoint is shown in example text.
> Note: Foreign loanwords, when written in the Gurmukhi script, may
> not adhere to the syllable-formation rules described above. In
> particular, it is not uncommon to encounter foreign loanwords that
> contain a word-final suffix of consonants.
>
> Nevertheless, such word-final suffixes will be correctly matched by
> the regular expressions listed below. These loanwords are pronounced
> different, which raises issues for potential readers, but the
> character sequences do not affect the shaping process.
Syllables should be identified by examining the run and matching
glyphs, based on their categorization, using regular expressions.
The following general-purpose Indic-shaping regular expressions can be
used to match Gurmukhi syllables.
The regular expressions utilize the shaping classes from the tables
above. For the purpose of syllable identification, more general
classes can be used, as defined in the following table. This
simplifies the resulting expressions.
```markdown
_ra_ = The consonant "Ra"
_consonant_ = ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_
_vowel_ = `VOWEL_INDEPENDENT`
_nukta_ = `NUKTA`
_halant_ = `VIRAMA`
_zwj_ = `JOINER`
_zwnj_ = `NON_JOINER`
_matra_ = `VOWEL_DEPENDENT` | `PURE_KILLER`
_syllablemodifier_ = `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`
_vedicsign_ = `CANTILLATION`
_placeholder_ = `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`
_dottedcircle_ = `DOTTED_CIRCLE`
_repha_ = `CONSONANT_PRE_REPHA`
_consonantmedial_ = `CONSONANT_MEDIAL`
_symbol_ = `SYMBOL` | `AVAGRAHA`
_consonantwithstacker_ = `CONSONANT_WITH_STACKER`
_other_ = `OTHER` | `MODIFYING_LETTER`
```
> Note: the _ra_ identification class is mutually exclusive with
> the _consonant_ class. The union of the _consonant_ and _ra_ classes
> is used in the regular expression elements below in order to
> correctly identify "Ra" characters that do not trigger "Reph" or
> "Rakaar" shaping behavior.
>
> Note, also, that the cantillation mark "combining Ra" in the
> Devanagari Extended block does _not_ belong to the _ra_
> identification class, and that the other "combining consonant"
> cantillation marks in the Devanagari Extended block do not belong to
> the _consonant_ identification class.
> Note: The _placeholder_ identification class includes codepoints
> that are often used in place of vowels or consonants when a document
> needs to display a matra, mark, or special form in isolation or
> in another context beyond a standard syllable. Examples of
> _placeholder_ codepoints include hyphens and non-breaking
> spaces. Sequences that utilize this approach should be identified as
> "standalone" syllables.
>
> The _placeholder_ identification class also includes numerals, which
> are commonly used as word substitutes within normal text. Examples
> include ordinals (e.g., "4th").
> Note: The _other_ identification class includes codepoints that
> do not interact with adjacent characters for shaping purposes. Even
> though some of these codepoints (such as `MODIFYING_LETTER`) can
> occur within words, they evoke no behavior from the shaping
> engine and do not factor into the regular expressions that
> follow. Therefore, the shaping engine may choose to ignore them
> during syllable identification; they are listed here for completeness.
These identification classes form the bases of the following regular
expression elements:
```markdown
C = _consonant_ | _ra_
Z = _zwj_ | _zwnj_
REPH = (_ra_ _halant_) | _repha_
CN = C _zwj_? _nukta_?
FORCED_RAKAR = _zwj_ _halant_ _zwj_ _ra_
S = _symbol_ _nukta_?
MATRA_GROUP = Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?
SYLLABLE_TAIL = (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}
HALANT_GROUP = Z? _halant_ (_zwj_ _nukta_?)?
FINAL_HALANT_GROUP = HALANT_GROUP | (_halant_ _zwnj_)
MEDIAL_GROUP = _consonantmedial_?
HALANT_OR_MATRA_GROUP = FINAL_HALANT_GROUP | MATRA_GROUP*)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(MATRA_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(MATRA_GROUP){0,4}` .
Using the above elements, the following regular expressions define the
possible syllable types:
A consonant-based syllable will match the expression:
```markdown
(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(CN HALANT_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(CN HALANT_GROUP){0,4}` .
A vowel-based syllable will match the expression:
```markdown
REPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
A standalone syllable will match the expression:
```markdown
((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
> Note: Although they are labeled as "standalone syllables" here,
> many sequences that match the standalone regular expression above
> are instances where a document needs to display a matra, combining
> mark, or special form in isolation. Such sequences might not have
> any significance with regard to the definition of syllables used in
> the language or orthography of the text.
A symbol-based syllable will match the expression:
```markdown
S SYLLABLE_TAIL
```
A broken syllable will match the expression:
```markdown
REPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
The primary problem involved in shaping broken syllables is the lack
of a syllable base (either a base consonant or an independent
vowel). Without a syllable base, the shaping engine cannot perform
GPOS positioning and other contextual operations that are required
later in the shaping process.
To make up for this limitation, shaping engines should insert a
dotted-circle placeholder (`U+25CC`) character into the text stream
where the missing syllable base was expected to occur. This
placeholder allows the shaping process to proceed on a best-effort
basis at handling the broken-syllable sequence, but making guarantees
about the orthographic correctness or preferred appearance of the
final result is out of scope for this document.
Shaping engines can perform this dotted-circle insertion at any point
after the broken syllable has been recognized and before GSUB features
are applied. However, the best results will likely be attained by
performing the insertion immediately, before proceeding to
stage 2. This will enable the maximum number of GSUB and GPOS features
in the active font to be correctly applied to the text run by ensuring
that all reordering, tagging, and sorting algorithms are executed as
usual.
> Note: In software stacks where other text-handling operations, such
> as Unicode normalization and localization, are performed before the
> text run is passed to the shaping engine, there is a potential for
> the dotted-circle insertion to cause unexpected effects.
>
> For example, if a `ccmp` or `locl` feature substitutes the default
> dotted-circle placeholder glyph with a variant glyph of a different
> size or weight for the (`U+25CC`) codepoint, then any shaping engine
> which relies on another software component to handle that
> functionality must take additional care to ensure consistency.
The expressions above use state-machine syntax from the Ragel
state-machine compiler. The operators represent:
```markdown
a* = zero or more copies of a
b+ = one or more copies of b
c? = optional instance of c
d{n} = exactly n copies of d
d{,n} = zero to n copies of d
d{n,} = n or more copies of d
d{n,m} = n to m copies of d
!e = not e
^f = character-level not f
g.h = concatenation of g and h
i|j = i or j
( ) = grouping of expression elements
```
After the syllables have been identified, each of the subsequent
shaping stages occurs on a per-syllable basis.
### Stage 2: Initial reordering ###
The initial reordering stage is used to relocate glyphs from the
phonetic order in which they occur in a run of text to the
orthographic order in which they are presented visually.
> Note: Primarily, this means moving dependent-vowel (matra) glyphs,
> "Ra,Halant" glyph sequences, and other consonants that take special
> treatment in some circumstances. This includes the below-base forms
> of "Ra", "Ha", and "Va".
>
> These reordering moves are mandatory. The final-reordering stage
> may make additional moves, depending on the text and on the features
> implemented in the active font.
The syllable should be processed by tagging each glyph with its
intended position based on its ordering category. After all glyphs
have been tagged, the entire syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
The final sort order of the ordering categories should be:
POS_RA_TO_BECOME_REPH
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN
POS_ABOVEBASE_CONSONANT
POS_BEFORE_SUBJOINED
POS_BELOWBASE_CONSONANT
POS_AFTER_SUBJOINED
POS_BEFORE_POST
POS_POSTBASE_CONSONANT
POS_AFTER_POST
POS_FINAL_CONSONANT
POS_SMVD
This sort order enumerates all of the possible final positions to
which a codepoint might be reordered, across all of the Indic
scripts. It includes some ordering categories not utilized in
Gurmukhi.
The basic positions (left to right) are "Reph"
(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants
positioned before the base consonant or syllable base
(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant
or syllable base (`POS_SYLLABLE_BASE`), above-base consonants
(`POS_ABOVEBASE_CONSONANT`), below-base consonants
(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base
consonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final
consonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic
signs (`POS_SMVD`).
In addition, several secondary positions are defined to handle various
reordering rules that deal with relative, rather than absolute,
positioning. `POS_AFTER_MAIN` means that a character must be
positioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`
and `POS_AFTER_SUBJOINED` mean that a character must be positioned
before or after any below-base consonants, respectively. Similarly,
`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be
positioned before or after any post-base consonants, respectively.
For shaping-engine implementers, the names used for the ordering
categories matter only in that they are unambiguous.
For a definition of the "base" consonant, refer to stage 2, step 1, which
follows.
#### Stage 2, step 1: Base consonant ####
The first step is to determine the base consonant of the syllable, if
there is one, and tag it as `POS_SYLLABLE_BASE`.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base, and it should be tagged
as `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a standalone sequence or other syllable that begins with a placeholder
or dotted circle, the placeholder or dotted circle will always serve
as the syllable base, and it should be tagged as
`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a syllable that begins with a consonant, the shaping engine must
determine the base consonant by a script-specific algorithm.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
The base consonant is defined as the consonant in a consonant-based
syllable that carries the syllable's vowel sound. That vowel sound
will either be provided by the script's inherent vowel (in which case
it is not written with a separate character) or the sound will be designated
by the addition of a dependent-vowel (matra) sign.
While performing the base-consonant search, shaping engines may
also encounter special-form consonants, including below-base
consonants and post-base consonants. Each of these special-form
consonants must also be tagged (`POS_BELOWBASE_CONSONANT`,
`POS_POSTBASE_CONSONANT`, respectively).
Any pre-base-reordering consonant (such as a pre-base-reordering "Ra")
encountered during the base-consonant search must be tagged
`POS_POSTBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
The algorithm for determining the base consonant is
- If the syllable starts with "Ra,Halant" and the syllable contains
more than one consonant, exclude the starting "Ra" from the list of
consonants to be considered.
- Starting from the end of the syllable, move backwards until a consonant is found.
* If the consonant is the first consonant, stop.
* If the consonant is preceded by the sequence "Halant,ZWJ", stop.
* If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`, then move to the previous consonant.
* If the consonant has a post-base form, tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If the consonant is a pre-base-reordering "Ra", tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If none of the above conditions is true, stop.
- The consonant stopped at will be the base consonant.
> Note: The algorithm is designed to work for all Indic
> scripts. However, Gurmukhi does not utilize pre-base-reordering "Ra".
Gurmukhi includes one post-base form:
- "Halant,Ya" takes on a post-base form.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-pstf}
Post-base consonants
:::
```{svg-color-toggle-button} gurmukhi-pstf
```
Gurmukhi includes three below-base consonant forms:
- "Halant,Ra" (after the base consonant or syllable base) and "Ra,Halant" (in a
non-syllable-initial position) take on a below-base form.
- "Halant,Ha" (after the base consonant or syllable base) and "Ha,Halant" (in a
non-syllable-initial position) take on a below-base form.
- "Halant,Va" (after the base consonant or syllable base) and "Va,Halant" (in a
non-syllable-initial position) take on a below-base form.
Gurmukhi also includes the CONSONANT_MEDIAL subclass, used only for "Yakash"
(U+0A75), which is rendered as a below-base form. "Yakash" should
be tagged as `POS_BELOWBASE_CONSONANT`.
> Note: Because Gurmukhi employs the `BLWF_MODE_PRE_AND_POST` shaping
> characteristic, consonants with below-base special forms may occur
> before or after the syllable base.
>
> During the base-consonant search, only the "Halant,_consonant_"
> pattern following the syllable base for these below-base forms will
> be encountered. Stage 2, step 5 below ensures that the "_consonant_,Halant"
> pattern preceding the syllable base for these below-base forms will
> also be tagged correctly.
#### Stage 2, step 2: Matra decomposition ####
Second, any multi-part dependent vowels (matras) must be decomposed
into their left-side and right-side components. Gurmukhi has no
two-part dependent vowels, so this step will involve no work when
processing `` text. It is included here in order to maintain
compatibility with the other Indic scripts.
Because this decomposition is a character-level operation, the shaping
engine may choose to perform it earlier, such as during an initial
Unicode-normalization stage. However, all such decompositions must be
completed before the shaping engine begins step three, below.
#### Stage 2, step 3: Tag matras ####
Third, all left-side dependent-vowel (matra) signs, including those that
resulted from the preceding decomposition step, must be tagged to be
moved to the beginning of the syllable, with `POS_PREBASE_MATRA`.
All above-base dependent-vowel (matra) signs are tagged
`POS_AFTER_POST`.
All right-side dependent-vowel (matra) signs are tagged
`POS_AFTER_POST`.
All below-base dependent-vowel (matra) signs are tagged
`POS_AFTER_POST`.
For simplicity, shaping engines may choose to tag single-part matras
in an earlier text-processing step, using the information in the
_Mark-placement subclass_ column of the character tables. It is
critical at this step, however, that all decomposed matras are also
correctly tagged before proceeding to the next step.
#### Stage 2, step 4: Adjacent marks ####
Fourth, any subsequences of marks that include a "Nukta" and a
"Halant" or Vedic sign must be reordered so that the "Nukta" appears
first.
This means that the subsequence "Halant,Nukta" is reordered to
"Nukta,Halant" and that the subsequence "_Vedic_sign_,Nukta" is
reordered to "Nukta,_Vedic_sign".
For subsequences of affected marks that are longer than two, the
reordering operation must be repeated until the "Nukta" is the first
character in the subsequence. No other marks in the subsequence
should be reordered.
This order is canonical in Unicode and is required so that
"_consonant_,Nukta" substitution rules from GSUB will be correctly
matched later in the shaping process.
#### Stage 2, step 5: Pre-base consonants ####
Fifth, consonants that occur before the syllable base must be tagged
with `POS_PREBASE_CONSONANT`. Excluding initial "Ra,Halant" sequences
that will become "Reph"s:
- If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`.
- Otherwise, tag it as `POS_PREBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
Gurmukhi includes three below-base consonant forms:
- "Halant,Ra" (after the base consonant or syllable base) and "Ra,Halant" (in a
non-syllable-initial position) take on a below-base form.
- "Halant,Ha" (after the base consonant or syllable base) and "Ha,Halant" (in a
non-syllable-initial position) take on a below-base form.
- "Halant,Va" (after the base consonant or syllable base) and "Va,Halant" (in a
non-syllable-initial position) take on a below-base form.
> Note: Because Gurmukhi employs the `BLWF_MODE_PRE_AND_POST` shaping
> characteristic, consonants with below-base special forms may occur
> before or after the syllable base.
>
> During the base-consonant search in 2.1, any instances of the
> "Halant,_consonant_" pattern following the syllable base for these
> below-base forms will be encountered. The tagging in this step
> ensures that the "_consonant_,Halant" pattern preceding the syllable
> base for these below-base forms will also be tagged correctly.
#### Stage 2, step 6: Reph ####
Sixth, initial "Ra,Halant" sequences that will become "Reph"s must be tagged with
`POS_RA_TO_BECOME_REPH`.
> Note: an initial "Ra,Halant" sequence will always become a "Reph"
> unless the "Ra" is the only consonant in the syllable.
> Note: "Reph" substitutions are rare in Gurmukhi text. `` fonts may
> not implement the "Reph" substitution in GSUB at all. Nevertheless,
> shaping engines must test for it in order to provide the
> functionality if it is implemented.
#### Stage 2, step 7: Final consonants ####
Seventh, all final consonants must be tagged. Consonants that occur
after the syllable base _and_ after a dependent vowel (matra) sign
must be tagged with `POS_FINAL_CONSONANT`.
> Note: Final consonants occur only in Sinhala and should not be
> expected in `` text runs. This step is included here to
> maintain compatibility across Indic scripts.
#### Stage 2, step 8: Mark tagging ####
Eighth, all marks must be tagged.
> Note: In this step, joiner and non-joiner characters must also be
> tagged according to the same rules given for marks, even though
> these characters are not categorized as marks in Unicode.
Marks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,
`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should
be tagged with `POS_SMVD`.
All "Nukta"s must be tagged with the same positioning tag as the
preceding consonant, independent vowel, placeholder, or dotted circle.
All remaining marks (not in the `POS_SMVD` category and not "Nukta"s)
must be tagged with the same positioning tag as the closest non-mark
character the mark has affinity with, so that they move together
during the sorting step.
There are two possible cases: those marks before the syllable base
and those marks after the syllable base. In addition, an exception is
made for "Halant" marks that follow a left-side (pre-base) matra.
1. Initially, all remaining marks should be tagged with the same
positioning tag as the closest preceding consonant.
2. For each consonant after the syllable base (such as post-base
consonants, below-base consonants, or final consonants), all
remaining marks located between that current consonant and any
previous consonant should be tagged with the same positioning tag as
the current (later) consonant.
In other words, all consonants preceding the syllable base "own" the
marks that follow them, while all consonants after the syllable base
"own" the marks that come before them. When a syllable does not have
any consonants after the syllable base, the syllable base should
"own" all the marks that follow it.
3. Finally, "Halant" marks that follow a left-side dependent vowel
(matra) should _not_ be tagged with the left-side matra's
positioning tag. Instead, the "Halant" should be tagged with the
positioning tag of the non-mark character preceding the left-side
matra. This prevents the "Halant" mark from being moved with the
left-side matra when the syllable is sorted.
#### Stage 2, step 9: Sort syllable ####
With these steps completed, the syllable can be sorted into the final
sort order as listed at the beginning of stage 2.
The glyphs in the syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
#### Stage 2, step 10: Flag sequences for possible feature applications ####
With the initial reordering complete, those glyphs in the syllable that
may have GSUB or GPOS features applied in stages 3, 5, and 6 should be
flagged for each potential feature.
This flagging is preliminary; the set of potential features varies
between different scripts and which features are supported varies
between fonts. It is also possible that the application of
one feature on a glyph sequence will perform a substitution that makes
a later feature no longer applicable to the updated sequence.
Consequently, the flagging must be completed before shaping proceeds
to the stages during which features are applied.
Some shaping features, such as `locl`, can potentially apply to any
glyphs. Therefore it is not necessary to maintain a separate flag for
these features in the bitmask (or other data structure) used to track
the flags -- although shaping engines may do so if desired.
The sequences to flag are summarized in the list below; a full
description of each feature's function and interpretation is provided
in GSUB and GPOS application stages that follow.
- `nukt` should match "_Consonant_,Nukta" sequences
- `akhn` should match "Ka,Halant,Ssa" and "Ja,Halant,Nya"
- `rphf` should match initial "Ra,Halant" sequences but _not_ match
initial "Ra,Halant,ZWJ" sequences
- `blwf` should match "Halant,Ra", "Halant,Ha", and "Halant,Va" in
post-base positions and "Ra,Halant", "Ha,Halant", and
"Va,Halant" in non-initial pre-base positions
- `half` should match "_Consonant_,Halant" in pre-base position but
_not_ match "Ra,Halant" sequences flagged for `rphf` and
_not_ match "_Consonant_,Halant,ZWNJ,_Consonant_" sequences
- `pstf` should match initial "Halant,Ya" in post-base position
- `vatu` should match "_Consonant_,Halant,Ra",
"_Consonant_,Halant,Ha", and "_Consonant_,Halant,Va"
- `cjct` should match "_Consonant_,Halant,_Consonant_" but _not_
match "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_"
### Stage 3: Applying the basic substitution features from GSUB ###
The basic-substitution stage applies mandatory substitution features
using the rules in the font's GSUB table. In preparation for this
stage, glyph sequences should be flagged for possible application
of GSUB features in stage 2, step 10.
The order in which these substitutions must be performed is fixed for
all Indic scripts:
locl
nukt
akhn
rphf
rkrf (not used in Gurmukhi)
pref (not used in Gurmukhi)
blwf
abvf (not used in Gurmukhi)
half
pstf
vatu
cjct
cfar (not used in Gurmukhi)
#### Stage 3, step 1: locl ####
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
#### Stage 3, step 2: nukt ####
The `nukt` feature replaces "_Consonant_,Nukta" sequences with a
precomposed nukta-variant of the consonant glyph.
- The context defined for a `nukt` feature is:
:::{table} `nukt` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:------------------------------|:--------------|
| _none_ | `_consonant_`(full),`_nukta_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-nukt}
Nukta composition
:::
```{svg-color-toggle-button} gurmukhi-nukt
```
#### Stage 3, step 3: akhn ####
The `akhn` feature replaces two specific sequences with required ligatures.
- "Ka,Halant,Ssa" is substituted with the "KSsa" ligature.
- "Ja,Halant,Nya" is substituted with the "JNya" ligature.
These sequences can occur anywhere in a syllable. The "KSsa" and
"JNya" characters have orthographic status equivalent to full
consonants in some languages, and fonts may have `cjct` substitution
rules designed to match them in subsequences. Therefore, this
feature must be applied before all other many-to-one substitutions.
- The context defined for an `akhn` feature is:
:::{table} `akhn` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:----------------------------|:--------------|
| _none_ | `AKHAND_CONSONANT_SEQUENCE` | _none_ |
:::
> Note: Akhand ligatures are rare in Gurmukhi text. Nevertheless,
> shaping engines must test for the feature in order to provide the
> functionality if it is implemented.
#### Stage 3, step 4: rphf ####
The `rphf` feature replaces initial "Ra,Halant" sequences with the
"Reph" glyph.
- An initial "Ra,Halant,ZWJ" sequence, however, must not be flagged for
the `rphf` substitution.
- The context defined for a `rphf` feature is:
:::{table} `rphf` feature context
| Backtrack | Matching sequence | Lookahead |
|:-----------------|:------------------------|:--------------|
| `SYLLABLE_START` | "Ra"(full),`_halant_` | _none_ |
:::
> Note: "Reph" usage is rare in Gurmukhi text. Nevertheless,
> shaping engines must test for the feature in order to provide the
> functionality if it is implemented.
#### Stage 3, step 5: rkrf ####
> This feature is not used in Gurmukhi.
#### Stage 3, step 6: pref ####
> This feature is not used in Gurmukhi.
#### Stage 3, step 7: blwf ####
The `blwf` feature replaces below-base-consonant glyphs with any
special forms. Gurmukhi includes three below-base consonant
forms:
- "Halant,Ra" (after the base consonant or syllable base) and "Ra,Halant" (in a
non-syllable-initial position) take on a below-base form.
- "Halant,Ha" (after the base consonant or syllable base) and "Ha,Halant" (in a
non-syllable-initial position) take on a below-base form.
- "Halant,Va" (after the base consonant or syllable base) and "Va,Halant" (in a
non-syllable-initial position) take on a below-base form.
Because Gurmukhi incorporates the `BLWF_MODE_PRE_AND_POST` shaping
characteristic, any pre-base consonants and any post-base consonants
may potentially match a `blwf` substitution; therefore, both cases must
be flagged for comparison. Note that this is not necessarily the case in other
Indic scripts that use a different `BLWF_MODE_` shaping
characteristic.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blwf-ra}
Below-base Ra composition
:::
```{svg-color-toggle-button} gurmukhi-blwf-ra
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blwf-va}
Below-base Va composition
:::
```{svg-color-toggle-button} gurmukhi-blwf-va
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blwf-ha}
Below-base Ha composition
:::
```{svg-color-toggle-button} gurmukhi-blwf-ha
```
#### Stage 3, step 8: abvf ####
> This feature is not used in Gurmukhi.
#### Stage 3, step 9: half ####
The `half` feature replaces "_Consonant_,Halant" sequences before the
base consonant or syllable base with "half forms" of the consonant
glyphs.
In the most common case, this substitution applies to
"_Consonant_,Halant" sequences that are followed by another
"_Consonant_".
In addition, a sequence matching "_Consonant_,Halant,ZWJ" must also be
flagged for potential `half` substitutions.
> Note: The presence of the "ZWJ" at the end of the sequence means
> that the sequence may match the regular-expression test in stage 1
> as the end of a syllable, even without being followed by a base
> consonant or syllable base.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
There are three exceptions to the default behavior, for which
the shaping engine must test:
- Initial "Ra,Halant" sequences, which should have been flagged for
the `rphf` feature earlier, must not be flagged for potential
`half` substitutions.
- Non-initial "Ra,Halant", "Ha,Halant", and "Va,Halant" sequences,
which should have been flagged for the `rkrf` or `blwf` features
earlier, must not be flagged for potential `half` substitutions.
- A sequence matching "_Consonant_,Halant,ZWNJ,_Consonant_" must not be
flagged for potential `half` substitutions.
> Note: Half forms are rare in Gurmukhi text. Fonts supporting
> `` may implement the `half` feature using explicit "Halant"
> glyphs, as illustrated here.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-half}
Half-form composition
:::
```{svg-color-toggle-button} gurmukhi-half
```
#### Stage 3, step 10: pstf ####
The `pstf` feature replaces post-base-consonant glyphs with any special forms.
Gurmukhi includes one post-base form:
- "Halant,Ya" takes on a post-base form.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-pstf-1}
Post-base Ya composition
:::
```{svg-color-toggle-button} gurmukhi-pstf-1
```
#### Stage 3, step 11: vatu ####
The `vatu` feature replaces certain sequences with "Vattu variant"
forms.
"Vattu variants" are formed from glyphs followed by the below-base
form of "Ra", "Ha", or "Va"; therefore, this feature must be applied after
the `blwf` feature.
> Note: vattu variants are rare in Gurmukhi text. Nevertheless,
> shaping engines must test for the feature in order to provide the
> functionality if it is implemented.
#### Stage 3, step 12: cjct ####
The `cjct` feature replaces sequences of adjacent consonants with
conjunct ligatures. These sequences must match "_Consonant_,Halant,_Consonant_".
A sequence matching "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_" must not be flagged to form a conjunct.
> Note: The presence of the "ZWJ" in a
> "_Consonant_,Halant,ZWJ,_Consonant_" sequence should automatically
> inhibit any `cjct` feature rules from matching the sequence as valid
> input, and thus prevent the `cjct` substitution from being applied.
> Note: The presence of the "ZWNJ" in a
> "_Consonant_,Halant,ZWNJ,_Consonant_" sequence means that the
> "_Consonant_,Halant,ZWNJ" subsequence will match the
> regular-expression test in stage 1 as the end of a syllable.
>
> Because OpenType shaping features in `` are defined as
> applying only within an individual syllable, this means that the
> presence of the "ZWNJ" will automatically prevent the application of
> a `cjct` feature by triggering the identification of a syllable
> break between the two consonants.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWNJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
>
> Note, also: The presence of the "ZWJ" means that a
> "_Consonant_,Halant,ZWJ" sequence may match the regular-expression
> test in stage 1 as the end of a syllable, even without being
> followed by a base consonant or syllable base. By definition,
> however, a "_Consonant_,Halant,ZWJ" syllable identified in stage 1
> cannot also include a "_Consonant_" after the ZWJ.
The font's GSUB rules might be implemented so that `cjct`
substitutions apply to half-form consonants; therefore, this feature
must be applied after the `half` feature.
> Note: Conjunct forms are rare in Gurmukhi text. Nevertheless,
> shaping engines must test for the feature in order to provide the
> functionality if it is implemented.
#### Stage 3, step 13: cfar ####
> This feature is not used in Gurmukhi.
### Stage 4: Final reordering ###
The final reordering stage repositions marks, dependent-vowel (matra)
signs, and "Reph" glyphs to the appropriate location with respect to
the base consonant or syllable base. Because multiple substitutions
may have occurred during the application of the basic-shaping features
in the preceding stage, these repositioning moves could not be
performed during the initial reordering stage.
Like the initial reordering stage, the steps involved in this stage
occur on a per-syllable basis.
#### Stage 4, step 1: Base consonant ####
The final reordering stage, like the initial reordering stage, begins
with determining the syllable base of each syllable, following the
same algorithm used in stage 2, step 1.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base. In a standalone sequence or
other syllable that begins with a placeholder or a dotted circle, the
placeholder or dotted circle will always serve as the syllable base.
In a syllable that begins with a consonant, the shaping engine must
repeat the base-consonant search algorithm used in stage 2, step 1.
The codepoint of the underlying base consonant or syllable base will
not change between the search performed in stage 2, step 1, and the
search repeated here. However, the application of GSUB shaping
features in stage 3 means that several ligation and many-to-one
substitutions may have taken place. The final glyph produced by that
process may, therefore, be a conjunct or ligature form — in most
cases, such a glyph will not have an assigned Unicode codepoint.
#### Stage 4, step 2: Pre-base matras ####
Pre-base dependent vowels (matras) that were reordered during the
initial reordering stage must be moved to their final position. This
position is defined as:
- after the last standalone "Halant" glyph that comes after the
matra's starting position and also comes before the main
consonant.
- If a zero-width joiner follows this last standalone "Halant", the
final matra position is moved to after the joiner.
This means that the matra will move to the right of all explicit
"consonant,Halant" subsequences, but will stop to the left of the base
consonant or syllable base, all conjuncts or ligatures that contain
the base consonant or syllable base, and all half forms.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-matra-position}
Pre-base matra positioning
:::
```{svg-color-toggle-button} gurmukhi-matra-position
```
> Note: OpenType and Unicode both state that if the syllable includes
> a ZWJ immediately after the last "Halant", then the final matra
> position should be after the ZWJ.
>
> However, there are several test sequences indicating that
> Microsoft's Uniscribe shaping engine did not follow this rule (in,
> at least, Devanagari and Bengali text), and in these circumstances
> Uniscribe instead makes the final matra position before the final
> "Consonant,Halant,ZWJ".
>
> Subsequently, the HarfBuzz shaping engine has also followed the same
> pattern. If other shaping engine implementations prefer to maintain
> maximum compatibility with Uniscribe and HarfBuzz, then they should
> also follow suit.
> Note: The Microsoft script-development specifications for OpenType
> shaping also state that if a zero-width non-joiner follows the last
> standalone "Halant", the final matra position is moved to after the
> non-joiner. However, it is unnecessary to test for this condition,
> because a "Halant,ZWNJ" subsequence is, by definition, the end of a
> syllable. Consequently, a "Halant,ZWNJ" cannot be followed by a
> pre-base dependent vowel.
#### Stage 4, step 3: Reph ####
"Reph" must be moved from the beginning of the syllable to its final
position. Because Gurmukhi incorporates the `REPH_POS_BEFORE_SUBJOINED`
shaping characteristic, this final position is defined to be
immediately after the syllable base and before any subjoined
(below-base consonant or below-base dependent vowel) forms.
The algorithm for finding the final "Reph" position is
- Starting at the first post-"Reph" consonant, search forward looking
for the first explicit "Halant", ending the search when the base
consonant is encountered. If such an explicit "Halant" is found,
move the "Reph" to the position immediately after this
"Halant".
* If a zero-width joiner (ZWJ) or a zero-width non-joiner (ZWNJ)
follows this "Halant", move the "Reph" to the position
immediately after the ZWJ or ZWNJ. This will be the final
"Reph" position.
* If no ZWJ or ZWNJ follows this "Halant", leave the "Reph" in
its position immediately after the "Halant". This will be the
final "Reph" position.
- If no such explicit "Halant" is found in the previous step, find
the first post-base consonant that has not formed a ligature with
the base consonant. If such a non-ligated post-base consonant is
found, move the "Reph" to the position immediately before the
non-ligated post-base consonant. This will be the final "Reph"
position.
- If no such non-ligated post-base consonant is found in the
previous step, move the "Reph" to the position immediately before
the first post-base matra, syllable modifier, or Vedic sign that
has a positioning tag after the script's "Reph" position in the
syllable sort order (as listed in [stage
2](#stage-2-initial-reordering)). This will be the final "Reph"
position.
> Note: Because Gurmukhi incorporates the
> `REPH_POS_BEFORE_SUBJOINED` shaping characteristic, this means
> any positioning tag of `POS_BELOWBASE_CONSONANT` or later,
> although a post-base matra, syllable modifier, or Vedic sign
> would not typically be tagged with `POS_BELOWBASE_CONSONANT`.
- If no other location has been located in the previous steps, move
the "Reph" to the end of the syllable.
Finally, if the final position of "Reph" occurs after a
"_matra_,Halant" subsequence, then "Reph" must be repositioned to the
left of "Halant", to allow for potential matching with `abvs` or
`psts` substitutions from GSUB.
#### Stage 4, step 4: Pre-base-reordering consonants ####
Any pre-base-reordering consonants must be moved to immediately before
the base consonant or syllable base.
Gurmukhi does not use pre-base-reordering consonants, so this step will
involve no work when processing `` text. It is included here in order
to maintain compatibility with the other Indic scripts.
#### Stage 4, step 5: Initial matras ####
Any left-side dependent vowels (matras) that are at the start of a
word must be flagged for potential substitution by the `init` feature
of GSUB.
Gurmukhi does not use the `init` feature, so this step will
involve no work when processing `` text. It is included here in
order to maintain compatibility with the other Indic scripts.
### Stage 5: Applying all remaining substitution features from GSUB ###
In this stage, the remaining substitution features from the GSUB table
are applied. In preparation for this stage, glyph sequences should be
flagged for possible application of GSUB features in stage 2,
step 10.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
init
pres
abvs
blws
psts
haln
The `init` feature replaces word-initial glyphs with special
presentation forms. Generally, these forms involve removing the
headline in-stroke from the left side of the glyph.
The `pres` feature replaces pre-base-consonant glyphs with special
presentations forms. This can include consonant conjuncts, half-form
consonants, and stylistic variants of left-side dependent vowels
(matras).
The `abvs` feature replaces above-base-consonant glyphs with special
presentation forms. This usually includes contextual variants of
above-base marks or contextually appropriate mark-and-base ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-abvs}
Above-base substitutions
:::
```{svg-color-toggle-button} gurmukhi-abvs
```
The `blws` feature replaces below-base-consonant glyphs with special
presentation forms. This usually includes replacing base consonant or syllable bases that
are followed by below-base-consonant forms (like those of "Ra", "Ha",
"Va", or "Yakash") with contextual ligatures.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blws}
Below-base substitutions
:::
```{svg-color-toggle-button} gurmukhi-blws
```
The `psts` feature replaces post-base-consonant glyphs with special
presentation forms. This usually includes replacing right-side
dependent vowels (matras) with stylistic variants or replacing
post-base-consonant/matra pairs with contextual ligatures.
The `haln` feature replaces word-final "_Consonant_,Halant" pairs with
special presentation forms. This can include stylistic variants of the
consonant where placing the "Halant" mark on its own is
typographically problematic.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-haln}
Halant form substitutions
:::
```{svg-color-toggle-button} gurmukhi-haln
```
> Note: The `calt` feature, which allows for generalized application
> of contextual alternate substitutions, is usually applied at this
> point. However, `calt` is not mandatory for correct Gurmukhi shaping
> and may be disabled in the application by user preference.
### Stage 6: Applying remaining positioning features from GPOS ###
In this stage, mark positioning, kerning, and other GPOS features are
applied.
As with the preceding stage, the order in which these features are
applied is not canonical; they should be applied in the order in which
they appear in the GPOS table in the font.
dist
abvm
blwm
> Note: The `kern` feature is usually applied at this stage, if it is
> present in the font. However, `kern` (like `calt`, above) is not
> mandatory for shaping Gurmukhi text and may be disabled by user preference.
The `dist` feature adjusts the horizontal positioning of
glyphs. Unlike `kern`, adjustments made with `dist` do not require the
application or the user to enable any software _kerning_ features, if
such features are optional.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-dist}
Application of the dist feature
:::
```{svg-color-toggle-button} gurmukhi-dist
```
The `abvm` feature positions above-base marks for attachment to base
characters. In Gurmukhi, this includes "Reph" in addition to the
diacritical marks and Vedic signs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-abvm}
Above-base mark positioning
:::
```{svg-color-toggle-button} gurmukhi-abvm
```
The `blwm` feature positions below-base marks for attachment to base
characters. In Gurmukhi, this includes below-base dependent vowels
(matras) as well as the below-base consonant forms of "Ra", "Ha", and
"Va".
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #gurmukhi-blwm}
Below-base mark positioning
:::
```{svg-color-toggle-button} gurmukhi-blwm
```
## The `` shaping model ##
The older Gurmukhi script tag, ``, has been deprecated. However,
shaping engines may still encounter fonts that were built to work with
`` and some users may still have documents that were written to
take advantage of `` shaping.
### Distinctions from `` ###
The most significant distinction between the shaping models is that the
sequence of "Halant" and consonant glyphs used to trigger shaping
features was altered when migrating from `` to
``.
Specifically, shaping engines were expected to reorder post-base
"Halant,_Consonant_" sequences to "_Consonant_,Halant".
As a result, a font's GSUB substitutions would be written to match
"_Consonant_,Halant" sequences in all pre-base and post-base positions.
The `` syllable
Pre-baseC Halant BaseC Halant Post-baseC
would be reordered to
Pre-baseC Halant BaseC Post-baseC Halant
before features are applied.
In `` text, as described above in this document, there is no
such reordering. The correct sequence to match for GSUB substitutions is
"_Consonant_,Halant" for pre-base consonants, but "Halant,_Consonant_"
for post-base consonants.
The old Indic shaping model also did not recognize the
`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, ``
was treated as if it followed the `BLWF_MODE_POST_ONLY`
characteristic. In other words, below-base form substitutions were
only applied to consonants after the base consonant or syllable base.
In addition, for some scripts, left-side dependent vowel marks
(matras) were not repositioned during the final reordering
stage. For `` text, the left-side matra was always positioned
at the beginning of the syllable.
### Advice for handling fonts with `` features only ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences in order to apply GSUB substitutions when it is known that
the font in use supports only the `` shaping model.
### Advice for handling text runs composed in `` format ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences for GSUB substitutions or to reorder them to
"Halant,_Consonant_" when processing text runs that are tagged with
the `` script tag and it is known that the font in use supports
only the `` shaping model.
Shaping engines may also choose to apply `blwf` substitutions to
below-base consonants occurring before the base consonant or syllable base when it is
known that the font in use supports an applicable substitution lookup.
Shaping engines may also choose to position left-side matras according
to the `` ordering scheme; however, doing so might interfere
with matching GSUB or GPOS features.
================================================
FILE: opentype-shaping-hangul.md
================================================
```{include} /_global.md
```
# Hangul script shaping in OpenType #
This document details the general shaping procedure shared by all
Hangul script styles, and defines the common pieces that style-specific
implementations share.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Jamo type](#jamo-type)
- [Composing behavior](#composing-behavior)
- [Character tables](#character-tables)
- [The `` shaping model](#the-hang-shaping-model)
- [Stage 1: Identifying syllables](#stage-1-identifying-syllables)
- [Stage 2: Determining if the syllable can be composed into a Hangul Syllables codepoint](#stage-2-determining-if-the-syllable-can-be-composed-into-a-hangul-syllables-codepoint)
- [Stage 3: Composing the syllable (if composition is possible)](#stage-3-composing-the-syllable-if-composition-is-possible)
- [Stage 4: Fully decomposing the syllable (if composition is not possible)](#stage-4-fully-decomposing-the-syllable-if-composition-is-not-possible)
- [Stage 5: Shaping the fully decomposed syllable with GSUB features](#stage-5-shaping-the-fully-decomposed-syllable-with-gsub-features)
- [Stage 6: Reordering tone marks](#stage-6-reordering-tone-marks)
## General information ##
The Hangul script is used to write Korean. It may also be referred to
as the Choseongul script or Jeongum script, and is in use in both
North Korea and South Korea as well as regions within China. It may
also be used to write the Cia-Cia language in Indonesia.
Hangul syllables are formed from individual alphabetic letters that
are arranged into square cells using pre-defined patterns. The
syllables themselves are monospaced in a run of text, using interword
spacing and punctuation.
Korean text may, in practice, incorporate Chinese characters ("Hanja")
in addition to Hangul. Hanja characters are not affected by the
shaping model for Hangul.
Modern Korean text is typically written (and, therefore, rendered)
left to right. Classical and older texts, however, may be written
vertically, top to bottom.
## Terminology ##
OpenType shaping uses a standard set of terms for elements of the
Hangul script. The terms used colloquially in any particular language
or country may vary, however, potentially causing confusion.
**Jamo** characters are the fundamental letters from which syllable
blocks are constructed. There are three classes of jamo:
- **L**eading consonants (choseong)
- **V**owels (jungseong)
- **T**railing consonants (jongseong)
Most, but not all, of the basic consonant letters can appear either in
leading or in trailing form. Nevertheless, the leading and trailing forms are
assigned distinct codepoints in Unicode. In addition, the set of valid
trailing consonants includes several compound-consonant pairs that can
never occur in leading form.
Old Korean featured a considerable number of additional jamo, which
are also defined in Unicode. Many of these Old Korean jamo are
compound forms that concatenate two or three basic jamo.
A **syllable** is formed by arranging a sequence of jamo into its
appropriate square-cell form. The horizontal and vertical positioning
of each jamo in the cell depends on the content of the syllable. The
exact shape and proportions of each jamo will also vary with its final
position in the cell.
Valid syllables must be either of the form "**`L`**,**`V`**" or of the form
"**`L`**,**`V`**,**`T`**". That is, each syllable must begin with one leading
consonant, must include one vowel in the second position, and may or may
not end with one trailing consonant.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hangul-lv-syllable}
LV syllable
:::
```{svg-color-toggle-button} hangul-lv-syllable
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hangul-lvt-syllable}
LVT syllable
:::
```{svg-color-toggle-button} hangul-lvt-syllable
```
All possible syllables for Modern Korean are defined in the Hangul
Syllables block of Unicode. A sequence of individual jamo codepoints
that corresponds to a valid Modern Korean syllable can therefore be
**composed** into a syllable codepoint.
Sequences of codepoints that involve Old Korean jamo cannot be
composed into syllable codepoints and are handled separately by the
shaping engine.
Two tone marks are common in Old Korean, the **single-dot bangjeom**
and the **double-dot bangjeom**. Both bangjeom marks are rendered to
the left of the syllable to which they are applied.
## Glyph classification ##
Proper shaping of Hangul text runs involves determining when
sequences of jamo can be composed into syllable codepoints that are included in
the active font — in which case they should be replaced by the corresponding
syllable glyph — and when they cannot.
Those jamo sequences that cannot be composed into a syllable codepoint
(or that compose into a syllable codepoint that is missing in the
active font) are then rendered by shaping and positioning each
individual jamo using GSUB substitution rules.
### Jamo type ###
Each Hangul jamo is assigned a `JAMO_TYPE` property that indicates whether
it is a leading consonant (`L`), a vowel (`V`), or a trailing
consonant (`T`).
Most, but not all, of the basic consonant letters can appear either in
leading or in trailing form. Nevertheless, the leading and trailing
forms are assigned distinct codepoints in Unicode. In addition, the
set of valid trailing consonants includes several compound consonant
pairs that can never occur in leading form.
For example, the basic consonant "Kiyeok" (ᄀ) is encoded as `U+1100`
in its leading (choseong) form but as `U+11A8` in its trailing
(jongseong) form. The tense or emphatic form of the consonant,
"Ssangkiyeok" (ᄁ), is encoded in its leading (choseong) form as
`U+1101` but in its trailing (jongseong) form as `U+11A9`, and is
rendered visually as a doubled version of the basic consonant.
In addition, two compound trailing consonants, "Kiyeok-sios" (ᆪ
`U+11AA`) and "Rieul-kiyeok" (ᆰ `U+11B0`), also incorporate the
Kiyeok basic consonant. But Kiyeok-sios and Rieul-kiyeok are never
used as leading consonants, therefore they are not encoded in leading
(choseong) forms.
> Note: compound consonant jamo are not written as sequences of basic
> jamo. That is, "Kiyeok,Kiyeok" (ᄀᄀ) is not equivalent
> to "Ssangkiyeok" (ᄁ).
The Hangul Jamo block also includes two "filler" codepoints. "Choseong
Filler" (`U+115F`) can take the place of a missing choseong (`L`
consonant), and "Jungseong Filler" (`U+1160`) can take the place of a
missing jungseong (`V` vowel). For shaping purposes, the fillers are
classified as type `Lf` and type `Vf`, respectively.
### Composing behavior ###
Modern Korean features 19 leading consonants (`L` forms), 21 vowels
(`V` forms), and 27 trailing consonants (`T` forms).
Old Korean featured a considerable number of additional jamo, which
are also defined in Unicode. Some of these Old Korean jamo are
distinct basic letters that are no longer used in Modern Korean. Many
others are compound forms that concatenate two or even three basic jamo.
The Hangul Syllables block in Unicode only includes those syllables
that contain solely Modern jamo. Consequently, each jamo is assigned a
`COMPOSING_BEHAVIOR` property to indicate whether it can be composed
into a Hangul Syllable codepoint.
An "`L`,`V`,`T`" sequence with the `COMPOSING_BEHAVIOR`s
"`YES`,`YES`,`YES`" or an "`L`,`V`" sequence with the
`COMPOSING_BEHAVIOR`s "`YES`,`YES`" will compose to a codepoint in the Hangul
Syllables block. A sequence containing any `NO`s will not compose to a
codepoint in the Hangul Syllables block.
> Note: the jamo filler codepoints are both designated with the
> `COMPOSING_BEHAVIOR` of `NO`.
### Character tables ###
Separate character tables are provided for the Hangul Jamo, Hangul
Jamo Extended-A, and Hangul Jamo Extended-B blocks, as well as for other miscellaneous
characters that are used in `` text runs:
- [Hangul Jamo character table](character-tables/character-tables-hangul.md#hangul-jamo-character-table)
- [Hangul Jamo Extended-A character table](character-tables/character-tables-hangul.md#hangul-jamo-extended-a-character-table)
- [Hangul Jamo Extended-B character table](character-tables/character-tables-hangul.md#hangul-jamo-extended-b-character-table)
- [Miscellaneous character table](character-tables/character-tables-hangul.md#miscellaneous-character-table)
The Hangul Jamo block contains all of the Modern Korean jamo, the two
jamo fillers, and the most common Old Korean jamo.
The Hangul Jamo Extended-A block contains additional `L` (choseong)
jamo for Old Korean. The Hangul Jamo Extended-B block contains
additional `V` (jungseong) and `T` (jongseong) jamo for Old Korean.
The Hangul Syllables block contains all of the valid permutations of the
Modern Korean jamo. Each syllable codepoint can be classified by
syllable type, either `LV` or `LVT`. These types are synonymous with
the "Hangul Syllable Type" property in Unicode. Due to the size of the
Hangul Syllables block, a full character table is not
provided. However, a
[summary](character-tables/character-tables-hangul.md#hangul-syllables-character-table)
is included to show the ranges of `LV` and `LVT` syllables.
Unicode also defines a Hangul Compatibility Jamo block that implements
backward compatibility with a retired file-encoding format. Unless a
software application is required to support specific stores of
documents that are known to have used the older encoding, however, the
shaping engine should not be expected to handle any text runs
incorporating codepoints from this block.
The tables list each codepoint along with its Unicode general
category, its jamo type, and its composing behavior. The codepoint's
Unicode name and an example glyph are also provided.
For example:
:::{table} Example character table
| Codepoint | Unicode category | Jamo type | Composing | Glyph |
|:----------|:-----------------|:----------|:----------|:---------------------------------|
|`U+1109` | Letter | L | YES | ᄉ Sios |
| | | | | |
|`U+1182` | Letter | V | NO | ᆂ O-O |
:::
Codepoints with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
In addition to general punctuation, runs of Hangul text may use
punctuation marks from the CJK Symbols And Punctuation block.
Of particular note are the single-dot tone mark (single-dot bangjeom)
and double-dot tone mark (double-dot bangjeom), `U+302E` and
`U+302F`. These non-spacing marks are common in Old Korean.
Other important characters that may be encountered when shaping runs
of Hangul text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`), and zero-width non-joiner (`U+200C`).
The dotted-circle placeholder is frequently used when displaying a
mark in isolation. Real-world text may also use other characters, such
as hyphens or dashes, in a similar placeholder fashion; shaping
engines should cope with this situation gracefully.
The zero-width space (`U+200B`) or word joiner (`U+2060`) may be used
between two jamo to prevent them from being conjoined into a
syllable. The zero-width space allows a line break to happen between
the jamo, while the word joiner prevents the jamo from being separated
by a line break.
## The `` shaping model ##
Processing a run of `` text involves six top-level stages:
1. Identifying syllables
2. Determining if the syllable can be composed into a Hangul Syllables codepoint
3. Composing the syllable (if composition is possible)
4. Fully decomposing the syllable (if composition is not possible)
5. Shaping the fully decomposed syllable with GSUB features
6. Reordering tone marks
### Stage 1: Identifying syllables ###
The precomposed syllable codepoints in the Hangul Syllable block come in
two forms: `LV` syllables (which represent an `L` jamo and a `V` jamo)
and `LVT` syllables (which represent an `L` jamo, a `V` jamo, and a `T` jamo).
A syllable consisting of a string of jamo must match either the
sequence "`L`,`V`" or the sequence "`L`,`V`,`T`".
The `L`, `V`, and `T` components must be a single jamo each. In Modern
Korean, all of the jamo must have a `COMPOSING_BEHAVIOR` of `YES`. In
Old Korean, `YES` and `NO` are both acceptable for
`COMPOSING_BEHAVIOR`.
However, real-world input can also include syllables entered as a
precomposed `LV` Hangul Syllable codepoint followed by a standalone
`T` jamo.
Syllables in a text run can therefore be identified with the following
regular expression:
```
Slvt | +> + [T]
```
L L jamo
V V jamo
T T jamo
Lf L jamo filler
Vf V jamo filler
Slv Precomposed "LV" syllable
Slvt Precomposed "LVT" syllable
[ ] optional occurence of the enclosed expression
<|> one of the options separated by the vertical bar
The expression matches five possible syllable types:
- `Slvt`
- `Slv`
- `Slv`,`T`
- `L`,`V`
- `L`,`V`,`T`
Sequences of jamo that do not match the above expression should be
treated as runs of standalone jamo letters.
After the syllables have been identified, each of the subsequent
shaping stages occurs on a per-syllable basis.
### Stage 2: Determining if the syllable can be composed into a Hangul Syllables codepoint ###
#### Stage 2, step 1: Fully precomposed syllables ####
A precomposed `Slvt` or `Slv` syllable requires no shaping if the active
font includes a glyph for the corresponding Hangul Syllables
codepoint. If the glyph is present, the shaping engine can render it
and proceed directly to stage six without further work. If the glyph
is not present, the shaping engine must proceed to stage four.
The other syllable types involve jamo, and each syllable must be
examined to determine if it composes into a codepoint in the Hangul
Syllables block.
#### Stage 2, step 2: Partially precomposed syllables ####
For "`Slv`,`T`" syllables, the `Slv` codepoint must first be
decomposed into its constituent jamo. Then, the resulting
"`L`,`V`,`T`" syllable must be examined in the [next
step](#stage-2-step-3-fully-jamo-syllables).
The decomposition of the `Slv` syllable is canonical, and uses the
algorithm defined in [stage four](#stage-4-fully-decomposing-the-syllable-if-composition-is-not-possible).
#### Stage 2, step 3: Fully jamo syllables ####
For "`L`,`V`" and "`L`,`V`,`T`" syllables, the `COMPOSING_BEHAVIOR` of
each jamo must be examined.
If all jamo in the syllable have `COMPOSING_BEHAVIOR` of `YES`, then
the shaping engine should proceed to stage three and attempt to
compose the jamo into the corresponding Hangul Syllables codepoint.
If any of the jamo in the syllable have `COMPOSING_BEHAVIOR` of `NO`,
then the shaping engine should proceed to stage five and shape the
syllable using GSUB features.
### Stage 3: Composing the syllable (if composition is possible) ###
Unicode defines a canonical algorithm for composing jamo into Hangul
Syllables codepoints. The algorithm leverages the strict jamo-ordering
of the syllables in the block to provide an algebraic method to
determine the codepoint of a syllable using the codepoints of its
constituent `L`, `V`, and (if needed) `T` jamo as input.
The algorithm defines the following consonants:
```
SBase = AC00
LBase = 1100
VBase = 1161
TBase = 11A7
LCount = 19
VCount = 21
TCount = 28
NCount = (VCount * TCount) = 588
SCount = (LCount * NCount) = 11172
```
For a jamo sequence "`L`,`V`", where both `L` and `V` are of
`COMPOSING_BEHAVIOR` `YES`, the composed syllable codepoint is found
by computing:
```
LIndex = L - LBase
VIndex = V - VBase
LVIndex = LIndex * NCount + VIndex * TCount
Slv = SBase + LVIndex
```
Similarly, for a jamo sequence "`L`,`V`,`T`", where `L`, `V`, and `T`
are all of `COMPOSING_BEHAVIOR` `YES`, the composed syllable codepoint
is found by computing:
```
LIndex = L - LBase
VIndex = V - VBase
TIndex = T - TBase
LVIndex = LIndex * NCount + VIndex * TCount
Slvt = SBase + LVIndex + TIndex
```
After the syllable codepoint has been found, the shaping engine must
verify that the codepoint's glyph exists in the active font. If the
glyph is present, the shaping engine must substitute the input jamo
sequence with the glyph. The shaping engine can then proceed to stage
six.
If the needed codepoint is missing, the shaping engine should perform
no substitution and must proceed to stage five with the original `L`,
`V`, and (if used) `T` jamo.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hangul-compose}
Syllable composition
:::
```{svg-color-toggle-button} hangul-compose
```
### Stage 4: Fully decomposing the syllable (if composition is not possible) ###
An "`Slv`,`T`" syllable that does not compose into a Hangul Syllables
codepoint or that composes into a Hangul Syllables codepoint which is
missing in the active font must be fully decomposed into jamo.
Similarly, a precomposed `Slvt` or `Slv` syllable requires no shaping
if the active font includes a glyph for the corresponding Hangul
Syllables codepoint. If the corresponding codepoint is missing in the
active font, however, the syllable must be fully decomposed into jamo.
Unicode defines a canonical algorithm for decomposing Hangul Syllables
codepoints into constituent jamo. The algorithm leverages the strict
jamo-ordering of the syllables in the block to provide an algebraic method to
determine the codepoints of a syllable's `L`, `V`, and (if needed) `T`
jamo from the syllable's codepoint.
The algorithm defines the following consonants:
```
SBase = AC00
LBase = 1100
VBase = 1161
TBase = 11A7
LCount = 19
VCount = 21
TCount = 28
NCount = (VCount * TCount) = 588
SCount = (LCount * NCount) = 11172
```
For a syllable codepoint S, the codepoints of the constituent `L`,
`V`, and `T` jamo are found by computing:
```
SIndex = S - SBase
LIndex = SIndex div NCount
VIndex = (SIndex mod NCount) div TCount
TIndex = SIndex mod TCount
L = LBase + LIndex
V = VBase + VIndex
T = TBase + TIndex if TIndex > 0
```
If `TIndex` = 0, then the syllable has no `T` jamo in the
trailing-consonant (jongseong) position.
With the syllable decomposed, the shaping engine can proceed to stage
five with the `L`, `V`, and (if used) `T` jamo.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hangul-decompose}
Syllable decomposition
:::
```{svg-color-toggle-button} hangul-decompose
```
### Stage 5: Shaping the fully decomposed syllable with GSUB features ###
With the syllable fully decomposed into a sequence of jamo, the next
stage applies mandatory substitution features using rules in the
font's GSUB table.
#### Stage 5, step 1: `ccmp` ####
The `ccmp` feature allows a font to substitute basic-jamo sequences
with a pre-composed glyph including compound jamo.
If present, these composition and decomposition substitutions must be
performed before applying any other GSUB lookups, because
those lookups may be written to match only the `ccmp`-substituted
glyphs.
#### Stage 5, step 2: `ljmo` ####
This feature replaces the default (i.e., standalone) forms of leading
consonant (choseong) glyphs in a syllable cell with alternate forms
that fit into syllable-appropriate positions.
The appropriate shape of the choseong glyph depends on the shape of
the vowel (jungseong) that follows. For example, a tall jungseong forces
the usage of a tall choseong form.
In addition, if the syllable ends in a trailing consonant (jongseong),
then shorter forms of both the leading consonant (choseong) and vowel
(jungseong) glyphs will be used in order to provide sufficient
vertical space.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hangul-ljmo}
L Jamo feature application
:::
```{svg-color-toggle-button} hangul-ljmo
```
#### Stage 5, step 3: `vjmo` ####
This feature replaces the default (i.e., standalone) forms of vowel
(jungseong) glyphs in a syllable cell with alternate forms that fit into
syllable-appropriate positions.
The appropriate shape of the jungseong glyph depends on the presence
or absence of a trailing consonant (jongseong) at the end of the syllable.
If the syllable ends in a trailing consonant (jongseong), then shorter
forms of both the leading consonant (choseong) and vowel (jungseong)
glyphs will be used in order to provide sufficient vertical space.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hangul-vjmo}
V Jamo feature application
:::
```{svg-color-toggle-button} hangul-vjmo
```
#### Stage 5, step 4: `tjmo` ####
This feature replaces the default (i.e., standalone) forms of trailing
consonant (jongseong) glyphs in a syllable cell with alternate forms
that fit into syllable-appropriate positions.
Because jongseong jamo are always preceded by a choseong jamo and a
jungseong jamo, there is less variation in shape that the alternate
forms can take on. A given font may, however, include several
context-dependent alternates for stylistic or typographic variation.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hangul-tjmo}
T Jamo feature application
:::
```{svg-color-toggle-button} hangul-tjmo
```
### Stage 6. Reordering tone marks ###
Any tone marks should now be reordered. In the text run, marks occur immediately after
the syllable to which they apply. After reordering, each mark should
be placed immediately to the left of the syllable.
This reordering move is the same regardless of whether the syllable in
question is a precomposed syllable codepoint from the Hangul Syllables
block or a jamo-based syllable composed via the application of GSUB
features. Therefore, the reordering must take place at the end of the
shaping process.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hangul-tone}
Tone-mark reordering
:::
```{svg-color-toggle-button} hangul-tone
```
================================================
FILE: opentype-shaping-hebrew.md
================================================
```{include} /_global.md
```
# Hebrew script shaping in OpenType #
This document details the general shaping procedure shared by all
Hebrew script styles, and defines the common pieces that style-specific
implementations share.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Mark classification](#mark-classification)
- [Character tables](#character-tables)
- [The `` shaping model](#the-hebr-shaping-model)
- [Stage 1: Compound character composition and decomposition](#stage-1-compound-character-composition-and-decomposition)
- [Stage 2: Composing any Alphabetic Presentation forms](#stage-2-composing-any-alphabetic-presentation-forms)
- [Stage 3: Applying the language-form substitution features from GSUB](#stage-3-applying-the-language-form-substitution-features-from-gsub)
- [Stage 4: Applying the typographic-form substitution features from GSUB](#stage-4-applying-the-typographic-form-substitution-features-from-gsub)
- [Stage 5: Applying the positioning features from GPOS](#stage-5-applying-the-positioning-features-from-gpos)
## General information ##
The Hebrew script is used to write multiple languages, including
Hebrew, Yiddish, and Judezmo.
Hebrew is written (and, therefore, rendered) from right to
left. Shaping engines must track the directionality of the text run
when scripts of different direction are mixed.
The Hebrew script tag defined in OpenType is ``. Apart from the
fact that Hebrew uses right-to-left directionality, the shaping
process for `` is identical to the default script-shaping
model.
> Note: The Letterlike Symbols block in Unicode includes four
> codepoints corresponding to mathematical symbols based on Hebrew
> letters.
>
> These codepoints are not expected to occur within valid Hebrew text
> runs. In addition, because these codepoints are defined for usage in
> mathematical expressions, they are designated as using left-to-right
> directionality.
## Terminology ##
OpenType shaping uses a standard set of terms for elements of the
Hebrew script. The terms used colloquially in any particular language
may vary, however, potentially causing confusion.
**Base** glyph or character is the standard term for a Hebrew
character that is capable of taking a diacritical mark.
Most of the base characters in Hebrew are consonants, although some
base characters are used to represent vowels in certain contexts.
Vowels that are not represented with base characters are frequently
omitted from the text run entirely. Alternatively, such vowels may
appear as marks called **niqqud**. Niqqud are also referred to as
**points** in the Unicode standard.
Pronunciation marks, such as the dot used to distinguish "Shin" from
"Sin" are also considered **niqqud**. Niqqud are typically positioned
above or below the base character.
**Dagesh** is the term for a particular diacritic that alters the
pronunciation of a consonant. The dagesh is distinctive for being
positioned inside the consonant glyph. Other Hebrew diacritics are
positioned either above or below the base character.
Hebrew also includes a sizable set of **cantillation marks**, in
addition to vowel, diacritical, and pronunciation marks. Cantillation
marks are also referred to as **tropes**.
## Glyph classification ##
Because `` text runs do not involve reordering or syllable
identification, Hebrew base characters do not require further
classification for script-shaping purposes.
Five Hebrew letters have special word-final forms. Each of these is
encoded separately in the Hebrew block. They are regarded as
contextual variants, not as distinct letters. The Hebrew block also
includes several digraphs that are used only when writing the Yiddish
languages.
Because these word-final forms and digraphs are separately encoded,
fonts do not implement GSUB substitutions to provide access to them.
### Mark classification ###
Because Hebrew text may include several types of mark (vowel niqqud,
cantillation marks, pronunciation marks) positioned on a base
character, sequences of adjacent marks may need to be reordered.
The Unicode standard defines a _canonical combining class_ for each
codepoint that is used whenever a sequence needs to be sorted into
canonical order.
Hebrew marks all belong to standard combining classes. Most, but not
all, cantillation marks are assigned to the generic below-base (220)
or above-base (230) combining classes. Niqqud are assigned to distinct
combining classes designed to enforce orthographically correct
ordering:
:::{table} Mark-classification table
| Codepoint | Combining class | Glyph |
|:----------|:----------------|:-----------------------------------|
| `U+0591` | 220 | ֑ Etnahta |
| `U+0592` | 230 | ֒ Segol |
| `U+05B0` | 10 | ְ Sheva |
| `U+05B2` | 12 | ֲ Hataf Patah |
| `U+05B9` | 19 | ֹ Holam |
| `U+05BF` | 23 | ֿ Rafe |
:::
The numeric values of these combining classes are used during Unicode
normalization.
### Character tables ###
The Hebrew block in Unicode contains the codepoints required to
represent text in all languages written using Hebrew.
The Alphabetic Presentation Forms block in Unicode includes 46
additional codepoints for Hebrew. Included are several precomposed
combinations of base characters and marks and the "Alef Lamed"
ligature, any of which may occur in `` text runs. Glyphs for
these presentation forms may be provided by fonts that do not
implement the corresponding mark-to-base and ligature features in
OpenType GSUB and GPOS tables.
The Alphabetic Presentation Forms block also includes a set of eight
"wide" variants of standard Hebrew characters (`U+FB21` through
`U+FB28`) and a variant form of "Ayin" (`U+FB20`), for backwards
compatibility with retired file-encoding standards. New usage of these
codepoints is not recommended and they are unlikely to occur in
contemporary documents.
Consequently, unless a software application is required to support
specific stores of documents that are known to have used these older
encodings, the shaping engine should not be expected to handle any
text runs incorporating these backwards-compatibility variant
codepoints.
Separate character tables are provided for the Hebrew block, the
Hebrew letters included in the Alphabetic Presentation Forms block,
and for other miscellaneous characters that are used in `` text
runs:
- [Hebrew character table](character-tables/character-tables-hebrew.md#hebrew-character-table)
- [Alphabetic Presentation Forms (Hebrew) character table](character-tables/character-tables-hebrew.md#alphabetic-presentation-forms-character-table)
- [Miscellaneous character table](character-tables/character-tables-hebrew.md#miscellaneous-character-table)
The tables list each codepoint along with its Unicode general
category. For marks, the table lists the codepoint's mark combining
class. The codepoint's Unicode name and an example glyph are also provided.
For example:
:::{table} Example character table
| Codepoint | Unicode category | Mark class | Glyph |
|:----------|:-----------------|:-----------|:-----------------------------|
|`U+05D0` | Letter | _0_ | א Alef |
| | | | | |
|`U+05C1` | Mark [Mn] | 24 | ׁ Point Shin Dot |
:::
Codepoints with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Other important characters that may be encountered when shaping runs
of Hebrew text include the dotted-circle placeholder (`U+25CC`), the
combining grapheme joiner (`U+034F`), the zero-width joiner (`U+200D`)
and zero-width non-joiner (`U+200C`), the left-to-right text marker
(`U+200E`) and right-to-left text marker (`U+200F`), and the no-break
space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a
vowel or diacritical mark in isolation. Real-world text documents may
also use other characters, such as hyphens or dashes, in a similar
placeholder fashion; shaping engines should cope with this situation
gracefully.
The combining grapheme joiner (CGJ), zero-width joiner (ZWJ), and
zero-width non-joiner (ZWNJ) may be used to alter the
order in which adjacent marks are positioned during the
mark-reordering stage, in order to adhere to the needs of a
non-default language orthography.
The right-to-left mark (RLM) and left-to-right mark (LRM) are used by
the Unicode bidirectionality algorithm (BiDi) to indicate the points
in a text run at which the writing direction changes.
The no-break space may be used to display those codepoints that
are defined as non-spacing (such as niqqud or cantillation marks) in an
isolated context, as an alternative to displaying them superimposed on
the dotted-circle placeholder.
## The `` shaping model ##
Processing a run of `` text involves seven top-level stages:
1. Compound character composition and decomposition
2. Composing any Alphabetic Presentation forms
3. Applying the language-form substitution features from GSUB
4. Applying the typographic-form substitution features from GSUB
5. Applying the positioning features from GPOS
### Stage 1: Compound character composition and decomposition ###
In this stage, the `ccmp` feature from GPOS is applied and the
resulting sequence of codepoints should be checked for correct mark
order.
> Note: Shaping engines may have already applied Unicode normalization
> compose or decompose codepoints before beginning the shaping
> process. Due to the Alphabetic Presentation Forms composition in
> stage two, however, the `ccmp` feature and any necessary mark
> reordering must be performed here, as Alphabetic Presentation Forms
> are not handled by Unicode normalization.
#### Stage 1, step 1: ccmp
The `ccmp` feature allows a font to substitute
- mark-and-base sequences with a pre-composed glyph including both
the mark and the base (as is done in with a ligature substitution)
- individual compound glyphs with the equivalent sequence of
decomposed glyphs
If present, these composition and decomposition substitutions must be
performed before applying any other GSUB or GPOS lookups, because
those lookups may be written to match only the `ccmp`-substituted
glyphs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hebrew-ccmp}
ccmp composition
:::
```{svg-color-toggle-button} hebrew-ccmp
```
#### Stage 1, step 2: Mark reordering
Sequences of adjacent marks must be reordered so that they appear in
canonical order before the mark-to-base and mark-to-mark positioning
features from GPOS can be correctly applied.
For `` text runs, normalizing the sequence of marks using the
Unicode _canonical combining class_ of each mark should be sufficient.
### Stage 2: Composing any Alphabetic Presentation forms ###
If the active font includes glyphs for precomposed mark-and-base
codepoints from the Alphabetic Presentation Forms block, these
precomposed glyphs should be preferred over sequences of individual
base glyphs and marks positioned with GPOS.
The codepoints in question are not included in the canonical Unicode
compositions, so the shaping engine should substitute them at this
stage, before proceeding with the shaping process.
The individual base and mark sequences that should compose to each
precomposed Hebrew mark-and-base codepoint in the Alphabetic
Presentation Forms block is listed in _Composition_ column of the
[Alphabetic Presentation Forms character
table](character-tables/character-tables-hebrew.md#alphabetic-presentation-forms-character-table).
For example:
:::{table} Example character table for Alphabetic Presentation forms
| Codepoint | Unicode category | Mark class | Composition | Glyph |
|:----------|:-----------------|:-----------|:----------------|:----------------------------------------|
| `U+FB1D` | Letter | _0_ |`U+05D9`,`U+05B4`| יִ Yod With Hiriq |
| | | | | |
| `U+FB2B` | Letter | _0_ |`U+05E9`,`U+05C2`| שׂ Shin With Sin Dot |
:::
Two of the precomposed glyphs, "Shin With Dagesh And Shin Dot"
(`U+FB2C`) and "Shin With Dagesh And Sin Dot" (`U+FB2D`), have
multiple possible composing sequences. All of the other precomposed
glyphs in the block have a single composing sequence.
> Note: the active font may implement these compositions in a `ccmp`
> lookup in GSUB, in which case this stage will involve no additional work.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hebrew-apf}
Alphabetic Presentation forms composition
:::
```{svg-color-toggle-button} hebrew-apf
```
### Stage 3: Applying the language-form substitution features from GSUB ###
The language-substitution phase applies mandatory substitution
features using the rules in the font's GSUB table. In preparation for
this stage, glyph sequences should be tagged for possible application
of GSUB features.
The order in which these substitutions must be performed is fixed for
all scripts implemented in the Hebrew shaping model:
locl
#### Stage 3, step 1: locl ####
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
### Stage 4: Applying the typographic-form substitution features from GSUB ###
The typographic-substitution phase applies optional substitution
features using the rules in the font's GSUB table.
The order in which these substitutions must be performed is fixed for
all scripts implemented in the Hebrew shaping model:
liga
dlig
#### Stage 4, step 1: liga ####
The `liga` feature substitutes standard, optional ligatures that are on
by default. Substitutions made by `liga` may be disabled by
application-level user interfaces.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hebrew-liga}
Standard ligature substitution
:::
```{svg-color-toggle-button} hebrew-liga
```
#### Stage 4, step 2: dlig ####
The `dlig` feature substitutes additional optional ligatures that are
off by default. Substitutions made by `dlig` may be disabled by
application-level user interfaces.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hebrew-dlig}
Discretionary ligature substitution
:::
```{svg-color-toggle-button} hebrew-dlig
```
### Stage 5: Applying the positioning features from GPOS ###
The positioning stage adjusts the positions of mark and base
glyphs.
The order in which these features are applied is fixed for
the Hebrew shaping model:
kern
mark
#### Stage 5, step 1: `kern` ####
The `kern` adjusts glyph spacing between pairs of adjacent glyphs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hebrew-kern}
Kerning application
:::
```{svg-color-toggle-button} hebrew-kern
```
#### Stage 5, step 2: `mark` ####
The `mark` feature positions marks with respect to base glyphs.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #hebrew-mark}
Mark positioning
:::
```{svg-color-toggle-button} hebrew-mark
```
================================================
FILE: opentype-shaping-indic-general.md
================================================
# Indic script shaping in OpenType #
This document outlines the general shaping procedure shared by all
Indic scripts, and defines the common pieces that script-specific
implementations share.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Shaping classes](#shaping-classes)
- [Mark-placement subclasses](#mark-placement-subclasses)
- [Character tables](#character-tables)
- [The Indic2 shaping model](#the-indic2-shaping-model)
- [Sort ordering](#sort-ordering)
- [Script shaping characteristics](#script-shaping-characteristics)
- [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)
- [Stage 2: Initial reordering](#stage-2-initial-reordering)
- [Stage 3: Applying the basic substitution features from GSUB](#stage-3-applying-the-basic-substitution-features-from-gsub)
- [Stage 4: Final reordering](#stage-4-final-reordering)
- [Stage 5: Applying all remaining substitution features from GSUB](#stage-5-applying-all-remaining-substitution-features-from-gsub)
- [Stage 6: Applying remaining positioning features from GPOS](#stage-6-applying-remaining-positioning-features-from-gpos)
- [The old Indic shaping model](#the-old-indic-shaping-model)
## General information ##
The Indic family of scripts includes writing systems
derived from the Brahmi script in ancient India. Although the scripts
vary considerably in appearance, their shared ancestry means that they
also share a number of important features and rules.
This makes it possible (though, of course, not mandatory) for a
shaping engine to implement a single shaping model that covers all of
the scripts.
The largest (by number of readers) scripts in the Indic family are:
- [Devanagari](opentype-shaping-devanagari.md)
- [Bengali](opentype-shaping-bengali.md)
- [Gujarati](opentype-shaping-gujarati.md)
- [Gurmukhi](opentype-shaping-gurmukhi.md)
- [Kannada](opentype-shaping-kannada.md)
- [Malayalam](opentype-shaping-malayalam.md)
- [Oriya](opentype-shaping-oriya.md)
- [Tamil](opentype-shaping-tamil.md)
- [Telugu](opentype-shaping-telugu.md)
- [Sinhala](opentype-shaping-sinhala.md)
Text runs in Indic scripts may also include characters from the Vedic
Extensions block in Unicode. This is a set of marks and punctuation
needed to accurately transcribe ancient documents in Sanskrit.
Text runs in Indic scripts also make use of joiner, non-joiner, and
placeholder characters from other Unicode blocks, in order to specify
certain alternate shaping options.
There are two sets of Indic script tags defined in OpenType. Several
from the older set (``, ``, ``, ``, ``,
``, ``, ``, and ``) were deprecated and
replaced in 2005.
The new set of replacement tags for these scripts (``, ``,
``, ``, ``, ``, ``, ``, and
``) was devised to overcome shortcomings found in the original model.
Therefore, new fonts should be engineered to work with the updated
shaping model. However, if a font is encountered that supports only
an older script tag, the shaping engine should deal with it gracefully.
The `` tag, unlike the other Indic script tags,
was not deprecated in 2005 and is still used for Sinhala text.
> Note: There are several other scripts derived from the Brahmi script
> that are often treated separately and not bundled into the "Indic"
> category by shaping engines. This is because these other scripts
> evolved to have significantly distinct rules for syllable
> construction, reordering, and shaping.
>
> The scripts include Buginese, Balinese, Javanese,
> [Khmer](opentype-shaping-khmer.md),
> [Lao](opentype-shaping-thai-lao.md),
> [Myanmar](opentype-shaping-myanmar.md),
> [Thai](opentype-shaping-thai-lao.md), and
> [Tibetan](opentype-shaping-tibetan.md).
## Terminology ##
OpenType shaping uses a standard set of terms for Indic scripts. The
terms used colloquially in any particular language may vary, however,
potentially causing confusion.
**Matra** is the standard term for a dependent vowel sign.
The term "matra" is also used to refer to the headline above letters
in scripts like Devanagari, Bengali, and Gurmukhi. To avoid ambiguity,
the term **headline** is used in most Unicode and OpenType shaping
documents.
**Halant** and **Virama** are both standard terms for the below-base
"vowel-killer" sign. Unicode documents use the term "virama" most
frequently, while OpenType documents use the term "halant" most
frequently.
**Chandrabindu** (or simply **Bindu**) is the standard term for the
diacritical mark indicating that the preceding vowel should be
nasalized.
The term **base consonant** is also critical to Indic shaping. The
base consonant of a syllable is the consonant that carries the
syllable's vowel sound, either the inherent vowel (for an unmarked
base consonant) or a dependent vowel (with the addition of a matra).
A syllable's base consonant is generally rendered in its full form
(although it may form ligatures), while other consonants in the
syllable frequently take on secondary forms. Different GSUB
substitutions may apply to a script's **pre-base** and **post-base**
consonants. Some of these substitutions create **above-base** or
**below-base** forms. The **Reph** form of the consonant "Ra" is an
example.
Syllables may also begin with an **independent vowel** instead of a
consonant. In these syllables, the independent vowel is rendered in
full-letter form, not as a matra, and the independent vowel serves as the
syllable base, similar to a base consonant.
Where possible, using the standard terminology is preferred, as the
use of a language-specific term necessitates choosing one language
over all of the others that share a common script.
## Glyph classification ##
Shaping Indic text depends on the shaping engine correctly classifying
each glyph in the run. The classifications must distinguish between
consonants, vowels (independent and dependent), numerals, punctuation,
and various types of diacritical mark.
For most codepoints, the `General Category` property defined in the
Unicode standard is correct, but it is not sufficient to fully capture
the expected shaping behavior (such as glyph reordering). Therefore,
Indic glyphs must additionally be classified by how they are treated
when shaping a run of text.
### Shaping classes ###
The shaping classes listed in the tables that follow are defined so
that they capture the positioning rules used by Indic scripts.
For most codepoints, the _Shaping class_ is synonymous with the `Indic
Syllabic Category` defined in Unicode. However, there are some
distinctions, where the defined category does not fully capture the
behavior of the character in the shaping process.
Several of the diacritic and syllable-modifying marks behave according
to their own rules and, thus, have a special class. These include
`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some
less-common marks behave according to rules that are similar to these
common marks, and are therefore classified with the corresponding
common mark.
Less common mark classes include `TONE_MARKER`, `CANTILLATION`,
`GEMINATION_MARK`, `PURE_KILLER`, and `SYLLABLE_MODIFIER`. An
explanation of each class is included in the shaping documentation of
each script in which the class occurs.
Letters generally fall into the classes `CONSONANT`,
`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the
shaping engine parse and identify key positions in a syllable. For
example, Unicode categorizes dependent vowels as `Mark [Mn]`, but the
shaping engine must be able to distinguish between dependent vowels
and diacritical marks (some of which are also categorized as `Mark [Mn]`).
There are several subclasses of consonants that arise on occasion, such as
`CONSONANT_DEAD`, `CONSONANT_MEDIAL`, `CONSONANT_PLACEHOLDER`,
`CONSONANT_WITH_STACKER`, and `CONSONANT_PRE_REPHA`.
These subclasses indicate that the letter should match simple
tests for consonants (as in the regular expressions used during
syllable identification), but the subclass may factor into
script-specific rules encountered in later shaping stages.
For example, `CONSONANT_DEAD` indicates that, unlike standard
consonants, the dead consonant carries no inherent vowel. This lack of
an inherent vowel means that the letter is likely not accompanied by a
`VIRAMA`; failure to recognize this distinction could trick a naive
parser into mis-identifying the letter as the base consonant of a
syllable during the base-consonant-identification step.
Not every script features an instance of each consonant subclass. A
full explanation of each subclass's behavior is explained in the
relevant stage of each script's shaping documentation.
Other characters, such as symbols and miscellaneous letters (for
example, letter-like symbols that only occur as standalone entities
and do not occur within syllables), need no special attention from the
shaping engine, so they are not assigned a shaping class.
Numbers are classified as `NUMBER`, even though they evoke no special
behavior from the Indic shaping rules, because there are OpenType
features that might affect how the respective glyphs are drawn, such
as `tnum`, which specifies the usage of tabular-width numerals, and
`sups`, which replaces the default glyphs with superscript variants.
### Mark-placement subclasses ###
Marks and dependent vowels are further labeled with a mark-placement
subclass, which indicates where the glyph will be placed with respect
to the base character to which it is attached.
The actual attachment position of these glyphs is determined by the
lookups found in the font's GPOS table. However, the reordering rules for
Indic scripts require that the shaping engine be able to identify
marks by their general position.
For example, left-side dependent vowels (matras), classified
with `LEFT_POSITION`, must frequently be reordered, with the final
position determined by whether or not other letters in the syllable
have formed ligatures or combined into conjunct forms. Therefore, the
`LEFT_POSITION` subclass of the character must be tracked throughout
the shaping process.
There are four basic _mark-placement subclasses_ for dependent vowels
(matras). Each corresponds to the visual position of the matra with
respect to the syllable base to which it is attached:
- `LEFT_POSITION` matras are positioned to the left of the syllable base.
- `RIGHT_POSITION` matras are positioned to the right of the syllable base.
- `TOP_POSITION` matras are positioned above the syllable base.
- `BOTTOM_POSITION` matras are positioned below syllable base.
These positions may also be referred to elsewhere in shaping documents as:
- _Pre-base_ matras
- _Post-base_ matras
- _Above-base_ matras
- _Below-base_ matras
respectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations
corresponds to Unicode's preferred terminology. The _Pre_, _Post_,
_Above_, and _Below_ terminology is used in the official descriptions
of OpenType GSUB and GPOS features. Shaping engines may, internally,
use whichever terminology is preferred.
In addition, dependent-vowel codepoints that are composed of multiple
components will be designated in character tables as having a compound
_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`.
However, these multi-part matras are decomposed into separate matra
components during the shaping process. After the decomposition, each
matra component will belong to exactly one of the four basic
_mark-placement subclasses_.
For most mark and dependent-vowel codepoints, the _mark-placement
subclass_ is synonymous with the `Indic Positional Category` defined
in Unicode. However, there are some distinctions, where the defined
category does not fully capture the behavior of the character in the
shaping process.
### Character tables ###
Character tables for all of the scripts, plus the Vedic Extensions and
important miscellaneous characters, are available here:
- [Devanagari](character-tables/character-tables-devanagari.md) (Including Devanagari Extended)
- [Bengali](character-tables/character-tables-bengali.md)
- [Gujarati](character-tables/character-tables-gujarati.md)
- [Gurmukhi](character-tables/character-tables-gurmukhi.md)
- [Kannada](character-tables/character-tables-kannada.md)
- [Malayalam](character-tables/character-tables-malayalam.md)
- [Oriya](character-tables/character-tables-oriya.md)
- [Tamil](character-tables/character-tables-tamil.md)
- [Telugu](character-tables/character-tables-telugu.md)
- [Sinhala](character-tables/character-tables-sinhala.md)
#### Special-function codepoints ####
Other important characters that may be encountered when shaping runs
of Indic-script text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
Each of these is of particular importance to shaping engines, because
these codepoints interact with the shaping engine, the text run, and
the active font, either to mediate non-default shaping behavior or to
relay information about the current shaping process.
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
Dotted-circle placeholder characters (like any Unicode codepoint) can
appear anywhere in text input sequences and should be rendered
normally. GPOS positioning lookups should attach mark glyphs to dotted
circles as they would to other non-mark characters. As visible glyphs,
dotted circles can also be involved in GSUB substitutions.
In addition to the default input-text handling process, shaping
engines may also insert dotted-circle placeholders into the text
sequence. Dotted-circle insertions are required when a non-spacing
mark or dependent sign is formed with no base character present.
This requirement covers:
- Dependent signs that are assigned their own individual Unicode
codepoints (such as most dependent-vowel marks or matras)
- Dependent signs that are formed only by specific sequences of
other codepoints (such as "Reph")
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence.
- The sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the
formation of a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead.
- The sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce
the first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph" in those scripts that use an implicit sequence to request a
"Reph" form.
- An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
even where an initial "Ra,Halant" sequence without the zero-width
joiner would otherwise produce a "Reph".
> Note: this particular usage of ZWJ may not apply to scripts that
> feature an explicit "Reph" codepoint or an explicit sequence for
> requesting "Reph". See the script-specific shaping documents for
> full details.
The ZWJ and ZWNJ characters are, by definition, non-printing control
characters and have the _Default_Ignorable_ property in the Unicode
Character Database. In standard text-display scenarios, their function
is to signal a request from the user to the shaping engine for some
particular non-default behavior. As such, they are not rendered
visually.
> Note: Naturally, there are special circumstances where a user or
> document might need to request that a ZWJ or ZWNJ be rendered
> visually, such as when illustrating the OpenType shaping process, or
> displaying Unicode tables.
Because the ZWJ and ZWNJ are non-printing control characters, they can
be ignored by any portion of a software text-handling stack not
involved in the shaping operations that the ZWJ and ZWNJ are designed
to interface with. For example, spell-checking or collation functions
will typically ignore ZWJ and ZWNJ.
Similarly, the ZWJ and ZWNJ should be ignored by the shaping engine
when matching sequences of codepoints against the backtrack and
lookahead sequences of a font's GSUB or GPOS lookups.
For example:
- A lookup that substitutes an alternate version of a
dependent-vowel (matra) glyph when it is preceded by "Ka,Halant,Tta"
should still be applied if the dependent-vowel codepoint is preceded
by "Ka,Halant,ZWJ,Tta" in the text run.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
In addition to general punctuation, runs of text in several of the
supported scripts often use the danda (`U+0964`) and double danda
(`U+0965`) punctuation marks from the Devanagari block.
## The Indic2 shaping model ##
Processing a run of text in any of the modern Indic script tags
involves six top-level stages:
1. Identifying syllables and other sequences
2. Initial reordering
3. Applying the basic substitution features from GSUB
4. Final reordering
5. Applying all remaining substitution features from GSUB
6. Applying all remaining positioning features from GPOS
The initial reordering and final reordering stages each involve a set
of script-specific rules that dictate how characters are reordered
from their sequence in the input stream into the correct ordering for
shaping rules to apply.
Specifically, certain consonants in each script are repositioned from
their logical position (that is, their position in the input
stream). The most common example is "Ra", which is frequently
converted into a combining mark-like form.
The resulting mark must be correctly positioned by attaching it to the
correct base character using the active font's `mark` lookup from
GPOS. Therefore, the mark form of the "Ra" must be moved so that it
is adjacent to the correct base character. Which character in a
syllable is the correct base character differs from script to script,
and may involve several context-sensitive tests.
Similarly, certain other consonants in each script also take on
distinct forms that require reordering so that `mark` positioning
and other lookups function correctly. Dependent vowels (matras) may
also need to be reordered so that they are adjacent to the correct
consonant. These functions, too, involve script-specific rule sets.
Because of the script-specific rules involved, it is mandatory that
the basic substitution features in stage three be applied in the order
specified.
The remaining substitution features in stage five and the positioning
features in stage six, however, do not have a mandatory order.
### Sort ordering ###
A single, canonical sequence of ordering positions exists that
captures all of the possible positions in an Indic syllable.
Not every position is used in every script and not every syllable will
contain a character in every position. Whenever characters in a
syllable are reordered during the shaping process,
POS_RA_TO_BECOME_REPH
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN
POS_ABOVEBASE_CONSONANT
POS_BEFORE_SUBJOINED
POS_BELOWBASE_CONSONANT
POS_AFTER_SUBJOINED
POS_BEFORE_POST
POS_POSTBASE_CONSONANT
POS_AFTER_POST
POS_FINAL_CONSONANT
POS_SMVD
Not every position is used in every script; the sequence merely
describes all of the possible positions at which a character in an
Indic syllable can exist. Using the same sequence for all scripts
could reduce an implementation's code size and complexity.
The basic positions (left to right) are "Reph" (`POS_RA_TO_BECOME_REPH`), dependent
vowels (matras) and consonants positioned before the base
consonant or syllable base (`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base
consonant or syllable base (`POS_SYLLABLE_BASE`), above-base consonants
(`POS_ABOVEBASE_CONSONANT`), below-base consonants
(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base consonant or syllable base
(`POS_POSTBASE_CONSONANT`), syllable-final consonants (`POS_FINAL_CONSONANT`),
and syllable-modifying or Vedic signs (`POS_SMVD`).
In addition, several secondary positions are defined to handle various
reordering rules that deal with relative, rather than absolute,
positioning. `POS_AFTER_MAIN` means that a character must be
positioned immediately after the base consonant or syllable base. `POS_BEFORE_SUBJOINED`
and `POS_AFTER_SUBJOINED` mean that a character must be positioned
before or after any below-base consonants, respectively. Similarly,
`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be
positioned before or after any post-base consonants, respectively.
For shaping-engine implementers, the names used for the ordering
positions matter only in that they are unambiguous.
The description of the general shaping process that follows will note
when a character needs to be marked for reordering into some of these
positions. The specifics for each script provide additional details,
especially for ordering positions that are only used in that script.
### Script shaping characteristics ###
Indic scripts follow many of the same shaping patterns, but they
differ in a few critical characteristics that the shaping engine must
track. These include:
- The rules that determine the base consonant in a syllable.
- The final position of "Reph".
- How "Reph" is encoded or requested in a syllable.
- Whether the below-base forms feature is applied only to consonants
after the base consonant or syllable base, or to consonants before the base
consonant and those after the base consonant or syllable base.
- The ordering positions for dependent vowels
(matras). Specifically, the ordering for left-side, right-side,
above-base, and below-base matras follow different rules. The
rules employed vary between scripts, except for left-side matras,
where all Indic scripts follow the same rule.
In the lists that follow, the options for each characteristic are
mutually exclusive, and they are exhaustive for the set of Indic
scripts [listed](#general-information) at the beginning of this
document (Devanagari, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam,
Oriya, Tamil, Telugu, and Sinhala).
Implementers who wish to cover additional scripts using the same
method would first need to determine whether any additional options
are relevant for each characteristic.
#### Base consonant ####
Locating the base consonant of a syllable generally requires parsing
the syllable to catch and exclude certain special-treatment consonants
(such as "Ra"s that will form "Reph"s or consonants that take on
below-base forms). However, each script has a general base-consonant
position that determines the appropriate search method. The base
consonant may be, generally:
- The first consonant. This is designated `BASE_POS_FIRST`. This is
the simplest base-consonant rule. After eliminating any initial
"Repha"s from consideration, the first consonant is always the
base consonant, without exception.
- The last consonant, not counting any special forms. This is
designated `BASE_POS_LAST`. This is the most complicated
base-consonant rule, because the type and variety of special forms
vary considerably between scripts.
The `BASE_POS_LAST` search algorithm (described in each script's
shaping document) accounts for these special forms in every
script. The abundance of special forms in certain scripts may
routinely cause the search algorithm to identify a base consonant
that is not logically last in the syllable. This is expected
behavior.
This base-consonant position is used in Devanagari, Bengali,
Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu.
- The last consonant that is not preceded by a "ZWJ" (zero width
joiner) character.
This position is only used in Sinhala, and is designated
`BASE_POS_LAST_SINHALA`.
The scripts currently described in the "Indic" script group and their
corresponding base-consonant rules are summarized in the following
table:
:::{table} Base-consonant rules by script
| Script | Base-consonant rule |
|:-----------|:-----------------------|
| Devanagari | `BASE_POS_LAST` |
| Bengali | `BASE_POS_LAST` |
| Gujarati | `BASE_POS_LAST` |
| Gurmukhi | `BASE_POS_LAST` |
| Kannada | `BASE_POS_LAST` |
| Malayalam | `BASE_POS_LAST` |
| Oriya | `BASE_POS_LAST` |
| Tamil | `BASE_POS_LAST` |
| Telugu | `BASE_POS_LAST` |
| Sinhala | `BASE_POS_LAST_SINHALA`|
:::
> Note: None of the specific scripts currently included in the "Indic"
> script group as it is enumerated in this document make use of the
> `BASE_POS_FIRST` base-consonant rule. However, the `BASE_POS_FIRST`
> rule is employed by several Brahmi-derived scripts also used in the
> region, including both [Myanmar](opentype-shaping-myanmar.md) and
> [Khmer](opentype-shaping-khmer.md).
>
> Because these scripts share many other characteristics and
> conventions with the Indic group described by this document,
> `BASE_POS_FIRST` is included here for comparison.
> Note: The `BASE_POS_LAST` search algorithm is used for Kannada and
> Telugu, although the unique properties of the Kannada and Telugu
> orthographies usually result in the search terminating at the first
> non-"Reph" consonant in a syllable. Namely, all consonants in
> Kannada and Telugu have a post-base form.
>
> This is the expected behavior for Kannada and Telugu, and still
> differs from the `BASE_POS_FIRST` rule as used in the Brahmi-derived
> scripts mentioned above. See those individual script pages for
> further detail.
#### Reph position ####
"Reph" may be positioned:
- at the beginning of the syllable, in the ordering position
`POS_RA_TO_BECOME_REPH`.
- immediately before the first subjoined (below-base) consonant, in
the ordering position `POS_BEFORE_SUBJOINED`.
- immediately after the base consonant or syllable base, in the ordering position `POS_AFTER_MAIN`.
- immediately after the last subjoined (below-base) consonant, in
the ordering position `POS_AFTER_SUBJOINED`.
- immediately before the last post-base consonant, in the ordering
position `POS_BEFORE_POST`.
- immediately after the last post-base consonant, in the ordering
position `POS_AFTER_POST`.
The scripts currently described in the "Indic" script group and their
corresponding Reph-position rules are summarized in the following
table:
:::{table} Reph-position rules by script
| Script | Reph-position rule |
|:-----------|:---------------------------|
| Devanagari | `REPH_POS_BEFORE_POST` |
| Bengali | `REPH_POS_AFTER_SUBJOINED` |
| Gujarati | `REPH_POS_BEFORE_POST` |
| Gurmukhi | `REPH_POS_BEFORE_SUBJOINED`|
| Kannada | `REPH_POS_AFTER_POST` |
| Malayalam | `REPH_POS_AFTER_MAIN` |
| Oriya | `REPH_POS_AFTER_MAIN` |
| Tamil | `REPH_POS_AFTER_POST` |
| Telugu | `REPH_POS_AFTER_POST` |
| Sinhala | `REPH_POS_AFTER_POST` |
:::
#### Reph encoding ####
"Reph" may be:
- requested explicitly, using the sequence "Ra,Halant,ZWJ". This is
designated `REPH_MODE_EXPLICIT`.
- Formed implicitly by the sequence "Ra,Halant" when used in certain positions
in a syllable. This is designated `REPH_MODE_IMPLICIT`. Because a
"Ra,Halant" does _not_ form a "Reph" in _every_ position in a
syllable, script-specific tests are required.
- encoded as a separate codepoint. This codepoint is generally
called "Repha", which distinguishes it from the "Reph"s formed by
other sequences. A "Repha" may need reordering based on script
specific rules, in which case `REPH_MODE_LOGICAL_REPHA` is
used. Alternatively, the script may not reorder "Repha"s at all,
in which case `REPH_MODE_VISUAL_REPHA` is used.
The scripts currently described in the "Indic" script group and their
corresponding Reph-encoding rules are summarized in the following
table:
:::{table} Reph-encoding rules by script
| Script | Reph-encoding rule |
|:-----------|:---------------------------|
| Devanagari | `REPH_MODE_IMPLICIT` |
| Bengali | `REPH_MODE_IMPLICIT` |
| Gujarati | `REPH_MODE_IMPLICIT` |
| Gurmukhi | `REPH_MODE_IMPLICIT` |
| Kannada | `REPH_MODE_IMPLICIT` |
| Malayalam | `REPH_MODE_LOGICAL_REPHA` |
| Oriya | `REPH_MODE_IMPLICIT` |
| Tamil | `REPH_MODE_IMPLICIT` |
| Telugu | `REPH_MODE_EXPLICIT` |
| Sinhala | `REPH_MODE_EXPLICIT` |
:::
> Note: None of the specific scripts currently included in the "Indic"
> group as it is enumerated in this document make use of the
> `REPH_MODE_VISUAL_REPHA` encoding. However, `REPH_MODE_VISUAL_REPHA`
> is used in the [Khmer](opentype-shaping-khmer.md) script.
>
> Because Khmer shares many other characteristics and
> conventions with the Indic group described by this document,
> `REPH_MODE_VISUAL_REPHA` is included here for comparison.
#### Below-base forms ####
Below-base consonant forms (the `blwf` feature) may be applied:
- Only to consonants after the base consonant or syllable base. This is designated
`BLWF_MODE_POST_ONLY`.
- To consonants occurring before or after the base consonant or syllable base. This is
designated `BLWF_MODE_PRE_AND_POST`.
The scripts currently described in the "Indic" script group and their
corresponding below-base–forms rules are summarized in the following
table:
:::{table} Below-base–forms rules by script
| Script | Below-base–forms rule |
|:-----------|:-------------------------|
| Devanagari | `BLWF_MODE_PRE_AND_POST` |
| Bengali | `BLWF_MODE_PRE_AND_POST` |
| Gujarati | `BLWF_MODE_PRE_AND_POST` |
| Gurmukhi | `BLWF_MODE_PRE_AND_POST` |
| Kannada | `BLWF_MODE_POST_ONLY` |
| Malayalam | `BLWF_MODE_PRE_AND_POST` |
| Oriya | `BLWF_MODE_PRE_AND_POST` |
| Tamil | `BLWF_MODE_PRE_AND_POST` |
| Telugu | `BLWF_MODE_POST_ONLY` |
| Sinhala | `BLWF_MODE_PRE_AND_POST` |
:::
#### Left-side matras ####
All Indic scripts position left-side matras in the same
manner, in the ordering position `POS_PREBASE_MATRA`.
#### Right-side matras ####
Right-side matras may be positioned:
- immediately before the first subjoined (below-base) consonant, in
the ordering position `POS_BEFORE_SUBJOINED`.
- immediately after the last subjoined (below-base) consonant, in
the ordering position `POS_AFTER_SUBJOINED`.
- immediately after the last post-base consonant, in the ordering
position `POS_AFTER_POST`.
The scripts currently described in the "Indic" script group and their
corresponding right-side–matra positions are summarized in the following
table:
:::{table} Right-side–matra positions by script
| Script | Right-side–matra position |
|:-----------|:--------------------------|
| Devanagari | `POS_AFTER_SUBJOINED` |
| Bengali | `POS_AFTER_POST` |
| Gujarati | `POS_AFTER_POST` |
| Gurmukhi | `POS_AFTER_POST` |
| Kannada | _varies_ |
| Malayalam | `POS_AFTER_POST` |
| Oriya | `POS_AFTER_POST` |
| Tamil | `POS_AFTER_POST` |
| Telugu | _varies_ |
| Sinhala | `POS_AFTER_SUBJOINED` |
:::
> Note: In most scripts, all right-side matras are positioned in the
> same sort-order position. The Kannada and Telugu scripts, however,
> feature more complex positioning rules for right-side matras, in
> which different right-side matras must be sorted into different
> positions. See the script-specific shaping documents for full
> details.
#### Above-base matras ####
Above-base matras may be positioned:
- immediately before the first subjoined (below-base) consonant, in
the ordering position `POS_BEFORE_SUBJOINED`.
- immediately after the base consonant or syllable base, in the ordering position `POS_AFTER_MAIN`.
- immediately after the last subjoined (below-base) consonant, in
the ordering position `POS_AFTER_SUBJOINED`.
- immediately after the last post-base consonant, in the ordering
position `POS_AFTER_POST`.
The scripts currently described in the "Indic" script group and their
corresponding above-base–matra positions are summarized in the following
table:
:::{table} Above-base–matra positions by script
| Script | Above-base–matra position |
|:-----------|:--------------------------|
| Devanagari | `POS_AFTER_SUBJOINED` |
| Bengali | _null_ |
| Gujarati | `POS_AFTER_SUBJOINED` |
| Gurmukhi | `POS_AFTER_POST` |
| Kannada | `POS_BEFORE_SUBJOINED` |
| Malayalam | _null_ |
| Oriya | `POS_AFTER_MAIN` |
| Tamil | `POS_AFTER_SUBJOINED` |
| Telugu | `POS_BEFORE_SUBJOINED` |
| Sinhala | `POS_AFTER_SUBJOINED` |
:::
#### Below-base matras ####
Below-base matras may be positioned:
- immediately before the first subjoined (below-base) consonant, in
the ordering position `POS_BEFORE_SUBJOINED`.
- immediately after the last subjoined (below-base) consonant, in
the ordering position `POS_AFTER_SUBJOINED`.
- immediately after the last post-base consonant, in the ordering
position `POS_AFTER_POST`.
The scripts currently described in the "Indic" script group and their
corresponding below-base–matra positions are summarized in the following
table:
:::{table} Below-base–matra positions by script
| Script | Below-base–matra position |
|:-----------|:--------------------------|
| Devanagari | `POS_AFTER_SUBJOINED` |
| Bengali | `POS_AFTER_SUBJOINED` |
| Gujarati | `POS_AFTER_POST` |
| Gurmukhi | `POS_AFTER_POST` |
| Kannada | `POS_BEFORE_SUBJOINED` |
| Malayalam | `POS_AFTER_POST` |
| Oriya | `POS_AFTER_SUBJOINED` |
| Tamil | `POS_AFTER_POST` |
| Telugu | `POS_BEFORE_SUBJOINED` |
| Sinhala | `POS_AFTER_SUBJOINED` |
:::
### Stage 1: Identifying syllables and other sequences ###
A syllable in an Indic script consists of a valid orthographic sequence
that may be followed by a "tail" of modifier signs.
The Nukta, Halant/Virama, and Anudatta marks can affect syllable
identification. All other signs are regarded as syllable modifier
signs, including those from the Vedic Extensions block.
Generally speaking, each syllable contains exactly one vowel
sound. Valid syllables may begin with either a consonant or an
independent vowel.
> Note: A consonant that is not accompanied by a dependent vowel (matra) sign
> carries the script's inherent vowel sound. This vowel sound is changed
> by a dependent vowel (matra) sign following the consonant.
Valid consonant-based syllables may include one or more additional
consonants that precede the base consonant. Each of these
other, pre-base consonants will be followed by the "Halant" mark, which
indicates that they carry no vowel. They affect pronunciation by
combining with the base consonant (e.g., "_str_", "_pl_") but they
do not add a vowel sound.
Some Indic scripts also include special consonants that can occur after the
base consonant or syllable base. These post-base consonants and final consonants will
also be separated from the base consonant or syllable base by a "Halant" mark; the
algorithm for correctly identifying the base consonant includes a test
to recognize these sequences and not mis-identify the base consonant.
In Indic scripts, the consonant "Ra" receives special treatment; in
many circumstances it is replaced by one of two combining mark-like forms.
- A "Ra,Halant" or "Ra,Halant,ZWJ" sequence at the beginning of a
syllable may be replaced with an above-base mark called "Reph"
(although script-specifics rules may negate this replacement if
the "Ra" is the only consonant in the syllable).
- "Halant,Ra" sequences that occur elsewhere in the syllable may
take on a below-base form (called "Rakaar" in Devanagari and most
other scripts, and called "Raphala" in Bengali).
In addition, some scripts reorder post-base "Ra"s to a pre-base
position. These re-ordering "Ra"s may take on a different form, but
they are letter-like rather than mark-like forms.
"Reph", "Rakaar", "Raphala", and reordering "Ra" characters must be
reordered after the syllable-identification stage is complete.
> Note: Generally speaking, OpenType fonts will implement support for
> any below-base, post-base, and pre-base-reordering consonant forms
> by including the necessary substitution rules in their `blwf`,
> `pstf`, and `pref` lookups in GSUB.
>
> Consequently, whenever shaping engines need to determine whether or
> not a given consonant can take on such a special form, the most
> appropriate test is to check if the consonant is included in the
> relevant GSUB lookup. Other implementations are possible, such as
> maintaining static tables of consonants, but checking for GSUB
> support ensures that the expected behavior is implemented in the
> active font, and is therefore the most reliable approach.
In addition to valid syllables, standalone sequences may occur, such
as when an isolated codepoint is shown in example text.
> Note: Foreign loanwords, when written in Indic scripts, may
> not adhere to the syllable-formation rules described above. In
> particular, it is not uncommon to encounter foreign loanwords that
> contain a word-final suffix of consonants.
>
> Nevertheless, such word-final suffixes will be correctly matched by
> the regular expressions listed below. These loanwords are pronounced
> different, which raises issues for potential readers, but the
> character sequences do not affect the shaping process.
Syllables should be identified by examining the run and matching
glyphs, based on their shaping class, using regular expressions.
The following general-purpose regular expressions can be
used to match Indic syllables.
The regular expressions utilize the shaping classes from the tables
above. For the purpose of syllable identification, more general
classes can be used, as defined in the following table. This
simplifies the resulting expressions.
```markdown
_ra_ = The consonant "Ra"
_consonant_ = ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_
_vowel_ = `VOWEL_INDEPENDENT`
_nukta_ = `NUKTA`
_halant_ = `VIRAMA`
_zwj_ = `JOINER`
_zwnj_ = `NON_JOINER`
_matra_ = `VOWEL_DEPENDENT` | `PURE_KILLER`
_syllablemodifier_ = `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`
_vedicsign_ = `CANTILLATION`
_placeholder_ = `PLACEHOLDER` | `CONSONANT_PLACEHOLDER`| `NUMBER`
_dottedcircle_ = `DOTTED_CIRCLE`
_repha_ = `CONSONANT_PRE_REPHA`
_consonantmedial_ = `CONSONANT_MEDIAL`
_symbol_ = `SYMBOL` | `AVAGRAHA`
_consonantwithstacker_ = `CONSONANT_WITH_STACKER`
_other_ = `OTHER` | `MODIFYING_LETTER`
```
> Note: the _ra_ identification class is mutually exclusive with
> the _consonant_ class. The union of the _consonant_ and _ra_ classes
> is used in the regular expression elements below in order to
> correctly identify "Ra" characters that do not trigger "Reph" or
> "Rakaar" shaping behavior.
>
> Note, also, that the cantillation mark "combining Ra" in the
> Devanagari Extended block does _not_ belong to the _ra_
> identification class, and that the other "combining consonant"
> cantillation marks in the Devanagari Extended block do not belong to
> the _consonant_ identification class.
> Note: The _placeholder_ identification class includes codepoints
> that are often used in place of vowels or consonants when a document
> needs to display a matra, mark, or special form in isolation or
> in another context beyond a standard syllable. Examples of
> _placeholder_ codepoints include hyphens and non-breaking
> spaces. Sequences that utilize this approach should be identified as
> "standalone" syllables.
>
> The _placeholder_ identification class also includes numerals, which
> are commonly used as word substitutes within normal text. Examples
> include ordinals (e.g., "4th").
> Note: The _other_ identification class includes codepoints that
> do not interact with adjacent characters for shaping purposes. Even
> though some of these codepoints (such as `MODIFYING_LETTER`) can
> occur within words, they evoke no behavior from the shaping
> engine and do not factor into the regular expressions that
> follow. Therefore, the shaping engine may choose to ignore them
> during syllable identification; they are listed here for completeness.
These identification classes form the bases of the following regular
expression elements:
```markdown
C = _consonant_ | _ra_
Z = _zwj_ | _zwnj_
REPH = (_ra_ _halant_) | _repha_
CN = C _zwj_? _nukta_?
FORCED_RAKAR = _zwj_ _halant_ _zwj_ _ra_
S = _symbol_ _nukta_?
MATRA_GROUP = Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?
SYLLABLE_TAIL = (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}
HALANT_GROUP = Z? _halant_ (_zwj_ _nukta_?)?
FINAL_HALANT_GROUP = HALANT_GROUP | (_halant_ _zwnj_)
MEDIAL_GROUP = _consonantmedial_?
HALANT_OR_MATRA_GROUP = FINAL_HALANT_GROUP | MATRA_GROUP*)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(MATRA_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(MATRA_GROUP){0,4}` .
Using the above elements, the following regular expressions define the
possible syllable types:
A consonant-based syllable will match the expression:
```markdown
(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(CN HALANT_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(CN HALANT_GROUP){0,4}` .
A vowel-based syllable will match the expression:
```markdown
REPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
A standalone syllable will match the expression:
```markdown
((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
> Note: Although they are labeled as "standalone syllables" here,
> many sequences that match the standalone regular expression above
> are instances where a document needs to display a matra, combining
> mark, or special form in isolation. Such sequences might not have
> any significance with regard to the definition of syllables used in
> the language or orthography of the text.
A symbol-based syllable will match the expression:
```markdown
S SYLLABLE_TAIL
```
A broken syllable will match the expression:
```markdown
REPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
The primary problem involved in shaping broken syllables is the lack
of a syllable base (either a base consonant or an independent
vowel). Without a syllable base, the shaping engine cannot perform
GPOS positioning and other contextual operations that are required
later in the shaping process.
To make up for this limitation, shaping engines should insert a
dotted-circle placeholder (`U+25CC`) character into the text stream
where the missing syllable base was expected to occur. This
placeholder allows the shaping process to proceed on a best-effort
basis at handling the broken-syllable sequence, but making guarantees
about the orthographic correctness or preferred appearance of the
final result is out of scope for this document.
Shaping engines can perform this dotted-circle insertion at any point
after the broken syllable has been recognized and before GSUB features
are applied. However, the best results will likely be attained by
performing the insertion immediately, before proceeding to
stage 2. This will enable the maximum number of GSUB and GPOS features
in the active font to be correctly applied to the text run by ensuring
that all reordering, tagging, and sorting algorithms are executed as
usual.
> Note: In software stacks where other text-handling operations, such
> as Unicode normalization and localization, are performed before the
> text run is passed to the shaping engine, there is a potential for
> the dotted-circle insertion to cause unexpected effects.
>
> For example, if a `ccmp` or `locl` feature substitutes the default
> dotted-circle placeholder glyph with a variant glyph of a different
> size or weight for the (`U+25CC`) codepoint, then any shaping engine
> which relies on another software component to handle that
> functionality must take additional care to ensure consistency.
The expressions above use state-machine syntax from the Ragel
state-machine compiler. The operators represent:
```markdown
a* = zero or more copies of a
b+ = one or more copies of b
c? = optional instance of c
d{n} = exactly n copies of d
d{,n} = zero to n copies of d
d{n,} = n or more copies of d
d{n,m} = n to m copies of d
!e = not e
^f = character-level not f
g.h = concatenation of g and h
i|j = i or j
( ) = grouping of expression elements
```
> Note: "standalone" syllables can be used to display examples of
> letters, marks, and other characters without requiring full
> syllables or words.
After the syllables have been identified, each of the subsequent
shaping stages occurs on a per-syllable basis.
### Stage 2: Initial reordering ###
The initial reordering stage is used to relocate glyphs from the
phonetic order in which they occur in a run of text to the
orthographic order in which they are presented visually.
This may mean moving dependent-vowel (matra) glyphs, "Ra,Halant"
sequences, and other consonants that take special
treatment in some circumstances.
These reordering moves are mandatory. The final-reordering stage
may make additional moves, depending on the text and on the features
implemented in the active font.
The syllable should be processed by tagging each glyph with its
intended position based on its ordering category. After all glyphs
have been tagged, the entire syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
The final sort order of the ordering categories should be:
POS_RA_TO_BECOME_REPH
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN
POS_ABOVEBASE_CONSONANT
POS_BEFORE_SUBJOINED
POS_BELOWBASE_CONSONANT
POS_AFTER_SUBJOINED
POS_BEFORE_POST
POS_POSTBASE_CONSONANT
POS_AFTER_POST
POS_FINAL_CONSONANT
POS_SMVD
This sort order enumerates all of the possible final positions to
which a codepoint might be reordered, across all of the Indic
scripts. Not every position will be utilized in every script.
Additional information about the ordering positions is available in
the [sort ordering](#sort-ordering) section of this document.
#### Stage 2, step 1: Base consonant ####
The first step is to determine the base consonant of the syllable, if
there is one, and tag it as `POS_SYLLABLE_BASE`.
The algorithm used to find the base consonant varies according to the
base-consonant shaping characteristic of the script.
For `BASE_POS_FIRST` scripts, the first consonant of the syllable is
the base consonant.
> Note: None of the specific scripts currently included in the "Indic"
> group as it is enumerated in this document make use of the
> `BASE_POS_FIRST` base-consonant rule. However, the `BASE_POS_FIRST`
> rule is employed by several Brahmi-derived scripts also used in the
> region, including both [Myanmar](opentype-shaping-myanmar.md) and
> [Khmer](opentype-shaping-khmer.md).
>
> Because these scripts share many other characteristics and
> conventions with the Indic group described by this document,
> `BASE_POS_FIRST` is included here for comparison.
For `BASE_POS_LAST` scripts, the base consonant is the last consonant
in the syllable, excluding all consonants that will take on special
post-base, final, or below-base forms, and excluding all pre-base
reordering "Ra"s. For a detailed explanation of the search algorithm
employed, see the page for each specific script.
For Sinhala, which uses `BASE_POS_LAST_SINHALA`, the base consonant is
the last consonant that is not preceded by a zero-width joiner
("ZWJ").
While performing the base-consonant search, shaping engines may
also encounter special-form consonants, including below-base
consonants and post-base consonants. Each of these special-form
consonants must also be tagged (`POS_BELOWBASE_CONSONANT`,
`POS_POSTBASE_CONSONANT`, respectively).
Any pre-base-reordering consonant (such as a pre-base-reodering "Ra")
encountered during the base-consonant search must be tagged
`POS_POSTBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
#### Stage 2, step 2: Matra decomposition ####
Second, any two-part or three-part dependent vowels (matras) must be decomposed
into their component parts.
Because this decomposition is a character-level operation, the shaping
engine may choose to perform it earlier, such as during an initial
Unicode-normalization stage. However, all such decompositions must be
completed before the shaping engine begins step three, below.
#### Stage 2, step 3: Tag decomposed matras ####
Third, all dependent-vowel (matra) signs must be tagged with
their final position.
Single-part matras can be tagged with the appropriate sort-ordering
position based on the ordering position of the script's specific
script-shaping characteristics.
In most cases, all matras of the same Mark-positioning subclass (such
as `LEFT_POSITION`) in a particular script are tagged with the same
final position (such as `POS_PREBASE_MATRA`).
Some scripts, however, include matras that must be tagged according to
more involved rule sets. In the set of Indic scripts described here,
this includes [Kannada](opentype-shaping-kannada.md) and
[Telugu](opentype-shaping-telugu.md). See the individual
script-shaping document of each script to find a complete description
of the applicable matra-tagging rules.
> Note: The shaping engine may, as an alternative, choose to perform
> this tagging earlier, such as during an initial Unicode-normalization
> stage.
>
> Matras that resulted from the preceding decomposition step, however,
> may not have been tagged when they were decomposed. If not, they must
> be tagged for reordering before proceeding to the next step.
#### Stage 2, step 4: Adjacent marks ####
Fourth, any subsequences of marks that include a "Nukta" and a
"Halant" or Vedic sign must be reordered so that the "Nukta" appears
first.
This means that the subsequence "Halant,Nukta" is reordered to
"Nukta,Halant" and that the subsequence "_Vedic_sign_,Nukta" is
reordered to "Nukta,_Vedic_sign".
For subsequences of affected marks that are longer than two, the
reordering operation must be repeated until the "Nukta" is the first
character in the subsequence. No other marks in the subsequence
should be reordered.
This order is canonical in Unicode and is required so that
"_consonant_,Nukta" substitution rules from GSUB will be correctly
matched later in the shaping process.
#### Stage 2, step 5: Pre-base consonants ####
Fifth, consonants that occur before the syllable base must be tagged
with `POS_PREBASE_CONSONANT`. Excluding initial "Ra,Halant" sequences
that will become "Reph"s:
- If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`.
- Otherwise, tag it as `POS_PREBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
#### Stage 2, step 6: Reph ####
Sixth, initial "Ra,Halant" (in `REPH_MODE_IMPLICIT` scripts) or
"Ra,Halant,ZWJ" (in `REPH_MODE_EXPLICIT` scripts) sequences that will
become "Reph"s must be tagged with `POS_RA_TO_BECOME_REPH`.
#### Stage 2, step 7: Post-base consonants ####
Seventh, any non-base consonants that occur after a dependent vowel
(matra) sign must be tagged with `POS_POSTBASE_CONSONANT`. Such
consonants will either be followed by a "Halant" glyph or will be in
the `CONSONANT_DEAD` shaping class.
#### Stage 2, step 8: Mark tagging ####
Eighth, all marks must be tagged.
> Note: In this step, joiner and non-joiner characters must also be
> tagged according to the same rules given for marks, even though
> these characters are not categorized as marks in Unicode.
Marks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,
`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should
be tagged with `POS_SMVD`.
All "Nukta"s must be tagged with the same positioning tag as the
preceding consonant, independent vowel, placeholder, or dotted circle.
All remaining marks (not in the `POS_SMVD` category and not "Nukta"s)
must be tagged with the same positioning tag as the closest non-mark
character the mark has affinity with, so that they move together
during the sorting step.
There are two possible cases: those marks before the syllable base
and those marks after the syllable base. In addition, an exception is
made for "Halant" marks that follow a left-side (pre-base) matra.
1. Initially, all remaining marks should be tagged with the same
positioning tag as the closest preceding consonant.
2. For each consonant after the syllable base (such as post-base
consonants, below-base consonants, or final consonants), all
remaining marks located between that current consonant and any
previous consonant should be tagged with the same positioning tag as
the current (later) consonant.
In other words, all consonants preceding the syllable base "own" the
marks that follow them, while all consonants after the syllable base
"own" the marks that come before them. When a syllable does not have
any consonants after the syllable base, the syllable base should
"own" all the marks that follow it.
3. Finally, "Halant" marks that follow a left-side dependent vowel
(matra) should _not_ be tagged with the left-side matra's
positioning tag. Instead, the "Halant" should be tagged with the
positioning tag of the non-mark character preceding the left-side
matra. This prevents the "Halant" mark from being moved with the
left-side matra when the syllable is sorted.
#### Stage 2, step 9: Sort syllable ####
With these steps completed, the syllable can be sorted into the final
sort order as listed at the beginning of stage 2.
The glyphs in the syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
#### Stage 2, step 10: Flag sequences for possible feature applications ####
With the initial reordering complete, those glyphs in the syllable that
may have GSUB or GPOS features applied in stages 3, 5, and 6 should be
flagged for each potential feature.
This flagging is preliminary; the set of potential features varies
between different scripts and which features are supported varies
between fonts. It is also possible that the application of
one feature on a glyph sequence will perform a substitution that makes
a later feature no longer applicable to the updated sequence.
Consequently, the flagging must be completed before shaping proceeds
to the stages during which features are applied.
Some shaping features, such as `locl`, can potentially apply to any
glyphs. Therefore it is not necessary to maintain a separate flag for
these features in the bitmask (or other data structure) used to track
the flags -- although shaping engines may do so if desired.
The sequences to flag are summarized in the list below; a full
description of each feature's function and interpretation is provided
in GSUB and GPOS application stages that follow.
- `nukt` should match "_Consonant_,Nukta" sequences
- `akhn` should match "Ka,Halant,Ssa" and "Ja,Halant,Nya"
- `rphf` should match initial "Ra,Halant" sequences but _not_ match
initial "Ra,Halant,ZWJ" sequences
- `blwf` should match "Halant,Ra", "Halant,Ha", and "Halant,Va" in
post-base positions and "Ra,Halant", "Ha,Halant", and
"Va,Halant" in non-initial pre-base positions
- `half` should match "_Consonant_,Halant" in pre-base position but
_not_ match "Ra,Halant" sequences flagged for `rphf` and
_not_ match "_Consonant_,Halant,ZWNJ,_Consonant_" sequences
- `pstf` should match initial "Halant,Ya" in post-base position
- `vatu` should match "_Consonant_,Halant,Ra",
"_Consonant_,Halant,Ha", and "_Consonant_,Halant,Va"
- `cjct` should match "_Consonant_,Halant,_Consonant_" but _not_
match "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_"
### Stage 3: Applying the basic substitution features from GSUB ###
The basic-substitution stage applies mandatory substitution features
using the rules in the font's GSUB table. In preparation for this
stage, glyph sequences should be flagged for possible application
of GSUB features in stage 2, step 10.
The order in which these substitutions must be performed is fixed for
all Indic scripts:
locl
nukt
akhn
rphf
rkrf
pref
blwf
abvf
half
pstf
vatu
cjct
cfar
Not every feature is used in every script. See the individual script
pages for further script-specific information.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
### Stage 4: Final reordering ###
The final reordering stage repositions marks, dependent-vowel (matra)
signs, and "Reph" glyphs to the appropriate location with respect to
the base consonant or syllable base. Because multiple substitutions
may have occurred during the application of the basic-shaping features
in the preceding stage, these repositioning moves could not be
performed during the initial reordering stage.
Like the initial reordering stage, the steps involved in this stage
occur on a per-syllable basis.
#### Stage 4, step 1: Base consonant ####
The final reordering stage, like the initial reordering stage, begins
with determining the syllable base of each syllable, following the
same algorithm used in stage 2, step 1.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base. In a standalone sequence or
other syllable that begins with a placeholder or a dotted circle, the
placeholder or dotted circle will always serve as the syllable base.
In a syllable that begins with a consonant, the shaping engine must
repeat the base-consonant search algorithm used in stage 2, step 1.
The codepoint of the underlying base consonant or syllable base will
not change between the search performed in stage 2, step 1, and the
search repeated here. However, the application of GSUB shaping
features in stage 3 means that several ligation and many-to-one
substitutions may have taken place. The final glyph produced by that
process may, therefore, be a conjunct or ligature form — in most
cases, such a glyph will not have an assigned Unicode codepoint.
#### Stage 4, step 2: Pre-base matras ####
Pre-base dependent vowels (matras) that were reordered during the
initial reordering stage must be moved to their final position. This
position is defined as:
- after the last standalone "Halant" glyph that comes after the
matra's starting position and also comes before the main
consonant.
- If a zero-width joiner follows this last standalone "Halant", the
final matra position is moved to after the joiner.
This means that the matra will move to the right of all explicit
"_Consonant_,Halant" subsequences, but will stop to the left of the base
consonant or syllable base, all conjuncts or ligatures that contain
the base consonant or syllable base, and all half forms.
> Note: OpenType and Unicode both state that if the syllable includes
> a ZWJ immediately after the last "Halant", then the final matra
> position should be after the ZWJ.
>
> However, there are several test sequences indicating that
> Microsoft's Uniscribe shaping engine did not follow this rule (in,
> at least, Devanagari and Bengali text), and in these circumstances
> Uniscribe instead makes the final matra position before the final
> "Consonant,Halant,ZWJ".
>
> Subsequently, the HarfBuzz shaping engine has also followed the same
> pattern. If other shaping engine implementations prefer to maintain
> maximum compatibility with Uniscribe and HarfBuzz, then they should
> also follow suit.
> Note: The Microsoft script-development specifications for OpenType
> shaping also state that if a zero-width non-joiner follows the last
> standalone "Halant", the final matra position is moved to after the
> non-joiner. However, it is unneccessary to test for this condition,
> because a "Halant,ZWNJ" subsequence is, by definition, the end of a
> syllable. Consequently, a "Halant,ZWNJ" cannot be followed by a
> pre-base dependent vowel.
#### Stage 4, step 3: Reph ####
"Reph" must be moved from the beginning of the syllable to its final
position. The correct final position depends on the script's
Reph-position shaping characteristic, and is conditional upon the
presence or absence of certain characters (such as post-base
consonants or "matra,Halant" sequences) in the syllable.
The full algorithm for determining the final Reph position has seven steps.
(a) If the script uses Reph-position rule `REPH_POS_AFTER_POST`, jump
immediately to step (e). Otherwise, proceed to step (b).
(b) Find the first explicit "Halant" between the syllable base
consonant and the first post-Reph consonant. If there is a ZWJ or ZWNJ
following this "Halant", move the "Reph" to a position immediately
after the ZWJ or ZWNJ, then proceed to step (mH). Otherwise, move the
"Reph" to a position immediately after the "Halant", then proceed to
step (mH). If no such explicit "Halant" is found, proceed to step
(c).
(c) If the script uses Reph-position rule `REPH_POS_AFTER_MAIN`, find
the first consonant not ligated with the syllable base, and that is
not a potential pre-base reordering "Ra". If such a consonant is
found, move the "Reph" to a position immediately after the
consonant, then proceed to step (mH). If no such consonant is found,
proceed to step (d). If the script uses a different Reph-position
rule, proceed to step (d).
(d) If the script uses Reph-position rule `REPH_POS_BEFORE_POST`, find
the first post-base consonant not ligated with the syllable base. If
such a consonant is found, move the "Reph" to a position immediately
before the consonant, then proceed to step (mH). If no such consonant
is found, proceed to step (e). If the script uses a different
Reph-position rule, proceed to step (e).
(e) Move the "Reph" to a position immediately before the first
post-base matra, syllable modifier sign or Vedic sign that has a
reordering class after the intended Reph position in the syllable sort
order (as listed in [stage 2](#stage-2-initial-reordering)). This will be
the final "Reph" position. , then proceed to step (mH). If no such
matra or sign is found, proceed to step (f).
(f) Move the "Reph" to the end of the syllable.
(mH) Finally, if the "Reph" position arrived at in the preceding steps
is immediately after a "matra,Halant" sequence, move the "Reph" so
that it is before the "Halant".
Taking the Reph-position–rule conditionals in the above algorithm into
account, the position-finding steps that may be executed in each
script are summarized in the following table:
:::{table} Summary of final–reph-positioning rules by script
| Script | Reph-position rule | a | b | c | d | e | f | mH |
|:-----------|:--------------------------|:--|:--|:--|:--|:--|:--|:---|
| Devanagari |`REPH_POS_BEFORE_POST` | | • | | • | • | • | • |
| Bengali |`REPH_POS_AFTER_SUBJOINED` | | • | | | | • | • |
| Gujarati |`REPH_POS_BEFORE_POST` | | • | | • | • | • | • |
| Gurmukhi |`REPH_POS_BEFORE_SUBJOINED`| | • | | | | • | • |
| Kannada |`REPH_POS_AFTER_POST` | | | | | • | • | • |
| Malayalam |`REPH_POS_AFTER_MAIN` | | • | • | | • | • | • |
| Oriya |`REPH_POS_AFTER_MAIN` | | • | • | | • | • | • |
| Tamil |`REPH_POS_AFTER_POST` | | | | | • | • | • |
| Telugu |`REPH_POS_AFTER_POST` | | | | | • | • | • |
| Sinhala |`REPH_POS_AFTER_MAIN` | | • | • | | • | • | • |
:::
#### Stage 4, step 4: Pre-base reordering consonants ####
Any pre-base-reordering consonants must be moved to immediately before
the base consonant or syllable base.
#### Stage 4, step 5: Initial matras ####
Any left-side dependent vowels (matras) that are at the start of a
word must be flagged for potential substitution by the `init` feature
of GSUB.
> Note: The `init` feature for word-initial dependent vowels (matras)
> is defined only for Bengali and should not be expected in fonts for
> any other scripts. Therefore, this step will involve no work when
> processing non-`` text.
### Stage 5: Applying all remaining substitution features from GSUB ###
In this stage, the remaining substitution features from the GSUB table
are applied. In preparation for this stage, glyph sequences should be
flagged for possible application of GSUB features in stage 2,
step 10.
The order in which these features are applied is not canonical; they
should be applied in the order in which they appear in the GSUB table
in the font.
init
pres
abvs
blws
psts
haln
### Stage 6: Applying remaining positioning features from GPOS ###
In this stage, mark positioning, kerning, and other GPOS features are
applied.
As with the preceding stage, the order in which these features are
applied is not canonical; they should be applied in the order in which
they appear in the GPOS table in the font.
dist
abvm
blwm
## The old Indic shaping model ##
The older Indic script tags (``, ``, ``, ``, ``,
``, ``, ``, and ``) have been deprecated. However,
shaping engines may still encounter fonts that were built to work with
these tags and some users may still have documents that were written to
take advantage of the original shaping rules.
### Distinctions from the Indic2 model ###
The most significant distinction between the shaping models is that the
sequence of "Halant" and consonant glyphs used to trigger shaping
features) was altered when migrating from the old to the new shaping model.
Specifically, shaping engines were expected to reorder post-base
"Halant,_Consonant_" sequences to "_Consonant_,Halant".
As a result, a font's GSUB substitutions would be written to match
"_Consonant_,Halant" sequences in all pre-base and post-base positions.
The old-model Indic syllable
Pre-baseC Halant BaseC Halant Post-baseC
would be reordered to
Pre-baseC Halant BaseC Post-baseC Halant
before features are applied.
In Indic2 text, as described above in this document, there is no
such reordering. The correct sequence to match for GSUB substitutions is
"_Consonant_,Halant" for pre-base consonants, but "Halant,_Consonant_"
for post-base consonants.
The old Indic shaping model also did not recognize the
`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, all scripts
were treated as if they followed the `BLWF_MODE_POST_ONLY`
characteristic. In other words, below-base form substitutions were
only applied to consonants after the base consonant or syllable base.
In addition, left-side dependent vowel marks
(matras) were not repositioned during the final reordering
stage. For ``, ``, ``, ``, ``,
``, and `` text, the left-side matra was always positioned
at the beginning of the syllable. For `` and `` text, the
left-side matra was positioned immediately before the base consonant or syllable base.
### Advice for handling fonts with old Indic features only ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences in order to apply GSUB substitutions when it is known that
the font in use supports only the old shaping model.
### Advice for handling text runs composed in the old Indic format ###
Shaping engines may choose to match post-base "_Consonant_,Halant"
sequences for GSUB substitutions or to reorder them to
"Halant,_Consonant_" when processing text runs that are tagged with
one of the old Indic script tags and it is known that the font in use supports
only the Indic2 shaping model.
Shaping engines may also choose to apply `blwf` substitutions to
below-base consonants occuring before the base consonant when it is
known that the font in use supports an applicable substitution lookup.
Shaping engines may also choose to position left-side matras according
to the old-model Indic ordering scheme; however, doing so might interfere
with matching GSUB or GPOS features.
================================================
FILE: opentype-shaping-kannada.md
================================================
```{include} /_global.md
```
# Kannada shaping in OpenType #
This document details the shaping procedure needed to display text
runs in the Kannada script.
**Contents**
- [General information](#general-information)
- [Terminology](#terminology)
- [Glyph classification](#glyph-classification)
- [Shaping classes and subclasses](#shaping-classes-and-subclasses)
- [Kannada character tables](#kannada-character-tables)
- [The `` shaping model](#the-knd2-shaping-model)
- [Stage 1: Identifying syllables and other sequences](#stage-1-identifying-syllables-and-other-sequences)
- [Stage 2: Initial reordering](#stage-2-initial-reordering)
- [Stage 3: Applying the basic substitution features from GSUB](#stage-3-applying-the-basic-substitution-features-from-gsub)
- [Stage 4: Final reordering](#stage-4-final-reordering)
- [Stage 5: Applying all remaining substitution features from GSUB](#stage-5-applying-all-remaining-substitution-features-from-gsub)
- [Stage 6: Applying remaining positioning features from GPOS](#stage-6-applying-remaining-positioning-features-from-gpos)
- [The `` shaping model](#the-knda-shaping-model)
- [Distinctions from ``](#distinctions-from-knd2)
- [Advice for handling fonts with `` features only](#advice-for-handling-fonts-with-knda-features-only)
- [Advice for handling text runs composed in `` format](#advice-for-handling-text-runs-composed-in-knda-format)
## General information ##
The Kannada script belongs to the Indic family, and follows
the same general patterns as the other Indic scripts. More
specifically, it belongs to the South Indic subgroup, in which
sequences of adjacent consonants are often represented as below-base forms.
The Kannada script is used to write multiple languages, most commonly
Kannada, plus several minority languages. In addition, Sanskrit may be written
in Kannada, so Kannada script runs may include glyphs from the Vedic
Extensions block of Unicode.
There are two extant Kannada script tags defined in OpenType, ``
and ``. The older script tag, ``, was deprecated in 2005.
Therefore, new fonts should be engineered to work with the ``
shaping model. However, if a font is encountered that supports only
``, the shaping engine should deal with it gracefully.
## Terminology ##
OpenType shaping uses a standard set of terms for Indic scripts. The
terms used colloquially in any particular language may vary, however,
potentially causing confusion.
**Matra** is the standard term for a dependent vowel sign. In the Kannada
language, dependent-vowel signs may also be referred to as _swara_ forms.
The term "matra" is also used to refer to the headline in other Indic
scripts, and may be used to describe the distinctive cap stroke above most
Kannada letters by comparison. To avoid ambiguity, the term **headline** is
used in most Unicode and OpenType shaping documents.
**Halant** and **Virama** are both standard terms for the above-base "vowel-killer"
sign. Unicode documents use the term "virama" most frequently, while
OpenType documents use the term "halant" most frequently. In the Kannada
language, this sign is known as the _hrasva_.
**Chandrabindu** (or simply **Bindu**) is the standard term for the diacritical mark
indicating that the preceding vowel should be nasalized. In the Kannada
language, this mark is known as the _candrabindu_.
The term **base consonant** is also critical to Indic shaping. The
base consonant of a syllable is the consonant that carries the
syllable's vowel sound, either the inherent vowel (for an unmarked
base consonant) or a dependent vowel (with the addition of a matra).
A syllable's base consonant is generally rendered in its full form
(although it may form ligatures), while other consonants in the
syllable frequently take on secondary forms. Different GSUB
substitutions may apply to a script's **pre-base** and **post-base**
consonants. Some of these substitutions create **above-base** or
**below-base** forms. The **Reph** form of the consonant "Ra" is an
example.
Syllables may also begin with an **independent vowel** instead of a
consonant. In these syllables, the independent vowel is rendered in
full-letter form, not as a matra, and the independent vowel serves as the
syllable base, similar to a base consonant.
Where possible, using the standard terminology is preferred, as the
use of a language-specific term necessitates choosing one language
over all of the others that share a common script.
## Glyph classification ##
Shaping Kannada text depends on the shaping engine correctly
classifying each glyph in the run. As with most other scripts, the
classifications must distinguish between consonants, vowels
(independent and dependent), numerals, punctuation, and various types
of diacritical mark.
For most codepoints, the `General Category` property defined in the Unicode
standard is correct, but it is not sufficient to fully capture the
expected shaping behavior (such as glyph reordering). Therefore,
Kannada glyphs must additionally be classified by how they are treated
when shaping a run of text.
### Shaping classes and subclasses ###
The shaping classes listed in the tables that follow are defined so
that they capture the positioning rules used by Indic scripts.
For most codepoints, the _Shaping class_ is synonymous with the `Indic
Syllabic Category` defined in Unicode. However, there are some
distinctions, where the defined category does not fully capture the
behavior of the character in the shaping process.
Several of the diacritic and syllable-modifying marks behave according
to their own rules and, thus, have a special class. These include
`BINDU`, `VISARGA`, `AVAGRAHA`, `NUKTA`, and `VIRAMA`. Some
less-common marks behave according to rules that are similar to these
common marks, and are therefore classified with the corresponding
common mark. The Vedic Extensions also include a `CANTILLATION`
class for tone marks.
Letters generally fall into the classes `CONSONANT`,
`VOWEL_INDEPENDENT`, and `VOWEL_DEPENDENT`. These classes help the
shaping engine parse and identify key positions in a syllable. For
example, Unicode categorizes dependent vowels as `Mark [Mn]`, but the
shaping engine must be able to distinguish between dependent vowels
and diacritical marks (some of which are also categorized as `Mark [Mn]`).
Kannada uses one subclass of consonant, `CONSONANT_WITH_STACKER`. This
subclass supports two consonants, "Jihvamuliya" (`U+0CF1`) and
"Upadhmaniya" (`U+0CF2`), that are used only for Sanskrit text
runs. These consonants may form stacked ligatures with subsequent
consonants without an intervening "Halant". Such ligature formation,
if desired, must be implemented in the font.
The letters classified as `CONSONANT_WITH_STACKER` should be treated
as consonants when [identifying
syllables](#stage-1-identifying-syllables-and-other-sequences). No
additional behavior is required.
Other characters, such as symbols and miscellaneous letters (for
example, letter-like symbols that only occur as standalone entities
and do not occur within syllables), need no special attention from the
shaping engine, so they are not assigned a shaping class.
Numbers are classified as `NUMBER`, even though they evoke no special
behavior from the Indic shaping rules, because there are OpenType features that
might affect how the respective glyphs are drawn, such as `tnum`,
which specifies the usage of tabular-width numerals, and `sups`, which
replaces the default glyphs with superscript variants.
Marks and dependent vowels are further labeled with a mark-placement
subclass, which indicates where the glyph will be placed with respect
to the base character to which it is attached. The actual position of
the glyphs is determined by the lookups found in the font's GPOS
table, however, the shaping rules for Indic scripts require that the
shaping engine be able to identify marks by their general
position.
For example, left-side dependent vowels (matras), classified
with `LEFT_POSITION`, must frequently be reordered, with the final
position determined by whether or not other letters in the syllable
have formed ligatures or combined into conjunct forms. Therefore, the
`LEFT_POSITION` subclass of the character must be tracked throughout
the shaping process.
There are four basic _mark-placement subclasses_ for dependent vowels
(matras). Each corresponds to the visual position of the matra with
respect to the syllable base to which it is attached:
- `LEFT_POSITION` matras are positioned to the left of the syllable base.
- `RIGHT_POSITION` matras are positioned to the right of the syllable base.
- `TOP_POSITION` matras are positioned above the syllable base.
- `BOTTOM_POSITION` matras are positioned below syllable base.
These positions may also be referred to elsewhere in shaping documents as:
- _Pre-base_ matras
- _Post-base_ matras
- _Above-base_ matras
- _Below-base_ matras
respectively. The `LEFT`, `RIGHT`, `TOP`, and `BOTTOM` designations
corresponds to Unicode's preferred terminology. The _Pre_, _Post_,
_Above_, and _Below_ terminology is used in the official descriptions
of OpenType GSUB and GPOS features. Shaping engines may, internally,
use whichever terminology is preferred.
In addition, dependent-vowel codepoints that are composed of multiple
components will be designated in character tables as having a compound
_mark-placement subclass_, such as `TOP_AND_RIGHT` or `LEFT_AND_RIGHT`.
However, these multi-part matras are decomposed into separate matra
components during the shaping process. After the decomposition, each
matra component will belong to exactly one of the four basic
_mark-placement subclasses_.
For most mark and dependent-vowel codepoints, the _mark-placement
subclass_ is synonymous with the `Indic Positional Category` defined
in Unicode. However, there are some distinctions, where the defined
category does not fully capture the behavior of the character in the
shaping process.
### Kannada character tables ###
Separate character tables are provided for the Kannada and Vedic
Extensions blocks as well as for other miscellaneous characters that
are used in `` text runs:
- [Kannada character table](character-tables/character-tables-kannada.md#kannada-character-table)
- [Vedic Extensions character table](character-tables/character-tables-kannada.md#vedic-extensions-character-table)
- [Miscellaneous character table](character-tables/character-tables-kannada.md#miscellaneous-character-table)
The tables list each codepoint along with its Unicode general
category, its shaping class, and its mark-placement subclass. The
codepoint's Unicode name and an example glyph are also provided.
For example:
:::{table} Example character table
| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|:----------|:-----------------|:------------------|:---------------------------|:-----------------------------|
|`U+0C81` | Mark [Mn] | BINDU | TOP_POSITION | ಁ Candrabindu |
| | | | |
|`U+0C95` | Letter | CONSONANT | _null_ | ಕ Ka |
:::
Codepoints with no assigned meaning are
designated as _unassigned_ in the _Unicode category_ column.
Assigned codepoints with a _null_ in the _Shaping class_
column evoke no special behavior from the shaping engine.
The _Mark-placement subclass_ column indicates mark-placement
positioning for codepoints in the _Mark_ category. Assigned, non-mark
codepoints have a _null_ in this column and evoke no special
mark-placement behavior. Marks tagged with [Mn] in the _Unicode
category_ column are categorized as non-spacing; marks tagged with
[Mc] are categorized as spacing-combining.
Some codepoints in the tables use a _Shaping class_ that
differs from the codepoint's Unicode _General Category_. The _Shaping
class_ takes precedence during OpenType shaping, as it captures more
specific, script-aware behavior.
#### Special-function codepoints ####
Other important characters that may be encountered when shaping runs
of Kannada text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
Each of these is of particular importance to shaping engines, because
these codepoints interact with the shaping engine, the text run, and
the active font, either to mediate non-default shaping behavior or to
relay information about the current shaping process.
The dotted-circle placeholder is frequently used when displaying a
dependent vowel (matra) or a combining mark in isolation. Real-world
text syllables may also use other characters, such as hyphens or dashes,
in a similar placeholder fashion; shaping engines should cope with
this situation gracefully.
Dotted-circle placeholder characters (like any Unicode codepoint) can
appear anywhere in text input sequences and should be rendered
normally. GPOS positioning lookups should attach mark glyphs to dotted
circles as they would to other non-mark characters. As visible glyphs,
dotted circles can also be involved in GSUB substitutions.
In addition to the default input-text handling process, shaping
engines may also insert dotted-circle placeholders into the text
sequence. Dotted-circle insertions are required when a non-spacing
mark or dependent sign is formed with no base character present.
This requirement covers:
- Dependent signs that are assigned their own individual Unicode
codepoints (such as most dependent-vowel marks or matras)
- Dependent signs that are formed only by specific sequences of
other codepoints (such as "Reph")
The zero-width joiner (ZWJ) is primarily used to prevent the formation
of a conjunct from a "_Consonant_,Halant,_Consonant_" sequence.
- The sequence "_Consonant_,Halant,ZWJ,_Consonant_" blocks the
formation of a conjunct between the two consonants.
Note, however, that the "_Consonant_,Halant" subsequence in the above
example may still trigger a half-forms feature. To prevent the
application of the half-forms feature in addition to preventing the
conjunct, the zero-width non-joiner (ZWNJ) must be used instead.
- The sequence "_Consonant_,Halant,ZWNJ,_Consonant_" should produce
the first consonant in its standard form, followed by an explicit
"Halant".
A secondary usage of the zero-width joiner is to prevent the formation of
"Reph".
- An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph",
even where an initial "Ra,Halant" sequence without the zero-width
joiner would otherwise produce a "Reph".
The ZWJ and ZWNJ characters are, by definition, non-printing control
characters and have the _Default_Ignorable_ property in the Unicode
Character Database. In standard text-display scenarios, their function
is to signal a request from the user to the shaping engine for some
particular non-default behavior. As such, they are not rendered
visually.
> Note: Naturally, there are special circumstances where a user or
> document might need to request that a ZWJ or ZWNJ be rendered
> visually, such as when illustrating the OpenType shaping process, or
> displaying Unicode tables.
Because the ZWJ and ZWNJ are non-printing control characters, they can
be ignored by any portion of a software text-handling stack not
involved in the shaping operations that the ZWJ and ZWNJ are designed
to interface with. For example, spell-checking or collation functions
will typically ignore ZWJ and ZWNJ.
Similarly, the ZWJ and ZWNJ should be ignored by the shaping engine
when matching sequences of codepoints against the backtrack and
lookahead sequences of a font's GSUB or GPOS lookups.
For example:
- A lookup that substitutes an alternate version of a
dependent-vowel (matra) glyph when it is preceded by "Ka,Halant,Tta"
should still be applied if the dependent-vowel codepoint is preceded
by "Ka,Halant,ZWJ,Tta" in the text run.
The no-break space (NBSP) is primarily used to display those
codepoints that are defined as non-spacing (marks, dependent vowels
(matras), below-base consonant forms, and post-base consonant forms)
in an isolated context, as an alternative to displaying them
superimposed on the dotted-circle placeholder. These sequences will
match "NBSP,ZWJ,Halant,_Consonant_", "NBSP,_mark_", or "NBSP,_matra_".
In addition to general punctuation, runs of Kannada text often use the
danda (`U+0964`) and double danda (`U+0965`) punctuation marks from
the Devanagari block.
## The `` shaping model ##
Processing a run of `` text involves six top-level stages:
1. Identifying syllables and other sequences
2. Initial reordering
3. Applying the basic substitution features from GSUB
4. Final reordering
5. Applying all remaining substitution features from GSUB
6. Applying all remaining positioning features from GPOS
As with other Indic scripts, the initial reordering stage and the
final reordering stage each involve applying a set of several
script-specific rules. The basic substitution features must be applied
to the run in a specific order. The remaining substitution features in
stage five, however, do not have a mandatory order.
Indic scripts follow many of the same shaping patterns, but they
differ in a few critical characteristics that the shaping engine must
track. These include:
- The position of the base consonant in a syllable.
- The final position of "Reph".
- Whether "Reph" must be requested explicitly or if it is formed by
a specific, implicit sequence.
- Whether the below-base forms feature is applied only to consonants
before the syllable base, only to consonants after the base
consonant, or to both.
- The ordering positions for dependent vowels
(matras). Specifically, right-side, above-base, and below-base
matras follow different rules in different scripts.
All Indic scripts position left-side matras in the same
manner, in the ordering position `POS_PREBASE_MATRA`.
With regard to these common variations, Kannada's specific shaping
characteristics include:
- `BASE_POS_LAST` = The base consonant of a syllable is the last
consonant, not counting any consonants with post-base forms.
- Kannada differs somewhat from other `BASE_POS_LAST` scripts in
that all consonants can use post-base forms. Therefore, the
general base-consonant search algorithm should identify the first
non-"Reph" consonant as the base. This is the expected
behavior, as it allows the same search algorithm to be used
with all `BASE_POS_LAST` scripts.
- `REPH_POS_AFTER_POST` = "Reph" is ordered after the last post-base
consonant form.
- `REPH_MODE_IMPLICIT` = "Reph" is formed by an initial "Ra,Halant" sequence.
- `BLWF_MODE_POST_ONLY` = The below-forms feature is applied only to
post-base consonants.
- `MATRA_POS_TOP` = `POS_BEFORE_SUBJOINED` = Above-base matras are
ordered before any subjoined (i.e., below-base) consonant forms.
- `MATRA_POS_RIGHT` = Kannada includes right-side matras that follow two
different reordering rules.
- Matras "Sign Vocalic R" (`U+0CC3`), "Sign Vocalic Rr"
(`U+0CC4`), "Sign Ee" (`U+0CC7`), "Sign Ai" (`U+0CC8`), "Sign
O" (`U+0CCA`), "Sign Oo" (`U+0CCB`), "Length Mark" (`U+0CD5`),
and "Ai Length Mark" (`U+0CD6`) use `POS_AFTER_SUBJOINED` =
These right-side matras are ordered after all subjoined (i.e.,
below-base) consonant forms.
- Matras "Sign Aa"(`U+0CBE`), "Sign Ii" (`U+0CC0`), "Sign U"
(`U+0CC1`), and "Sign Uu" (`U+0CC2`) use
`POS_BEFORE_SUBJOINED` = These right-side matras are ordered before
all subjoined (i.e., below-base) consonant forms.
- `MATRA_POS_BOTTOM` = `POS_BEFORE_SUBJOINED` = Below-base matras are
ordered before the any subjoined (i.e., below-base) consonant forms.
These characteristics determine how the shaping engine must reorder
certain glyphs, how base consonants are determined, and how "Reph"
should be encoded within a run of text.
### Stage 1: Identifying syllables and other sequences ###
A syllable in Kannada consists of a valid orthographic sequence
that may be followed by a "tail" of modifier signs.
> Note: The Kannada Unicode block enumerates four modifier signs,
> "Candrabindu" (`U+0C81`), "Anusvara" (`U+0C82`), "Visarga"
> (`U+0C83`), and "Avagraha" (`U+0CBD`) In addition, Sanskrit text
> written in Kannada may include additional signs from Vedic
> Extensions block.
>
> Note also that the "Spacing Candrabindu" (`U+0C80`) is a letter, not
> a modifier sign.
Each syllable contains exactly one vowel sound. Valid syllables may
begin with either a consonant or an independent vowel.
If the syllable begins with a consonant, then the consonant that
provides the vowel sound is referred to as the "base" consonant. If
the syllable begins with an independent vowel, that independent vowel
is the syllable's only vowel sound and serves as the "base".
> Note: A consonant that is not accompanied by a dependent vowel (matra) sign
> carries the script's inherent vowel sound. This vowel sound is changed
> by a dependent vowel (matra) sign following the consonant.
Kannada uses the `BASE_POS_LAST` characteristic mentioned
earlier. However, because all consonants in the script can potentially
take on post-base consonant forms, the outcome of the shaping
characteristic may be counterintuitive.
Generally speaking, the base consonant is the first consonant of the
syllable, which is rendered in full form, and any subsequent
consonants are rendered in special post-base forms.
Each of these post-base consonants will be preceded by the "Halant" mark, which
indicates that they carry no vowel. They affect pronunciation by
combining with the base consonant (e.g., "_str_", "_pl_") but they
do not add a vowel sound.
As with other Indic scripts, the consonant "Ra" receives special
treatment; in many circumstances it is replaced by a combining
mark-like form.
- A "Ra,Halant" sequence at the beginning of a syllable is replaced
with a right-side mark called "Reph" (unless the "Ra" is the only
consonant in the syllable). This rule is synonymous with the
`REPH_MODE_IMPLICIT` characteristic mentioned earlier.
"Reph" characters must be reordered after the syllable-identification
stage is complete.
> Note: Generally speaking, OpenType fonts will implement support for
> any below-base, post-base, and pre-base-reordering consonant forms
> by including the necessary substitution rules in their `blwf`,
> `pstf`, and `pref` lookups in GSUB.
>
> Consequently, whenever shaping engines need to determine whether or
> not a given consonant can take on such a special form, the most
> appropriate test is to check if the consonant is included in the
> relevant GSUB lookup. Other implementations are possible, such as
> maintaining static tables of consonants, but checking for GSUB
> support ensures that the expected behavior is implemented in the
> active font, and is therefore the most reliable approach.
In addition to valid syllables, standalone sequences may occur, such
as when an isolated codepoint is shown in example text.
> Note: Foreign loanwords, when written in the Kannada script, may
> not adhere to the syllable-formation rules described above. In
> particular, it is not uncommon to encounter foreign loanwords that
> contain a word-final suffix of consonants.
>
> Nevertheless, such word-final suffixes will be correctly matched by
> the regular expressions listed below. These loanwords are pronounced
> different, which raises issues for potential readers, but the
> character sequences do not affect the shaping process.
Syllables should be identified by examining the run and matching
glyphs, based on their categorization, using regular expressions.
The following general-purpose Indic-shaping regular expressions can be
used to match Kannada syllables.
The regular expressions utilize the shaping classes from the tables
above. For the purpose of syllable identification, more general
classes can be used, as defined in the following table. This
simplifies the resulting expressions.
```markdown
_ra_ = The consonant "Ra"
_consonant_ = ( `CONSONANT` | `CONSONANT_DEAD` ) - _ra_
_vowel_ = `VOWEL_INDEPENDENT`
_nukta_ = `NUKTA`
_halant_ = `VIRAMA`
_zwj_ = `JOINER`
_zwnj_ = `NON_JOINER`
_matra_ = `VOWEL_DEPENDENT` | `PURE_KILLER`
_syllablemodifier_ = `SYLLABLE_MODIFIER` | `BINDU` | `VISARGA` | `GEMINATION_MARK`
_vedicsign_ = `CANTILLATION`
_placeholder_ = `PLACEHOLDER` | `CONSONANT_PLACEHOLDER` | `NUMBER`
_dottedcircle_ = `DOTTED_CIRCLE`
_repha_ = `CONSONANT_PRE_REPHA`
_consonantmedial_ = `CONSONANT_MEDIAL`
_symbol_ = `SYMBOL` | `AVAGRAHA`
_consonantwithstacker_ = `CONSONANT_WITH_STACKER`
_other_ = `OTHER` | `MODIFYING_LETTER`
```
> Note: the _ra_ identification class is mutually exclusive with
> the _consonant_ class. The union of the _consonant_ and _ra_ classes
> is used in the regular expression elements below in order to
> correctly identify "Ra" characters that do not trigger "Reph" or
> "Rakaar" shaping behavior.
>
> Note, also, that the cantillation mark "combining Ra" in the
> Devanagari Extended block does _not_ belong to the _ra_
> identification class, and that the other "combining consonant"
> cantillation marks in the Devanagari Extended block do not belong to
> the _consonant_ identification class.
> Note: The _placeholder_ identification class includes codepoints
> that are often used in place of vowels or consonants when a document
> needs to display a matra, mark, or special form in isolation or
> in another context beyond a standard syllable. Examples of
> _placeholder_ codepoints include hyphens and non-breaking
> spaces. Sequences that utilize this approach should be identified as
> "standalone" syllables.
>
> The _placeholder_ identification class also includes numerals, which
> are commonly used as word substitutes within normal text. Examples
> include ordinals (e.g., "4th").
> Note: The _other_ identification class includes codepoints that
> do not interact with adjacent characters for shaping purposes. Even
> though some of these codepoints (such as `MODIFYING_LETTER`) can
> occur within words, they evoke no behavior from the shaping
> engine and do not factor into the regular expressions that
> follow. Therefore, the shaping engine may choose to ignore them
> during syllable identification; they are listed here for completeness.
These identification classes form the bases of the following regular
expression elements:
```markdown
C = _consonant_ | _ra_
Z = _zwj_ | _zwnj_
REPH = (_ra_ _halant_) | _repha_
CN = C _zwj_? _nukta_?
FORCED_RAKAR = _zwj_ _halant_ _zwj_ _ra_
S = _symbol_ _nukta_?
MATRA_GROUP = Z{0,3} _matra_ _nukta_? (_halant_ | FORCED_RAKAR)?
SYLLABLE_TAIL = (Z? _syllablemodifier_ _syllablemodifier_? _zwnj_?)? _vedicsign_{0,3}
HALANT_GROUP = Z? _halant_ (_zwj_ _nukta_?)?
FINAL_HALANT_GROUP = HALANT_GROUP | (_halant_ _zwnj_)
MEDIAL_GROUP = _consonantmedial_?
HALANT_OR_MATRA_GROUP = FINAL_HALANT_GROUP | MATRA_GROUP*)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(MATRA_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(MATRA_GROUP){0,4}` .
Using the above elements, the following regular expressions define the
possible syllable types:
A consonant-based syllable will match the expression:
```markdown
(_repha_|_consonantwithstacker_)? (CN HALANT_GROUP)* CN MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(CN HALANT_GROUP)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(CN HALANT_GROUP){0,4}` .
A vowel-based syllable will match the expression:
```markdown
REPH? _vowel_ _nukta_? (_zwj_ | (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL)
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
A standalone syllable will match the expression:
```markdown
((_repha_|_consonantwithstacker_)? _placeholder_ | REPH? _dottedcircle_) _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
> Note: Although they are labeled as "standalone syllables" here,
> many sequences that match the standalone regular expression above
> are instances where a document needs to display a matra, combining
> mark, or special form in isolation. Such sequences might not have
> any significance with regard to the definition of syllables used in
> the language or orthography of the text.
A symbol-based syllable will match the expression:
```markdown
S SYLLABLE_TAIL
```
A broken syllable will match the expression:
```markdown
REPH? _nukta_? (HALANT_GROUP CN)* MEDIAL_GROUP HALANT_OR_MATRA_GROUP SYLLABLE_TAIL
```
> Note: Practically speaking, shaping engines are highly unlikely to
> encounter more than 4 sequential `(HALANT_GROUP CN)`
> instances in any real-word syllables. Thus, implementations may
> choose to limit occurrences by limiting the above expressions to a
> finite length, such as `(HALANT_GROUP CN){0,4}` .
The primary problem involved in shaping broken syllables is the lack
of a syllable base (either a base consonant or an independent
vowel). Without a syllable base, the shaping engine cannot perform
GPOS positioning and other contextual operations that are required
later in the shaping process.
To make up for this limitation, shaping engines should insert a
dotted-circle placeholder (`U+25CC`) character into the text stream
where the missing syllable base was expected to occur. This
placeholder allows the shaping process to proceed on a best-effort
basis at handling the broken-syllable sequence, but making guarantees
about the orthographic correctness or preferred appearance of the
final result is out of scope for this document.
Shaping engines can perform this dotted-circle insertion at any point
after the broken syllable has been recognized and before GSUB features
are applied. However, the best results will likely be attained by
performing the insertion immediately, before proceeding to
stage 2. This will enable the maximum number of GSUB and GPOS features
in the active font to be correctly applied to the text run by ensuring
that all reordering, tagging, and sorting algorithms are executed as
usual.
> Note: In software stacks where other text-handling operations, such
> as Unicode normalization and localization, are performed before the
> text run is passed to the shaping engine, there is a potential for
> the dotted-circle insertion to cause unexpected effects.
>
> For example, if a `ccmp` or `locl` feature substitutes the default
> dotted-circle placeholder glyph with a variant glyph of a different
> size or weight for the (`U+25CC`) codepoint, then any shaping engine
> which relies on another software component to handle that
> functionality must take additional care to ensure consistency.
The expressions above use state-machine syntax from the Ragel
state-machine compiler. The operators represent:
```markdown
a* = zero or more copies of a
b+ = one or more copies of b
c? = optional instance of c
d{n} = exactly n copies of d
d{,n} = zero to n copies of d
d{n,} = n or more copies of d
d{n,m} = n to m copies of d
!e = not e
^f = character-level not f
g.h = concatenation of g and h
i|j = i or j
( ) = grouping of expression elements
```
After the syllables have been identified, each of the subsequent
shaping stages occurs on a per-syllable basis.
### Stage 2: Initial reordering ###
The initial reordering stage is used to relocate glyphs from the
phonetic order in which they occur in a run of text to the
orthographic order in which they are presented visually.
> Note: Primarily, this means moving dependent-vowel (matra) glyphs,
> "Ra,Halant" glyph sequences, and other consonants that take special
> treatment in some circumstances.
>
> These reordering moves are mandatory. The final-reordering stage
> may make additional moves, depending on the text and on the features
> implemented in the active font.
The syllable should be processed by tagging each glyph with its
intended position based on its ordering category. After all glyphs
have been tagged, the entire syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
The final sort order of the ordering categories should be:
POS_RA_TO_BECOME_REPH
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN
POS_ABOVEBASE_CONSONANT
POS_BEFORE_SUBJOINED
POS_BELOWBASE_CONSONANT
POS_AFTER_SUBJOINED
POS_BEFORE_POST
POS_POSTBASE_CONSONANT
POS_AFTER_POST
POS_FINAL_CONSONANT
POS_SMVD
This sort order enumerates all of the possible final positions to
which a codepoint might be reordered, across all of the Indic
scripts. It includes some ordering categories not utilized in
Kannada.
The basic positions (left to right) are "Reph"
(`POS_RA_TO_BECOME_REPH`), dependent vowels (matras) and consonants
positioned before the base consonant or syllable base
(`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base consonant
or syllable base (`POS_SYLLABLE_BASE`), above-base consonants
(`POS_ABOVEBASE_CONSONANT`), below-base consonants
(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base
consonant or syllable base (`POS_POSTBASE_CONSONANT`), syllable-final
consonants (`POS_FINAL_CONSONANT`), and syllable-modifying or Vedic
signs (`POS_SMVD`).
In addition, several secondary positions are defined to handle various
reordering rules that deal with relative, rather than absolute,
positioning. `POS_AFTER_MAIN` means that a character must be
positioned immediately after the syllable base. `POS_BEFORE_SUBJOINED`
and `POS_AFTER_SUBJOINED` mean that a character must be positioned
before or after any below-base consonants, respectively. Similarly,
`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be
positioned before or after any post-base consonants, respectively.
For shaping-engine implementers, the names used for the ordering
categories matter only in that they are unambiguous.
For a definition of the "base" consonant, refer to stage 2, step 1, which
follows.
#### Stage 2, step 1: Base consonant ####
The first step is to determine the base consonant of the syllable, if
there is one, and tag it as `POS_SYLLABLE_BASE`.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base, and it should be tagged
as `POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a standalone sequence or other syllable that begins with a placeholder
or dotted circle, the placeholder or dotted circle will always serve
as the syllable base, and it should be tagged as
`POS_SYLLABLE_BASE`. The shaping engine can then proceed to step 2.
In a syllable that begins with a consonant, the shaping engine must
determine the base consonant by a script-specific algorithm.
> Note: Shaping engines may choose to treat independent-vowel bases
> like base consonants for the sake of simplicity or code
> reuse.
>
> However, implementations that take this approach should note
> that removing the distinction between base consonants and
> independent-vowel bases entirely may have unintended
> consequences. Making guarantees about the correctness of the results
> or about language-specific tests is out of scope for this document.
The base consonant is defined as the consonant in a consonant-based
syllable that carries the syllable's vowel sound. That vowel sound
will either be provided by the script's inherent vowel (in which case
it is not written with a separate character) or the sound will be designated
by the addition of a dependent-vowel (matra) sign.
While performing the base-consonant search, shaping engines may
also encounter special-form consonants, including below-base
consonants and post-base consonants. Each of these special-form
consonants must also be tagged (`POS_BELOWBASE_CONSONANT`,
`POS_POSTBASE_CONSONANT`, respectively).
Any pre-base-reordering consonant (such as a pre-base-reordering "Ra")
encountered during the base-consonant search must be tagged
`POS_POSTBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
The algorithm for determining the base consonant is
- If the syllable starts with "Ra,Halant" and the syllable contains
more than one consonant, exclude the starting "Ra" from the list of
consonants to be considered.
- Starting from the end of the syllable, move backwards until a consonant is found.
* If the consonant is the first consonant, stop.
* If the consonant is preceded by the sequence "Halant,ZWJ", stop.
* If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`, then move to the previous consonant.
* If the consonant has a post-base form, tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If the consonant is a pre-base-reordering "Ra", tag it as
`POS_POSTBASE_CONSONANT`, then move to the previous consonant.
* If none of the above conditions is true, stop.
- The consonant stopped at will be the base consonant.
> Note: The algorithm is designed to work for all Indic
> scripts. However, Kannada does not utilize pre-base-reordering "Ra".
>
> Also, it is important to note that all consonants in Kannada have a
> post-base form, therefore the backwards-search step will
> automatically move past them until it reaches either a "Ra,Halant"
> sequence or the first consonant. However, this condition is not the
> same as the shaping characteristic `BASE_POS_FIRST`, which does not
> use the above search algorithm at all.
> Note: Because Kannada employs the `BLWF_MODE_POST_ONLY` shaping
> characteristic, consonants with below-base special forms will occur
> only after the base consonant or syllable base.
>
> During the base-consonant search, therefore, all of these below-base
> form sequences will be encountered and tagged correctly as
> "Halant,_consonant_" patterns. Stage 2, step 5 below exists to ensure that
> the "_consonant_,Halant" pattern preceding the base consonant or syllable base
> for below-base forms in other Indic scripts will also be tagged correctly.
#### Stage 2, step 2: Matra decomposition ####
Second, any multi-part dependent vowels (matras) must be decomposed
into their independent components. Kannada has five
multi-part dependent vowels, "Ii" (`U+0CC0`), "Ee" (`U+0CC7`), "Ai"
(`U+0CC8`), "O" (`U+0CCA`), and "Oo" (`U+0CCB`). Each
has a canonical decomposition, so this step is unambiguous.
> "Ii" (`U+0CC0`) decomposes to "`U+0CBF`,`U+0CD5`"
>
> "Ee" (`U+0CC7`) decomposes to "`U+0CC6`,`U+0CD5`"
>
> "Ai" (`U+0CC8`) decomposes to "`U+0CC6`,`U+0CD6`"
>
> "O" (`U+0CCA`) decomposes to "`U+0CC6`,`U+0CC2`"
>
> "Oo" (`U+0CCB`) decomposes to "`U+0CCA`,`U+0CD5`"
>
Because this decomposition is a character-level operation, the shaping
engine may choose to perform it earlier, such as during an initial
Unicode-normalization stage. However, all such decompositions must be
completed before the shaping engine begins step three, below.
> Note: The decomposition of "Oo" (`U+0CCB`) is atypical; Unicode
> specifies that the codepoint decomposes to "O" (`U+0CCA`) followed
> by `U+0CD5`; the "O" codepoint is then decomposed to
> "`U+0CC6`,`U+0CC2`". Shaping engines must take care not to miss this
> second decomposition.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #kannada-matra-decomposition}
Multi-part matra decomposition
:::
```{svg-color-toggle-button} kannada-matra-decomposition
```
#### Stage 2, step 3: Tag matras ####
Third, all dependent-vowel (matra) signs, including those that
resulted from the preceding decomposition step, must be tagged to be
moved to the correct position in the syllable.
Left-side matras should be tagged with `POS_PREBASE_MATRA`.
Above-base matras should be tagged with `POS_BEFORE_SUBJOINED`.
Right-side matras should be tagged according to two rules.
- Matras "Sign Vocalic R" (`U+0CC3`), "Sign Vocalic Rr"
(`U+0CC4`), "Sign Ee" (`U+0CC7`), "Sign Ai" (`U+0CC8`), "Sign
O" (`U+0CCA`), "Sign Oo" (`U+0CCB`), "Length Mark" (`U+0CD5`),
and "Ai Length Mark" (`U+0CD6`) should be tagged with
`POS_AFTER_SUBJOINED`.
- Matras "Sign Aa"(`U+0CBE`), "Sign Ii" (`U+0CC0`), "Sign U"
(`U+0CC1`), and "Sign Uu" (`U+0CC2`) use
`POS_BEFORE_SUBJOINED`.
> Note: the right-side matras "Sign Ee" (`U+0CC7`), "Sign Ai"
> (`U+0CC8`), "Sign O" (`U+0CCA`), and "Sign Oo" (`U+0CCB`) are
> multi-part matras and were decomposed into independent components
> during stage 2, step 2. They are listed here only to ensure that the
> two position-tagging rules used in Kannada are described completely.
Below-base matras should be tagged with `POS_BEFORE_SUBJOINED`.
For simplicity, shaping engines may choose to tag single-part matras
in an earlier text-processing step, using the information in the
_Mark-placement subclass_ column of the character tables. It is
critical at this step, however, that all decomposed matras are also
correctly tagged before proceeding to the next step.
#### Stage 2, step 4: Adjacent marks ####
Fourth, any subsequences of marks that include a "Nukta" and a
"Halant" or Vedic sign must be reordered so that the "Nukta" appears
first.
This means that the subsequence "Halant,Nukta" is reordered to
"Nukta,Halant" and that the subsequence "_Vedic_sign_,Nukta" is
reordered to "Nukta,_Vedic_sign".
For subsequences of affected marks that are longer than two, the
reordering operation must be repeated until the "Nukta" is the first
character in the subsequence. No other marks in the subsequence
should be reordered.
This order is canonical in Unicode and is required so that
"_consonant_,Nukta" substitution rules from GSUB will be correctly
matched later in the shaping process.
#### Stage 2, step 5: Pre-base consonants ####
Fifth, consonants that occur before the syllable base must be tagged
with `POS_PREBASE_CONSONANT`. Excluding initial "Ra,Halant" sequences
that will become "Reph"s:
- If the consonant has a below-base form, tag it as
`POS_BELOWBASE_CONSONANT`.
- Otherwise, tag it as `POS_PREBASE_CONSONANT`.
> Note: Shaping engines may choose any method to identify consonants that
> have below-base, post-base, or pre-base-reordering forms while
> executing the above algorithm. For example, one implementation may
> choose to maintain a static table of special-form consonants to
> compare against the text run. Another implementation might examine
> the active font to see if it includes a `blwf`, `pstf`, or `pref`
> lookup in the GSUB table that affects the consonants encountered in
> the syllable.
>
> However, checking for GSUB support ensures that the expected
> behavior is implemented in the active font, and is therefore the
> most reliable approach.
Kannada does not use any pre-base consonants; this step is listed here
because it is part of the general processing scheme for shaping Indic scripts.
> Note: Because Kannada employs the `BLWF_MODE_POST_ONLY` shaping
> characteristic, consonants with below-base special forms will occur
> only after the base consonant or syllable base.
>
> During the base-consonant search in stage 2, step 1, therefore, all of these below-base
> form sequences will be encountered and tagged correctly as
> "Halant,_consonant_" patterns. The tagging is this step ensures that
> the "_consonant_,Halant" pattern preceding the base consonant or syllable base
> for below-base forms in other Indic scripts will also be tagged correctly.
#### Stage 2, step 6: Reph ####
Sixth, initial "Ra,Halant" sequences that will become "Reph"s must be tagged with
`POS_RA_TO_BECOME_REPH`.
> Note: an initial "Ra,Halant" sequence will always become a "Reph"
> unless the "Ra" is the only consonant in the syllable.
#### Stage 2, step 7: Final consonants ####
Seventh, all final consonants must be tagged. Consonants that occur
after the syllable base _and_ after a dependent vowel (matra) sign
must be tagged with `POS_FINAL_CONSONANT`.
> Note: Final consonants occur only in Sinhala and should not be
> expected in `` text runs. This step is included here to
> maintain compatibility across Indic scripts.
#### Stage 2, step 8: Mark tagging ####
Eighth, all marks must be tagged.
> Note: In this step, joiner and non-joiner characters must also be
> tagged according to the same rules given for marks, even though
> these characters are not categorized as marks in Unicode.
Marks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,
`SYLLABLE_MODIFIER`, `GEMINATION_MARK`, and `SYMBOL` categories should
be tagged with `POS_SMVD`.
All "Nukta"s must be tagged with the same positioning tag as the
preceding consonant, independent vowel, placeholder, or dotted circle.
All remaining marks (not in the `POS_SMVD` category and not "Nukta"s)
must be tagged with the same positioning tag as the closest non-mark
character the mark has affinity with, so that they move together
during the sorting step.
There are two possible cases: those marks before the syllable base
and those marks after the syllable base. In addition, an exception is
made for "Halant" marks that follow a left-side (pre-base) matra.
1. Initially, all remaining marks should be tagged with the same
positioning tag as the closest preceding consonant.
2. For each consonant after the syllable base (such as post-base
consonants, below-base consonants, or final consonants), all
remaining marks located between that current consonant and any
previous consonant should be tagged with the same positioning tag as
the current (later) consonant.
In other words, all consonants preceding the syllable base "own" the
marks that follow them, while all consonants after the syllable base
"own" the marks that come before them. When a syllable does not have
any consonants after the syllable base, the syllable base should
"own" all the marks that follow it.
3. Finally, "Halant" marks that follow a left-side dependent vowel
(matra) should _not_ be tagged with the left-side matra's
positioning tag. Instead, the "Halant" should be tagged with the
positioning tag of the non-mark character preceding the left-side
matra. This prevents the "Halant" mark from being moved with the
left-side matra when the syllable is sorted.
#### Stage 2, step 9: Sort syllable ####
With these steps completed, the syllable can be sorted into the final
sort order as listed at the beginning of stage 2.
The glyphs in the syllable should be sorted in stable order,
so that glyphs of the same ordering category remain in the same
relative position with respect to each other.
#### Stage 2, step 10: Flag sequences for possible feature applications ####
With the initial reordering complete, those glyphs in the syllable that
may have GSUB or GPOS features applied in stages 3, 5, and 6 should be
flagged for each potential feature.
This flagging is preliminary; the set of potential features varies
between different scripts and which features are supported varies
between fonts. It is also possible that the application of
one feature on a glyph sequence will perform a substitution that makes
a later feature no longer applicable to the updated sequence.
Consequently, the flagging must be completed before shaping proceeds
to the stages during which features are applied.
Some shaping features, such as `locl`, can potentially apply to any
glyphs. Therefore it is not necessary to maintain a separate flag for
these features in the bitmask (or other data structure) used to track
the flags -- although shaping engines may do so if desired.
The sequences to flag are summarized in the list below; a full
description of each feature's function and interpretation is provided
in GSUB and GPOS application stages that follow.
- `nukt` should match "_Consonant_,Nukta" sequences
- `akhn` should match "Ka,Halant,Ssa" and "Ja,Halant,Nya"
- `rphf` should match initial "Ra,Halant" sequences but _not_ match
initial "Ra,Halant,ZWJ" sequences
- `pref` should match "_Consonant_,Ra" in post-base positions
- `blwf` should match "Halant,_Consonant_" in post-base positions
- `half` should match "_Consonant_,Halant" in pre-base position but
_not_ match "Ra,Halant" sequences flagged for `rphf` and
_not_ match "_Consonant_,Halant,ZWNJ,_Consonant_" sequences
- `pstf` should match "Halant,_Consonant_" in post-base position
- `cjct` should match "_Consonant_,Halant,_Consonant_" but _not_
match "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_"
### Stage 3: Applying the basic substitution features from GSUB ###
The basic-substitution stage applies mandatory substitution features
using the rules in the font's GSUB table. In preparation for this
stage, glyph sequences should be flagged for possible application
of GSUB features in stage 2, step 10.
The order in which these substitutions must be performed is fixed for
all Indic scripts:
locl
nukt
akhn
rphf
rkrf (not used in Kannada)
pref
blwf
abvf (not used in Kannada)
half
pstf
vatu (not used in Kannada)
cjct
cfar (not used in Kannada)
#### Stage 3, step 1: locl ####
The `locl` feature replaces default glyphs with any language-specific
variants, based on examining the language setting of the text run.
> Note: Strictly speaking, the use of localized-form substitutions is
> not part of the shaping process, but of the localization process,
> and could take place at an earlier point while handling the text
> run. However, shaping engines are expected to complete the
> application of the `locl` feature before applying the subsequent
> GSUB substitutions in the following steps.
#### Stage 3, step 2: nukt ####
The `nukt` feature replaces "_Consonant_,Nukta" sequences with a
precomposed nukta-variant of the consonant glyph.
- The context defined for a `nukt` feature is:
:::{table} `nukt` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:------------------------------|:--------------|
| _none_ | `_consonant_`(full),`_nukta_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #kannada-nukt}
Nukta composition
:::
```{svg-color-toggle-button} kannada-nukt
```
#### Stage 3, step 3: akhn ####
The `akhn` feature replaces two specific sequences with required ligatures.
- "Ka,Halant,Ssa" is substituted with the "KSsa" ligature.
- "Ja,Halant,Nya" is substituted with the "JNya" ligature.
These sequences can occur anywhere in a syllable. The "KSsa" and
"JNya" characters have orthographic status equivalent to full
consonants in some languages, and fonts may have `cjct` substitution
rules designed to match them in subsequences. Therefore, this
feature must be applied before all other many-to-one substitutions.
- The context defined for an `akhn` feature is:
:::{table} `akhn` feature context
| Backtrack | Matching sequence | Lookahead |
|:--------------|:----------------------------|:--------------|
| _none_ | `AKHAND_CONSONANT_SEQUENCE` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #kannada-akhn-kssa}
KSsa ligation
:::
```{svg-color-toggle-button} kannada-akhn-kssa
```
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #kannada-akhn-jnya}
JNya ligation
:::
```{svg-color-toggle-button} kannada-akhn-jnya
```
#### Stage 3, step 4: rphf ####
The `rphf` feature replaces initial "Ra,Halant" sequences with the
"Reph" glyph.
- An initial "Ra,Halant,ZWJ" sequence, however, must not be flagged for
the `rphf` substitution.
- The context defined for a `rphf` feature is:
:::{table} `rphf` feature context
| Backtrack | Matching sequence | Lookahead |
|:-----------------|:------------------------|:--------------|
| `SYLLABLE_START` | "Ra"(full),`_halant_` | _none_ |
:::
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #kannada-rphf}
Reph composition
:::
```{svg-color-toggle-button} kannada-rphf
```
#### Stage 3, step 5: rkrf ####
> This feature is not used in Kannada.
#### Stage 3, step 6: pref ####
The `pref` feature replaces pre-base-reordering consonant glyphs with
any special forms.
The substitution of the nominal glyph for its special form takes place
at this stage. However, the actual reordering move is performed later,
in stage 4, step 4.
> Note: Kannada does not usually incorporate pre-base-reordering
> consonant forms, but it is possible for a font to implement them in
> order to provide for desired typographic variation.
#### Stage 3, step 7: blwf ####
The `blwf` feature replaces below-base-consonant glyphs with any
special forms. All consonants in Kannada can take on a below-base consonant
form.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #kannada-blwf}
Below-base form composition
:::
```{svg-color-toggle-button} kannada-blwf
```
#### Stage 3, step 8: abvf ####
> This feature is not used in Kannada.
#### Stage 3, step 9: half ####
The `half` feature replaces "_Consonant_,Halant" sequences before the
base consonant or syllable base with "half forms" of the consonant
glyphs.
In the most common case, this substitution applies to
"_Consonant_,Halant" sequences that are followed by another
"_Consonant_".
In addition, a sequence matching "_Consonant_,Halant,ZWJ" must also be
flagged for potential `half` substitutions.
> Note: The presence of the "ZWJ" at the end of the sequence means
> that the sequence may match the regular-expression test in stage 1
> as the end of a syllable, even without being followed by a base
> consonant or syllable base.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
There are two exceptions to the default behavior, for which the
shaping engine must test:
- Initial "Ra,Halant" sequences, which should have been flagged for
the `rphf` feature earlier, must not be flagged for potential
`half` substitutions.
- A sequence matching "_Consonant_,Halant,ZWNJ,_Consonant_" must not be
flagged for potential `half` substitutions.
> Note: Kannada does not usually incorporate half forms, but it is
> possible for a font to implement them in order to provide for
> desired typographic variation.
#### Stage 3, step 10: pstf ####
The `pstf` feature replaces post-base-consonant glyphs with any special forms.
#### Stage 3, step 11: vatu ####
> This feature is not used in Kannada.
#### Stage 3, step 12: cjct ####
The `cjct` feature replaces sequences of adjacent consonants with
conjunct ligatures. These sequences must match "_Consonant_,Halant,_Consonant_".
A sequence matching "_Consonant_,Halant,ZWJ,_Consonant_" or
"_Consonant_,Halant,ZWNJ,_Consonant_" must not be flagged to form a conjunct.
> Note: The presence of the "ZWJ" in a
> "_Consonant_,Halant,ZWJ,_Consonant_" sequence should automatically
> inhibit any `cjct` feature rules from matching the sequence as valid
> input, and thus prevent the `cjct` substitution from being applied.
> Note: The presence of the "ZWNJ" in a
> "_Consonant_,Halant,ZWNJ,_Consonant_" sequence means that the
> "_Consonant_,Halant,ZWNJ" subsequence will match the
> regular-expression test in stage 1 as the end of a syllable.
>
> Because OpenType shaping features in `` are defined as
> applying only within an individual syllable, this means that the
> presence of the "ZWNJ" will automatically prevent the application of
> a `cjct` feature by triggering the identification of a syllable
> break between the two consonants.
>
> The fact that the regular-expression tests identify a syllable break
> after the "_Consonant_,Halant,ZWNJ" is a byproduct of OpenType
> shaping and Unicode encoding, however, and might not have any
> significance with regard to the definition of syllables used in the
> language or orthography of the text.
>
> Note, also: The presence of the "ZWJ" means that a
> "_Consonant_,Halant,ZWJ" sequence may match the regular-expression
> test in stage 1 as the end of a syllable, even without being
> followed by a base consonant or syllable base. By definition,
> however, a "_Consonant_,Halant,ZWJ" syllable identified in stage 1
> cannot also include a "_Consonant_" after the ZWJ.
The font's GSUB rules might be implemented so that `cjct`
substitutions apply to half-form consonants; therefore, this feature
must be applied after the `half` feature.
> Note: Kannada does not usually incorporate conjuncts, but it is
> possible for a font to implement the `cjct` feature in order to
> provide for desired typographic variation.
#### Stage 3, step 13: cfar ####
> This feature is not used in Kannada.
### Stage 4: Final reordering ###
The final reordering stage repositions marks, dependent-vowel (matra)
signs, and "Reph" glyphs to the appropriate location with respect to
the base consonant or syllable base. Because multiple substitutions
may have occurred during the application of the basic-shaping features
in the preceding stage, these repositioning moves could not be
performed during the initial reordering stage.
Like the initial reordering stage, the steps involved in this stage
occur on a per-syllable basis.
#### Stage 4, step 1: Base consonant ####
The final reordering stage, like the initial reordering stage, begins
with determining the syllable base of each syllable, following the
same algorithm used in stage 2, step 1.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base. In a standalone sequence or
other syllable that begins with a placeholder or a dotted circle, the
placeholder or dotted circle will always serve as the syllable base.
In a syllable that begins with a consonant, the shaping engine must
repeat the base-consonant search algorithm used in stage 2, step 1.
The codepoint of the underlying base consonant or syllable base will
not change between the search performed in stage 2, step 1, and the
search repeated here. However, the application of GSUB shaping
features in stage 3 means that several ligation and many-to-one
substitutions may have taken place. The final glyph produced by that
process may, therefore, be a conjunct or ligature form — in most
cases, such a glyph will not have an assigned Unicode codepoint.
#### Stage 4, step 2: Pre-base matras ####
Pre-base dependent vowels (matras) that were reordered during the
initial reordering stage must be moved to their final position. This
position is defined as:
- after the last standalone "Halant" glyph that comes after the
matra's starting position and also comes before the main
consonant.
- If a zero-width joiner follows this last standalone "Halant", the
final matra position is moved to after the joiner.
This means that the matra will move to the right of all explicit
"consonant,Halant" subsequences, but will stop to the left of the base
consonant or syllable base, all conjuncts or ligatures that contain
the base consonant or syllable base, and all half forms.
Kannada does not use pre-base matras, so this step will
involve no work when processing `` text. It is included here in
order to maintain compatibility with the other Indic scripts.
> Note: OpenType and Unicode both state that if the syllable includes
> a ZWJ immediately after the last "Halant", then the final matra
> position should be after the ZWJ.
>
> However, there are several test sequences indicating that
> Microsoft's Uniscribe shaping engine did not follow this rule (in,
> at least, Devanagari and Bengali text), and in these circumstances
> Uniscribe instead makes the final matra position before the final
> "Consonant,Halant,ZWJ".
>
> Subsequently, the HarfBuzz shaping engine has also followed the same
> pattern. If other shaping engine implementations prefer to maintain
> maximum compatibility with Uniscribe and HarfBuzz, then they should
> also follow suit.
> Note: The Microsoft script-development specifications for OpenType
> shaping also state that if a zero-width non-joiner follows the last
> standalone "Halant", the final matra position is moved to after the
> non-joiner. However, it is unnecessary to test for this condition,
> because a "Halant,ZWNJ" subsequence is, by definition, the end of a
> syllable. Consequently, a "Halant,ZWNJ" cannot be followed by a
> pre-base dependent vowel.
#### Stage 4, step 3: Reph ####
"Reph" must be moved from the beginning of the syllable to its final
position. Because Kannada incorporates the `REPH_POS_AFTER_POST`
shaping characteristic, this final position is defined to be
immediately after any post-base consonant forms.
The algorithm for finding the final "Reph" position is
- Move the "Reph" to the position immediately before
the first post-base matra, syllable modifier, or Vedic sign that
has a positioning tag after the script's "Reph" position in the
syllable sort order (as listed in [stage
2](#stage-2-initial-reordering)). This will be the final "Reph"
position.
> Note: Because Kannada incorporates the
> `REPH_POS_AFTER_POST` shaping characteristic, this means
> any positioning tag of `POS_FINAL_CONSONANT` or later,
> although a post-base matra, syllable modifier, or Vedic sign
> would not typically be tagged with `POS_FINAL_CONSONANT`.
- If no other location has been located in the previous step, move
the "Reph" to the end of the syllable.
Finally, if the final position of "Reph" occurs after a
"_matra_,Halant" subsequence, then "Reph" must be repositioned to the
left of "Halant", to allow for potential matching with `abvs` or
`psts` substitutions from GSUB.
:::{figure-md}
{.shaping-demo .inline-svg .greyscale-svg #kannada-reph-position}
Reph positioning
:::
```{svg-color-toggle-button} kannada-reph-position
```
#### Stage 4, step 4: Pre-base-reordering consonants ####
Any pre-base-reordering consonants must be moved to immediately before
the base consonant or syllable base.
Kannada does not use pre-base-reordering consonants, so this step will
involve no work when processing `` text. It is included here in order
to maintain compatibility with the other Indic scripts.
#### Stage 4, step 5: Initial matras ####
Any left-side dependent vowels (matras) that are at the start of a
word must be flagged for potential substitution by the `init` feature
of GSUB.
Kannada does not use the `init` feature, so this step will
involve no work when processing `` text. It is included here in
order to maintain compatibility with the other Indic scripts.
### Stage 5: Applying all remaining substitution features from