[
  {
    "path": "CODE_OF_CONDUCT.md",
    "content": "\n# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nWe as members, contributors, and leaders pledge to make participation in our\ncommunity a harassment-free experience for everyone, regardless of age, body\nsize, visible or invisible disability, ethnicity, sex characteristics, gender\nidentity and expression, level of experience, education, socio-economic status,\nnationality, personal appearance, race, caste, color, religion, or sexual\nidentity and orientation.\n\nWe pledge to act and interact in ways that contribute to an open, welcoming,\ndiverse, inclusive, and healthy community.\n\n## Our Standards\n\nExamples of behavior that contributes to a positive environment for our\ncommunity include:\n\n* Demonstrating empathy and kindness toward other people\n* Being respectful of differing opinions, viewpoints, and experiences\n* Giving and gracefully accepting constructive feedback\n* Accepting responsibility and apologizing to those affected by our mistakes,\n  and learning from the experience\n* Focusing on what is best not just for us as individuals, but for the overall\n  community\n\nExamples of unacceptable behavior include:\n\n* The use of sexualized language or imagery, and sexual attention or advances of\n  any kind\n* Trolling, insulting or derogatory comments, and personal or political attacks\n* Public or private harassment\n* Publishing others' private information, such as a physical or email address,\n  without their explicit permission\n* Other conduct which could reasonably be considered inappropriate in a\n  professional setting\n\n## Enforcement Responsibilities\n\nCommunity leaders are responsible for clarifying and enforcing our standards of\nacceptable behavior and will take appropriate and fair corrective action in\nresponse to any behavior that they deem inappropriate, threatening, offensive,\nor harmful.\n\nCommunity leaders have the right and responsibility to remove, edit, or reject\ncomments, commits, code, wiki 
edits, issues, and other contributions that are\nnot aligned to this Code of Conduct, and will communicate reasons for moderation\ndecisions when appropriate.\n\n## Scope\n\nThis Code of Conduct applies within all community spaces, and also applies when\nan individual is officially representing the community in public spaces.\nExamples of representing our community include using an official e-mail address,\nposting via an official social media account, or acting as an appointed\nrepresentative at an online or offline event.\n\n## Enforcement\n\nInstances of abusive, harassing, or otherwise unacceptable behavior may be\nreported to the community leaders responsible for enforcement at\n[INSERT CONTACT METHOD].\nAll complaints will be reviewed and investigated promptly and fairly.\n\nAll community leaders are obligated to respect the privacy and security of the\nreporter of any incident.\n\n## Enforcement Guidelines\n\nCommunity leaders will follow these Community Impact Guidelines in determining\nthe consequences for any action they deem in violation of this Code of Conduct:\n\n### 1. Correction\n\n**Community Impact**: Use of inappropriate language or other behavior deemed\nunprofessional or unwelcome in the community.\n\n**Consequence**: A private, written warning from community leaders, providing\nclarity around the nature of the violation and an explanation of why the\nbehavior was inappropriate. A public apology may be requested.\n\n### 2. Warning\n\n**Community Impact**: A violation through a single incident or series of\nactions.\n\n**Consequence**: A warning with consequences for continued behavior. No\ninteraction with the people involved, including unsolicited interaction with\nthose enforcing the Code of Conduct, for a specified period of time. This\nincludes avoiding interactions in community spaces as well as external channels\nlike social media. Violating these terms may lead to a temporary or permanent\nban.\n\n### 3. 
Temporary Ban\n\n**Community Impact**: A serious violation of community standards, including\nsustained inappropriate behavior.\n\n**Consequence**: A temporary ban from any sort of interaction or public\ncommunication with the community for a specified period of time. No public or\nprivate interaction with the people involved, including unsolicited interaction\nwith those enforcing the Code of Conduct, is allowed during this period.\nViolating these terms may lead to a permanent ban.\n\n### 4. Permanent Ban\n\n**Community Impact**: Demonstrating a pattern of violation of community\nstandards, including sustained inappropriate behavior, harassment of an\nindividual, or aggression toward or disparagement of classes of individuals.\n\n**Consequence**: A permanent ban from any sort of public interaction within the\ncommunity.\n\n## Attribution\n\nThis Code of Conduct is adapted from the [Contributor Covenant][homepage],\nversion 2.1, available at\n[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].\n\nCommunity Impact Guidelines were inspired by\n[Mozilla's code of conduct enforcement ladder][Mozilla CoC].\n\nFor answers to common questions about this code of conduct, see the FAQ at\n[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at\n[https://www.contributor-covenant.org/translations][translations].\n\n[homepage]: https://www.contributor-covenant.org\n[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html\n[Mozilla CoC]: https://github.com/mozilla/diversity\n[FAQ]: https://www.contributor-covenant.org/faq\n[translations]: https://www.contributor-covenant.org/translations\n\n"
  },
  {
    "path": "README.md",
    "content": "# Unicode Separated Values (USV) ™\n\nUnicode Separated Values (USV) ™ is a data format that uses Unicode characters for markup.\n\n[FAQ](doc/faq/) &bull;\n[RFC](doc/rfc/) &bull;\n[Code](doc/code/) &bull;\n[Comparisons](doc/comparisons/) &bull;\n[TODO](doc/todo/) &bull;\n[XKCD](https://xkcd.com/927/)\n\n\n## Introduction\n\nUnicode Separated Values (USV) enables new ways of working with data as plain text.\n\n* USV builds on ASCII Separated Values (ASV) plus adds capabilities for visible markup.\n\n* USV contrasts with Comma Separated Values (CSV) because USV is more specific and powerful.\n\n* USV is similar in spirit to Markdown (MD) because the purpose is easy freeform text editing.\n\n\n### USV markup\n\nUSV uses Unicode characters for data markup.\n\n* <tt>[U+001F](https://codepoints.net/U+001F)/[U+241F](https://codepoints.net/U+241F)</tt> Unit Separator.\n\n* <tt>[U+001E](https://codepoints.net/U+001E)/[U+241E](https://codepoints.net/U+241E)</tt> Record Separator.\n\n* <tt>[U+001D](https://codepoints.net/U+001D)/[U+241D](https://codepoints.net/U+241D)</tt> Group Separator.\n\n* <tt>[U+001C](https://codepoints.net/U+001C)/[U+241C](https://codepoints.net/U+241C)</tt> File Separator.\n\n* <tt>[U+001B](https://codepoints.net/U+001B)/[U+241B](https://codepoints.net/U+241B)</tt> Escape.\n\n* <tt>[U+0004](https://codepoints.net/U+0004)/[U+2404](https://codepoints.net/U+2404)</tt> End of Transmission.\n\n### USV examples\n\nUSV looks like this for a 1-dimensional data made of units, such as a log. Each unit ends with a Unit Separator character and an optional newline character.\n\n```usv\na␟\nb␟\nc␟\nd␟\n```\n\nUSV looks like this for 2-dimensional data made of units and records, such as a spreadsheet table. Each record ends with a Record Separator character and an optional newline character.\n\n```usv\na␟b␟␞\nc␟d␟␞\n```\n\nUSV looks like this for 3-dimensional data made of units and records and groups, such as a spreadsheet folio. 
Each group ends with a Group Separator character and an optional newline character.\n\n\n```usv\nSheet1␟␞\na␟b␟␞\nc␟d␟␞\n␝\nSheet2␟␞\ne␟f␟␞\ng␟h␟␞\n␝\n```\n\nUSV looks like this for 4-dimensional data made of units and records and groups and files, such as a collection of spreadsheet folios. Each file ends with a File Separator character and an optional newline character.\n\n\n```usv\nFolio1␟␞\nSheet1␟␞\na␟b␟␞\nc␟d␟␞\n␝\nSheet2␟␞\ne␟f␟␞\ng␟h␟␞\n␝␜\nFolio2␟␞\nSheet3␟␞\na␟b␟␞\nc␟d␟␞\n␝\nSheet4␟␞\ne␟f␟␞\ng␟h␟␞\n␝␜\n```\n\n\n### USV style\n\nUSV uses style options to display marks in various ways.\n\n* Style Symbols: use visible symbol characters such as `␟`\n\n* Style Controls: use invisible control characters such as `\\u001F`\n\n* Style Braces: use curly-braces with abbreviations such as: `{US}`\n\n\n### USV layout\n\nUSV uses layout options to format data in various ways.\n\n* Layout Default: format the data so it looks good on a typical terminal screen.\n\n* Layout Lines: format each mark with 0 or 1 or 2 surrounding newlines.\n\n* Layout by Units or Records or Groups or Files: format a chunk to display on one line.\n\n\n## Documentation\n\nCore:\n\n* [Markup with separators and modifiers](doc/markup/)\n\n* [Style with symbols, controls, braces](doc/style/)\n\n* [Layout with units, records, groups, files, spacers](doc/layout/)\n\nCommunity:\n\n* [Frequently Asked Questions (FAQ)](doc/faq/)\n\n* [Criticisms and replies](doc/criticisms/)\n\n* [TODO list](doc/todo/)\n\nSpecification:\n\n* [Request For Comments (RFC)](doc/rfc/)\n\n* [Augmented Backus–Naur Form (ABNF)](doc/abnf/)\n\nCode:\n\n* [Code examples and production crates](doc/code/)\n\n* [Command line argument parsing](doc/clap/)\n\nHow to:\n\n* [How to type Unicode characters](doc/how-to-type-unicode-characters/)\n\n* [How to use split and regex](doc/how-to-use-split-and-regex/)\n\nContext:\n\n* [Converters for ASV, CSV, JSON, XLSX](doc/converters/)\n\n* [Comparisons with ASV, CSV, TSV, RSV, 
JSON](doc/comparisons/)\n\n* [History of ASCII separated values (ASV)](history-of-ascii-separated-values/)\n\nEditor notes:\n\n* [vim notes](doc/editors/vi/)\n\n* [emacs notes](doc/editors/emacs/)\n\nExample files:\n\n* [hello-world.usv](examples/hello-world.usv) versus [hello-world.csv](examples/hello-world.csv)\n\n* [zen-koans.usv](examples/zen-koans.usv) versus [zen-koans.csv](examples/zen-koans.csv)\n\n* [blog-posts.usv](examples/blog-posts.usv) versus [blog-posts.csv](examples/blog-posts.csv)\n\n* [end-of-transmission.usv](examples/end-of-transmission.usv)\n\n\n## Hello World\n\nSuppose you want USV text with two units: \"hello\" and \"world\".\n\nThe USV text with USV symbol characters for unit separators:\n\n```usv\nhello␟world␟\n```\n\nThe USV text with USV control characters for unit separators:\n\n```usv\nhello\\u001Fworld\\u001F\n```\n\n\n## Comparisons to spreadsheets and databases\n\nUSV semantics are units, records, groups, files.\n\nSpreadsheet semantics are cells, lines, sheets, folios.\n\nDatabase semantics are fields, rows, tables, schemas.\n\n\n## Examples\n\nUSV with 2 units by 2 records by 2 groups by 2 files, and the style as sheets:\n\n```usv\na␟b␟␞\nc␟d␟␞\n␝\ne␟f␟␞\ng␟h␟␞\n␝\n␜\ni␟j␟␞\nk␟l␟␞\n␝\nm␟n␟␞\no␟p␟␞\n␝\n␜\n```\n\nParsing example with the USV Rust crate and its iterators:\n\n```rust\nuse usv::*;\nlet text = \"a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜\";\nlet files = text.files();\nfor file in files {\n    for group in file {\n        for record in group {\n            for unit in record {\n                // println! requires a format string literal as its first argument.\n                println!(\"{}\", unit);\n            }\n        }\n    }\n}\n```\n\n\n## Why use USV?\n\nUSV can handle data that contains commas, semicolons, quotes, tabs, newlines, and other special characters, all without escaping.\n\nUSV can format units/columns/cells and records/rows/lines and groups/tables/grids and files/schemas/folios.\n\nUSV aims to be an international standard, and has an official IETF RFCXML Internet Draft.\n\nUSV uses 
Unicode characters that are semantically meaningful.\n\nUSV works well with any typical modern editor, font, terminal, shell, search, and language.\n\nUSV uses visible letter-width characters, and these are easy to view, select, copy, paste, search.\n\n\n## USV is easy and friendly\n\nUSV is intended to be easy to use and friendly to try.\n\nUSV works with many kinds of data, and many kinds of editors. Any editor that can render the USV characters will work. We use vim, emacs, helix, Zed, VS Code, JetBrains IDEs, Nova, TextMate, Sublime, Notepad++, etc.\n\nUSV works with many kinds of tools. Any tool that can parse the USV characters will work. We use awk, sed, grep, rg, miller, etc.\n\nUSV works with many kinds of languages. Any language that can handle UTF-8 character encoding and rendering should work. We use C, C++, C#, Elixir, Erlang, Go, Java, JavaScript, Julia, Kotlin, Perl, PHP, Python, R, Ruby, Rust, Swift, TypeScript, etc.\n\n\n## Legal protection for standardization\n\nThe USV project aims to become a free open source IETF standard and IANA standard, much like the standards for CSV and TDF.\n\nUntil the standardization happens, the terms \"Unicode Separated Values\" and \"USV\" are both trademarks of this project. This repository is copyright 2022-2024. The trademarks and copyrights are by Joel Parker Henderson, me, an individual, not a company.\n\nWhen IETF and IANA approve the submissions as a standard, then the trademarks and copyright will go to a free libre open source software advocacy foundation. We welcome advice about how to do this well.\n\n\n## Conclusion\n\nUSV is helping us with data projects. We hope USV may help you too.\n\nWe welcome constructive feedback about USV, as well as git issues, pull requests, and standardization help.\n\n[FAQ](doc/faq/) &bull;\n[RFC](doc/rfc/) &bull;\n[Code](doc/code/) &bull;\n[Comparisons](doc/comparisons/) &bull;\n[TODO](doc/todo/) &bull;\n[XKCD](https://xkcd.com/927/)\n"
  },
  {
    "path": "bin/bash/usv-to-csv.bash",
    "content": "#!/usr/bin/env bash\nset -euf -o pipefail\n\n# USV example shell script that converts USV to CSV.\n#\n# Note this script is a simple demo, and does not attempt to escape CSV output,\n# such as create a double-quoted unit to protect an embedded comma or newline.\n\nescape=false\ncomma=''\n\nwhile IFS= read -N1 -r c; do\n    if [ \"$escape\" = true ]; then\n        escape=false\n        printf %s \"$c\"\n    else\n        case  \"$c\" in\n        \"\\u001B\" | \"␛\")\n            escape=true\n            ;;\n        \"\\u001F\" | \"␟\")\n            comma=','\n            ;;\n        \"\\u001E\" | \"␞\")\n            printf \"\\n\"\n            comma=''\n            ;;\n        \"\\u001D\" | \"␝\")\n            >&2 printf \"\\nerror: group separator\\n\"\n            ;;\n        \"\\u001C\" | \"␜\")\n            >&2 printf \"\\nerror: file separator\\n\"\n            ;;\n        \"\\u0004\" | \"␄\")\n            break\n            ;;\n        *)\n            printf %s%s \"$comma\" \"$c\"\n            comma=''\n            ;;\n        esac\n    fi\ndone\n"
  },
  {
    "path": "bin/bash/usv-to-debug.bash",
    "content": "#!/usr/bin/env bash\nset -euf -o pipefail\n\n# USV example shell script that demonstrates the use of USV characters.\n# This script reads STDIN one character at a time, and prints text.\n\nescape=false\n\nwhile IFS= read -N1 -r c; do\n    if [ \"$escape\" = true ]; then\n        escape=false\n        printf %s \"\\nescape character: \" \"$c\"\n    else\n        case  \"$c\" in\n        \"\\u001B\" | \"␛\")\n            printf \"\\nescape\\n\"\n            escape=true\n            ;;\n        \"\\u001F\" | \"␟\")\n            printf \"\\nunit separator\\n\"\n            ;;\n        \"\\u001E\" | \"␞\")\n            printf \"\\nrecord separator\\n\"\n            ;;\n        \"\\u001D\" | \"␝\")\n            printf \"\\ngroup separator\\n\"\n            ;;\n        \"\\u001C\" | \"␜\")\n            printf \"\\nfile separator\\n\"\n            ;;\n        \"\\u0004\" | \"␄\")\n            printf \"\\nend of transmission\\n\"\n            break\n            ;;\n        *)\n            printf %s \"$c\"\n            ;;\n        esac\n    fi\ndone\nprintf \"\\n\"\n"
  },
  {
    "path": "bin/bash/usv-to-display.bash",
    "content": "#!/usr/bin/env bash\nset -euf -o pipefail\n\n# USV example shell script that demonstrates the use of USV characters.\n# This script reads STDIN one character at a time, and prints text.\n\nescape=false\n\nwhile IFS= read -N1 -r c; do\n    if [ \"$escape\" = true ]; then\n        escape=false\n        printf %s \"$c\"\n    else\n        case  \"$c\" in\n        \"\\u001B\" | \"␛\")\n            escape=true\n            ;;\n        \"\\u001F\" | \"␟\")\n            printf \",\"\n            ;;\n        \"\\u001E\" | \"␞\")\n            printf \"\\n\"\n            ;;\n        \"\\u001D\" | \"␝\")\n            printf \"\\n-\\n\"\n            ;;\n        \"\\u001C\" | \"␜\")\n            printf \"\\n=\\n\"\n            ;;\n        \"\\u0004\" | \"␄\")\n            break\n            ;;\n        *)\n            printf %s \"$c\"\n            ;;\n        esac\n    fi\ndone\nprintf \"\\n\"\n"
  },
  {
    "path": "bin/python/usv-to-csv.py",
    "content": "#!/usr/bin/env python3\n\n# USV example shell script that converts USV to CSV.\n#\n# Note this script is a simple demo, and does not attempt to escape CSV output,\n# such as create a double-quoted unit to protect an embedded comma or newline.\n\nimport io\nimport sys\nsys.stdin.reconfigure(encoding='utf-8')\nsys.stdout.reconfigure(encoding='utf-8')\nescape = False\ncomma = ''\n\nwhile True:\n    c = sys.stdin.read(1)\n    if c == '':\n        break\n    if escape:\n        escape = False\n        print(f\"{c}\", end='', flush=True)\n    else:\n        match c:\n            case \"\\u001B\" | \"␛\":\n                escape = True\n            case \"\\u001F\" | \"␟\":\n                comma=','\n            case \"\\u001E\" | \"␞\":\n                print(f\"\\n\", end='', flush=True)\n                comma = ''\n            case \"\\u001D\" | \"␝\":\n                raise Exception(\"error: group separator\")\n            case \"\\u001C\" | \"␜\":\n                raise Exception(\"error: file separator\")\n            case \"\\u0004\" | \"␄\":\n                break\n            case (c):\n                print(f\"{comma}{c}\", end='', flush=True)\n                comma = ''\n"
  },
  {
    "path": "bin/python/usv-to-debug.py",
    "content": "#!/usr/bin/env python3\n\n# USV example script that demonstrates the use of USV characters.\n# This script reads STDIN one character at a time, and prints text.\n\nimport io\nimport sys\nsys.stdin.reconfigure(encoding='utf-8')\nsys.stdout.reconfigure(encoding='utf-8')\nescape = False\n\nwhile True:\n    c = sys.stdin.read(1)\n    if c == '':\n        break\n    if escape:\n        escape = False\n        print(f\"\\nescape character: {c}\\n\", end='', flush=True)\n    else:\n        match c:\n            case \"\\u001B\" | \"␛\":\n                print(\"\\nescape\\n\", end='', flush=True)\n                escape = True\n            case \"\\u001F\" | \"␟\":\n                print(f\"\\nunit separator\\n\", end='', flush=True)\n            case \"\\u001E\" | \"␞\":\n                print(f\"\\nrecord separator\\n\", end='', flush=True)\n            case \"\\u001D\" | \"␝\":\n                print(f\"\\ngroup separator\\n\", end='', flush=True)\n            case \"\\u001C\" | \"␜\":\n                print(f\"\\nfile separator\\n\", end='', flush=True)\n            case \"\\u0004\" | \"␄\":\n                print(f\"\\nend of transmission\\n\", end='', flush=True)\n                break\n            case (c):\n                print(f\"{c}\", end='', flush=True)\nprint()\n"
  },
  {
    "path": "bin/python/usv-to-display.py",
    "content": "#!/usr/bin/env python3\n\n# USV example script that demonstrates the use of USV characters.\n# This script reads STDIN one character at a time, and prints text.\n\nimport io\nimport sys\nsys.stdin.reconfigure(encoding='utf-8')\nsys.stdout.reconfigure(encoding='utf-8')\nescape = False\n\nwhile True:\n    c = sys.stdin.read(1)\n    if c == '':\n        break\n    if escape:\n        escape = False\n        print(f\"{c}\", end='', flush=True)\n    else:\n        match c:\n            case \"\\u001B\" | \"␛\":\n                escape = True\n            case \"\\u001F\" | \"␟\":\n                print(f\",\", end='', flush=True)\n            case \"\\u001E\" | \"␞\":\n                print(f\"\\n\", end='', flush=True)\n            case \"\\u001D\" | \"␝\":\n                print(f\"\\n-\\n\", end='', flush=True)\n            case \"\\u001C\" | \"␜\":\n                print(f\"\\n=\\n\", end='', flush=True)\n            case \"\\u0004\" | \"␄\":\n                break\n            case (c):\n                print(f\"{c}\", end='', flush=True)\nprint()\n"
  },
  {
    "path": "doc/abnf/index.md",
    "content": "# Augmented Backus–Naur Form (ABNF)\n\nAugmented Backus–Naur Form (ABNF) grammar-- work in progress.\n\n\n## Semantics\n\n* usv = *files\n\n* file = *groups\n\n* group = *records\n\n* record = *units\n\n* unit = *content-characters\n\n\n## Syntax\n\nSections:\n\n* usv = ( header-and-body / body ) '*' ; anything after the body is chaff\n\n* header-and-body = 1*unit-run / 1*record-run / 1*group-run / 1*file-run\n\n* body = *unit-run / *record-run / *group-run / *file-run\n\nRuns:\n\n* file-run = *( *spacer-character file *spacer-character FS )\n\n* group-run = *( *spacer-character group *spacer-character GS )\n\n* record-run = *( *spacer-character record *spacer-character RS )\n\n* unit-run = *( *spacer-character unit *spacer-character US )\n\nCharacter classes:\n\n* content-character = typical-character / escape-character\n\n* typical-character = '*' - special-character - escape-character\n\n* special-character = US / RS / GS / FS / ESC / EOT\n\n* escape-character = ESC ( special-character / typical-character )\n\n* spacer-character = Defined by Unicode Derived Core Property White_Space\n\n\n## Unicode characters\n\nMarkers:\n\n* US = U+001F Unit Separator / U+241F Symbol for Unit Separator\n\n* RS = U+001E Record Separator / U+241E Symbol for Record Separator\n\n* GS = U+001D Group Separator / U+241D Symbol for Group Separator\n\n* FS = U+001C File Separator / U+241C Symbol for File Separator\n\nModifiers:\n\n* ESC = U+001B Escape / U+241B Symbol for Escape\n\n* EOT = U+0004 End Of Transmission / U+2404 Symbol for End Of Transmission\n"
  },
  {
    "path": "doc/clap/index.md",
    "content": "# Command line argument parsing (CLAP)\n\nUSV tools should enable users to choose their preferred output style.\n\nUSV tools for terminals should enable options with these settings.\n\nOptions for USV separators and modifiers:\n\n* -u, --unit-separator : Set the unit separator string.\n\n* -r, --record-separator : Set the record separator string.\n\n* -g, --group-separator : Set the group separator string.\n\n* -f, --file-separator : Set the file separator string.\n\n* -e, --escape : Set the escape string.\n\n* -z, --end-of-transmission : Set the end-of-transmission string.\n\nOptions for USV marks:\n\n* --style-symbols : Show marks as symbols, such as \"␟\" for Unit Separator.\n\n* --style-controls : Show marks as controls, such as \"\\u001F\" for Unit Separator. This is most like ASCII Separated Values (ASV).\n\n* --style-braces : Show marks as braces, such as \"{US}\" for Unit Separator. This is to help plain text readers, and is not USV output.\n\nOptions for USV layout:\n\n* --layout-0: Show each item with no line around it. This is no layout, in other words one long line.\n\n* --layout-1: Show each item with one line around it. This is like single-space lines for long form text.\n\n* --layout-2: Show each item with two lines around it. This is like double-space lines for long form text.\n\n* --layout-units: Show each unit on one line. This can be helpful for line-oriented tools.\n\n* --layout-records: Show each record on one line. This is like a typical spreadsheet sheet export.\n\n* --layout-groups: Show each group on one line. This can be helpful for folio-oriented tools.\n\n* --layout-files: Show one file on one line. This can be helpful for archive-oriented tools.\n\nOptions for command line tools:\n\n* -h, --help : Print help\n\n* -V, --version : Print version\n\n* -v, --verbose... : Set the verbosity level: 0=none, 1=error, 2=warn, 3=info, 4=debug, 5=trace. 
Example: --verbose …\n\n* --test : Print test output for debugging, verifying, tracing, and the like. Example: --test\n"
  },
  {
    "path": "doc/code/index.md",
    "content": "# Code\n\nUSV has source code examples and also has production-ready library code.\n\n\n## Script examples with Bash and python\n\nThis repository includes USV code examples that demonstrate parsing.\n\nBash examples:\n\n* [usv-to-display.bash](../../bin/bash/usv-to-display.bash)\n\n* [usv-to-debug.bash](../../bin/bash/usv-to-debug.bash)\n\n* [usv-to-csv.bash](../../bin/bash/usv-to-csv.bash)\n\nPython examples:\n\n* [usv-to-display.py](../../bin/python/usv-to-display.py)\n\n* [usv-to-debug.py](../../bin/python/usv-to-debug.py)\n\n* [usv-to-csv.py](../../bin/python/usv-to-csv.py)\n\n\n## Production code with Rust\n\nRust has a crate in its own repo suitable for production use:\n\n* `cargo install usv`\n\n* [https://crates.io/crate/usv](https://crates.io/crate/usv)\n\n* [https://github.com/sixarm/usv-rust-crate](https://github.com/sixarm/usv-rust-crate)\n\nCommand line converters:\n\n* [asv-to-usv](https://crates.io/crate/asv-to-usv) and [usv-to-asv](https://crates.io/crate/usv-to-asv)\n\n* [csv-to-usv](https://crates.io/crate/csv-to-usv) and [usv-to-csv](https://crates.io/crate/usv-to-csv)\n\n* [json-to-usv](https://crates.io/crate/json-to-usv) and [usv-to-json](https://crates.io/crate/usv-to-json)\n\nThe Rust code includes tests and benchmarks. We welcome improvements.\n"
  },
  {
    "path": "doc/comparisons/asv/index.md",
    "content": "# ASCII Separated Values (ASV) a.k.a. DEL (Delimited ASCII)\n\nASCII Separated Values (ASV) uses these invisible zero-width control character separators:\n\n* ASCII character 28 as file separator\n\n* ASCII character 29 as group separator\n\n* ASCII character 30 as record separator\n\n* ASCII character 31 as unit separator.\n\nThese separators are identical in concept as in USV.\n\nASV also:\n\n* Forbids the ASCII control characters in content. In other words, there is no escaping.\n\n* In practice, has many incompatible implementations and users that expect the record separator to be a newline character, because the implementations and users prefer to display the data on a screen.\n\n\n## In our experience\n\nIn our experience, these ASCII characters tend to be hard to edit manually.\n\n* Because many editors treat the characters as invisible zero-width characters.\n\n* Because major character pickers show the visible character then insert the visible character, which is the corresponding USV Symbol.\n\nIn our experience, > 90% of the ASV files we discovered in our research used the character \"\\n\" as the record delimiter, or the combination of characters \"\\r\\n\", rather than the correct character 30.\n"
  },
  {
    "path": "doc/comparisons/csv/index.md",
    "content": "# Comma Separated Values (CSV)\n\nComma Separated Values (CSV) uses a comma character to separate values, and a newline character to separate records.\n\n* Has fields, which are equivalent to USV units.\n\n* Has records, which are equivalent to USV records.\n\n* Does not have a greater hierarchy, such as USV groups and fields, or spreadsheet sheets and folios, or database tables or schemas, etc.\n\n* Forbids the tab character in content.\n\n* Forbids the newline character in content.\n\n* Some implementations forbid the comma character in content; other implementations allow it if and only if the field is surrounded by quotation marks.\n\n* Some implementations forbid the newline character in content; other implementations allow it if and only if the field is surrounded by quotation marks.\n\n\n## Custom delimiter character\n\nSome CSV implementations and users enable a custom delimiter character.\n\n* For example, some users prefer to use the semicolon character. This is prevalent among some European regions, where the comma character is frequently in use within numbers as a digit separator, such as \"123,456,789\".\n\n* For example, some users prefer to use the vertical pipe character. 
This is prevalent among some developers of natural language content, when the developers are aware that content may contain commas or semicolons, yet is unlikely to contain a pipe character.\n\nThere is no standardization to know what the delimiter character is, ahead of time.\n\n* In practice, some CSV implementations use a heuristic to guess the delimiter character by inspecting the data.\n\n* In practice, some CSV users send along out-of-band instructions that explain the delimiter character.\n\n\n### Commas\n\nCSV implementations may fail when there is a comma that is supposed to be in content, or may require quoting:\n\nThis data is typically parsed as two CSV fields:\n\n```csv\nhello, world\n```\n\nTo get the data as one field, some CSV implementations support surrounding quotation marks:\n\n```csv\n\"hello, world\"\n```\n\nUSV honors commas, such as in this one unit that contains a comma:\n\n```usv\nhello, world\n```\n\n\n### Quotes\n\nCSV implementations may fail when there is a quotation mark that is supposed to be in content, or may require implementation-specific triple double-quotes.\n\nThis data is typically parsed as a CSV error:\n\n```csv\nI say \"hello, world\"\n```\n\nTo get the data as one field, some CSV implementations support surrounding quotation marks and escaping via double double-quotes:\n\n```csv\n\"I say \"\"hello, world\"\"\"\n```\n\nUSV honors quotes, such as in this one unit that contains quotation marks:\n\n```usv\nI say \"hello, world\"\n```\n\n\n### Newlines\n\nCSV implementations may fail when there is a newline that is supposed to be in content, or may require implementation-specific escaping.\n\nThis data is typically parsed as a CSV error:\n\n```csv\n\"first line\\nsecond line\"\n```\n\nTo get the data as one field, some CSV implementations support escaping by using backslash quotation marks like this:\n\n```csv\n\"\\\"first line\\nsecond line\\\"\"\n```\n\nUSV honors newlines, such as in this one unit that contains a 
newline:\n\n```usv\nfirst line\nsecond line\n```\n\n\n## In our experience\n\nIn our experience, the CSV format has various kinds of implementations, some incompatible, some with escaping and some without.\n\nIn our experience, some software programs use the file name extension \".csv\" to mean other ways of separating data with other characters, such as using tabs, or semi-colons, or spaces.\n\n\n### CSV files\n\nWe work with spreadsheets that are folios, that each contain sheets, that each contain grids.\n\nSuppose we work with 3 spreadsheets, and each spreadsheet contains 3 sheets. When we export the data, the export process needs multiple filesystem files, and needs some kind of ad hoc naming convention to show what's what:\n\n```txt\nmy-folio-1-sheet-1.csv\nmy-folio-1-sheet-2.csv\nmy-folio-1-sheet-3.csv\nmy-folio-2-sheet-1.csv\nmy-folio-2-sheet-2.csv\nmy-folio-2-sheet-3.csv\nmy-folio-3-sheet-1.csv\nmy-folio-3-sheet-2.csv\nmy-folio-3-sheet-3.csv\n```\n\nTo send all the data to another team, we have tried a variety of combiner tools, such as `tar` and `zip`.\n\nFor comparison, USV can contain all the data, because a USV file is equivalent to a spreadsheet folio, and USV group is equivalent to a spreadsheet sheet.\n\nThus our export uses one filesystem file:\n\n```txt\nmy.usv\n```\n"
  },
  {
    "path": "doc/comparisons/index.md",
    "content": "# Comparisons with ASV, CSV, TSV, RSV\n\nUnicode separated values (USV) is similar to these formats, plus offers more capabilities, editor-friendly markup, and standards-track syntax.\n\n* [ASCII separated values (ASV) a.k.a. DEL (Delimited ASCII)](asv)\n\n* [Comma Separated Values (CSV)](csv)\n\n* [Tab Separated Values (TSV) a.k.a. Tab Delimited Format (TDF)](tsv)\n\n* [Rows of String Values (RSV)](rsv)\n\n* [JavaScript Object Notation (JSON)](json)\n\n* [Microsoft Excel (XLSX)](xlsx)\n\n\n## Summary table\n\n| Capability                  | [USV](../../) | [ASV](asv) | [CSV](csv) | [TSV](tsv) | [RSV](rsv) | [JSON](json) | [XLSX](xlsx) |\n| ---                         | --- | --- | --- | --- | --- | --- | --- |\n| Units / cells / fields      | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 | ✅ |\n| Records / lines / rows      | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 | ✅ |\n| Groups / sheets / tables    | ✅ | ✅ | ⛔ | ⛔ | ⛔ | 🟡 | ✅ |\n| Files / folios / schemas    | ✅ | ✅ | ⛔ | ⛔ | ⛔ | 🟡 | ✅ |\n| Text, not binary            | ✅ | ✅ | ✅ | ✅ | ⛔ | ✅ | ⛔ |\n| All visible separators      | ✅ | ⛔ | ✅ | 🟡 | ⛔ | ✅ | ⛔ |\n| Easy for any text editor    | ✅ | ⛔ | ✅ | ✅ | ⛔ | ⛔ | ⛔ |\n| Separator line spacing      | ✅ | ⛔ | 🟡 | 🟡 | ⛔ | 🟡 | ⛔ |\n| IETF.org standards-track    | ✅ | ⛔ | 🟡 | 🟡 | ⛔ | ✅ | 🟡 |\n| Escaping                    | ✅ | ✅ | ✅ | ⛔ | ⛔ | 🟡 | 🟡 |\n| End of Transmission         | ✅ | ✅ | ⛔ | ⛔ | ⛔ | ⛔ | ⛔ |\n| Variable units per record   | ✅ | ⛔ | ⛔ | ⛔ | ✅ | ✅ | ⛔ |\n| Separators are terminators  | ✅ | ⛔ | ⛔ | ⛔ | ✅ | ⛔ | ⛔ |\n| Unicode UTF-8 default       | ✅ | ⛔ | ⛔ | ⛔ | ⛔ | ✅ | 🟡 |\n\n\n## Example for ASCII Separated Values (ASV)\n\n```asv\na\\u001FB\\u001F\\u001Ec\\u001FD\\u001F\\u001E\n```\n\nUSV with symbols:\n\n```usv\na␟b␟␞c␟d␟␞\n```\n\nUSV with controls is identical to ASV:\n\n```usv\na\\u001FB\\u001F\\u001Ec\\u001FD\\u001F\\u001E\n```\n\n\n## Example for Comma Separated Values (CSV)\n\nCSV example:\n\n```xlsx\na,b\nc,d\n```\n\nUSV with 
symbols:\n\n```usv\na␟b␟␞\nc␟d␟␞\n```\n\nUSV with controls:\n\n```usv\na\\u001Fb\\u001F\\u001E\nc\\u001Fd\\u001F\\u001E\n```\n\n\n## Example for Tab Separated Values (TSV)\n\nTSV example:\n\n```tsv\na       b\nc       d\n```\n\nUSV with symbols:\n\n```usv\na␟b␟␞\nc␟d␟␞\n```\n\nUSV with controls:\n\n```usv\na\\u001Fb\\u001F\\u001E\nc\\u001Fd\\u001F\\u001E\n```\n\n\n## Example for Rows of String Values (RSV)\n\nRSV example:\n\n```rsv\na\\b255b\\b255\\b253c\\b255d\\b255\\b253\n```\n\nUSV with symbols:\n\n```usv\na␟b␟␞\nc␟d␟␞\n```\n\nUSV with controls:\n\n```usv\na\\u001Fb\\u001F\\u001E\nc\\u001Fd\\u001F\\u001E\n```\n\n\n## Example for Microsoft Excel (XLSX)\n\nXLSX example:\n\n```xlsx\nSheet 1\na,b\nc,d\n\nSheet 2\ne,f\ng,h\n```\n\nUSV with symbols:\n\n```usv\nSheet 1␟␞\na␟b␟␞\nc␟d␟␞\n␝\nSheet 2␟␞\ne␟f␟␞\ng␟h␟␞\n␝\n```\n\nUSV with controls:\n\n```usv\nSheet 1\\u001F\\u001E\na\\u001Fb\\u001F\\u001E\nc\\u001Fd\\u001F\\u001E\n\\u001D\nSheet 2\\u001F\\u001E\ne\\u001Ff\\u001F\\u001E\ng\\u001Fh\\u001F\\u001E\n\\u001D\n```\n"
  },
  {
    "path": "doc/comparisons/json/index.md",
    "content": "# JavaScript Object Notation (JSON)\n\nJavaScript Object Notation (JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a commonly used data format with diverse uses in electronic data interchange, including that of web applications with servers. - Wikipedia ([Source](https://en.wikipedia.org/wiki/JSON))\n\nJSON is more flexible and more powerful than USV because JSON can have infinite nesting and also data types.\n\nExample JSON:\n\n```json\n[\n    [\"a\",\"b\"],\n    [\"d\",\"e\"]\n]\n```\n\nEquivalent USV:\n\n```usv\na␟b␟␞\nc␟d␟␞\n```\n\n## In our experience\n\nWe use JSON in many web applications, API endpoints, data transformations, and the like. It works very well for these purposes.\n\nIn our experience JSON is harder to edit by hand than USV, and harder to teach to novices who want to view and edit data. USV tends to be easier for these use cases because USV is simpler.\n"
  },
  {
    "path": "doc/comparisons/rsv/index.md",
    "content": "# Rows of String Values (RSV)\n\nhttps://github.com/Stenway/RSV-Specification\n\nThe RSV data file format is a simple binary alternative to CSV.\n\nAn RSV document represents an array of arrays of nullable string values, also called a jagged array.\n\nIt's main purpose is to store tabular data. But because it's a jagged array, it's not limited to that. So, rows can contain the same number of values, but don't have to.\n"
  },
  {
    "path": "doc/comparisons/tsv/index.md",
    "content": "# Tab Separated Values (TSV) a.k.a. Tab Delimited Format (TDF)\n\nTab Separated Values (TSV) uses a tab character to separate values, and a newline character to separate records.\n\n* Has fields, which are equivalent to USV units.\n\n* Has records, which are equivalent to USV records.\n\n* Does not have a greater hierarchy, such as USV groups and fields, or spreadsheet sheets and folios, or database tables or schemas, etc.\n\n* Forbids the tab character in content.\n\n* Forbids the newline character in content.\n\n## In our experience\n\nIn our experience, TSV can be difficult to edit with some editors, because the tab character can be invisible, or can take up a varying number of character widths such as 2 spaces or 4 spaces or 8 spaces or as many spaces as it takes to get to the next tab stop.\n\nIn our experience, some software programs use the file name extension \".tsv\", others use the extension \".tdf\", and others use the extension \".csv\" even though the file actually uses tabs and doesn't use commas.\n"
  },
  {
    "path": "doc/comparisons/xlsx/index.md",
    "content": "# Microsoft Excel (XLSX)\n\nMicrosoft Excel (XLSX) is among the world's most popular spreadsheet programs. It uses a data format called \"XLSX\" which in turn uses XML and binary compression.\n\n* Has spreadsheet sheets. Each sheet is called a \"Worksheet\", and can contain columns and rows.\n\n* Has spreadsheet folios. Each folio is called a \"Workbook\", and can contain one or more sheets.\n\n* Does not have a greater hierarchy, such as a collection of folios.\n\n* Can import/export data in many formats, such as CSV and TSV, but not yet USV.\n\n\n## Custom delimiters\n\nMicrosoft Excel enables the user to import/export using a wide range of custom delimiters, such as column separators and row separators.\n\n\n## In our experience\n\nIn our experience, the XLSX is great for primarily reading and editing by using Microsoft Excel or a compatible spreadsheet program. We had some success using decompression software then a XML editor, but this process and the XML tooling is harder for end users to do.\n\n\n### Workbooks and Worksheets\n\nWe work with spreadsheets that are folios a.k.a. workbooks, that each contain multiple sheets a.k.a. worksheets.\n\n```txt\nmy-workbook-1.xlsx\nmy-workbook-2.xlsx\nmy-workbook-3.xlsx\n```\n\nOr if we export data to CSV or similar format then we have even more files:\n\n```txt\nmy-workbook-1-worksheet-1.csv\nmy-workbook-1-worksheet-2.csv\nmy-workbook-1-worksheet-3.csv\nmy-workbook-2-worksheet-1.csv\nmy-workbook-2-worksheet-2.csv\nmy-workbook-2-worksheet-3.csv\nmy-workbook-3-worksheet-1.csv\nmy-workbook-3-worksheet-2.csv\nmy-workbook-3-worksheet-3.csv\n```\n\nTo send all the data to another team, we have tried a variety of combiner tools, such as `tar` and `zip`.\n\nFor comparison, USV can contain all the data, because a USV file is equivalent to a spreadsheet folio, and USV group is equivalent to a spreadsheet sheet.\n\nThus our export uses one filesystem file:\n\n```txt\nmy.usv\n```\n"
  },
  {
    "path": "doc/converters/index.md",
    "content": "# Converters for ASV, CSV, JSON, XSLX\n\nASCII Separated Values (ASV):\n\n* [asv-to-usv](https://crates.io/crate/asv-to-usv)\n* [usv-to-asv](https://crates.io/crate/usv-to-asv)\n\nComma Separated Values (CSV):\n\n* [csv-to-usv](https://crates.io/crate/csv-to-usv)\n* [usv-to-csv](https://crates.io/crate/usv-to-csv)\n\nJavaScript Object Notation (JSON):\n\n* [json-to-usv](https://crates.io/crate/json-to-usv)\n* [usv-to-json](https://crates.io/crate/usv-to-json)\n\nMicrosoft Excel XML (XLSX):\n\n* [xlsx-to-usv](https://crates.io/crate/xlsx-to-usv)\n* [usv-to-xlsx](https://crates.io/crate/usv-to-xlsx)\n\n"
  },
  {
    "path": "doc/criticisms/index.md",
    "content": "# Criticisms\n\nUSV is led by Joel Parker Henderson (joel@joelparkerhenderson.com).\n\nConstructive feedback is welcome. See also [frequently asked questions](../faq/).\n\n- [XKCD one universal standard](#xkcd-one-universal-standard)\n- [Fundamentally wrong](#fundamentally-wrong)\n- [You cannot edit it](#you-cannot-edit-it)\n- [No efficient storage](#no-efficient-storage)\n- [There is no wide library support](#there-is-no-wide-library-support)\n- [Not all data is representable](#not-all-data-is-representable)\n- [Editors work with invisible characters](#editors-work-with-invisible-characters)\n- [Doesn't work with Excel](#doesnt-work-with-excel)\n- [Not trivially splittable](#not-trivially-splittable)\n- [No need for an escape character](#no-need-for-an-escape-character)\n- [Can't encode as a single byte](#cant-encode-as-a-single-byte)\n- [Better off advocating for editor support](#better-off-advocating-for-editor-support)\n- [Cleverness for cleverness’s sake](#cleverness-for-clevernesss-sake)\n- [This is kinda stupid](#this-is-kinda-stupid)\n- [Nobody needs USV, and nobody should use it.](#nobody-needs-usv-and-nobody-should-use-it)\n- [Kill it with fire](#kill-it-with-fire)\n\n\n## XKCD one universal standard\n\n<blockquote>\n\"This is like the <a href=https://xkcd.com/927/>XKCD cartoon</a> about one universal standard.\"\n</blockquote>\n\nHa! That's funny. It turns out USV isn't trying to be one universal standard. CSV works really well for many use cases, and is well-supported everywhere, so by all means keep using CSV where you want and where it works well.\n\nUSV aims just for use cases that CSV doesn't seem to handle well, such as text that contains paragraphs of natural language, or displays better with newlines between units, or data that involves spreadsheet collections (e.g. folios comprising sheets comprising rows and columns) and database collections (e.g. 
schemas comprising tables comprising records and fields), or data that needs an End of Transmission.\n\n\n## Fundamentally wrong\n\n<blockquote>\n\"Using Unicode graphic characters as metasyntactic escape characters is fundamentally wrong. Those Unicode characters are for displaying the symbols for Unit Separator, Record Separator, etc. and not for actually being separators! ASCII already has those! Included in Unicode!\"\n</blockquote>\n\nUSV accepts ASCII control characters and the corresponding Unicode symbol characters as equivalent.\n\nIf you prefer to use exclusively ASCII control characters, then do that. I tried that approach first, and the ASCII control characters didn't work well in practice for visual display and for text editors. This is because the ASCII control characters are rendered as invisible for many of the displays and editors I tried, and also didn't copy correctly in many of the tools.\n\nAlso, there are command-line tools for converting from ASCII Separated Values (ASV) to Unicode Separated Values (USV) and vice versa: [asv-to-usv](https://crates.io/crates/asv-to-usv), [usv-to-asv](https://crates.io/crates/usv-to-asv).\n\n\n## You cannot edit it\n\n<blockquote>\n\"You cannot edit it in regular editor, like csv/tsv/jsonlines.\"\n</blockquote>\n\nI edit it in regular editors, every day. I use vi, emacs, VS Code, JetBrains IDEs, and more. I've also tried USV on many more editors, and so far it works 100% of the time. If you have a specific editor that doesn't seem to be working well with USV, can you please contact me?\n\n\n## No efficient storage\n\n<blockquote>\n\"There is no efficient storage, like binary formats.\"\n</blockquote>\n\nUSV is a text format, on purpose, because it's aiming to be human-readable and human-editable. USV storage goals are similar in magnitude to CSV.\n\nIf you want efficient storage like a binary format, one way is to use compression on the text data. USV, CSV, 
and similar text formats can work well with compression, especially if the content has compression-friendly aspects such as repetitions, sequences, patterns, and so forth.\n\n\n## There is no wide library support\n\n<blockquote>\n\"There is no wide library support.\"\n</blockquote>\n\nCurrently there's library support using the [USV Rust crate](https://crates.io/crates/usv) and there are command line [converters](../converters/).\n\nI welcome help creating library support from anyone who wants to help. The Rust crate is relatively easy to understand, and should be portable to similar family languages such as C, C++, C#, Java, JavaScript, Python, Ruby, etc.\n\n\n## Not all data is representable\n\n<blockquote>\n\"Not all data is representable.\"\n</blockquote>\n\nCan you provide an example of data that is not representable, or an explanation of what the data could be?\n\nUSV aims for all data to be representable. Specifically, USV aims to be able to represent all UTF-8 encoded text. USV provides an escape character, so you can escape any of the USV special characters as you wish.\n\n\n## Editors work with invisible characters\n\n<blockquote>\n\"We already have editors that can work with invisible characters. It’s not hard.\"\n</blockquote>\n\nIt turns out it is hard, in practice. I tried using invisible characters first, and found ongoing hard problems such as with copy/paste, search/replace, import/export, pattern matching, font display, and zero-width rendering.\n\nIn fact, the difficulties with invisible characters seem to be the reason that programmers mostly abandoned ASCII Separated Values (ASV) in favor of Comma Separated Values (CSV). USV aims to build on ASV to add capabilities for visible characters and better visible displays.\n\n\n## Doesn't work with Excel\n\n<blockquote>\n\"The adoptability challenge remains here to be Excel support.\"\n</blockquote>\n\nYes you're right. USV is brand-new on the standards track in 2024. 
Excel support is a long-term goal. Submitting to the IETF is to help programs like Excel to start supporting it.\n\nIf you have experience with writing Excel import/export capabilities, I welcome your help.\n\n\n## Not trivially splittable\n\n<blockquote>\n\"This format is not trivially splittable with a regular expression. I'd avoid most of the escaping they show, especially for line endings, and just make RS '\\n' the record separator, or possibly RS '\\n'*.\"\n</blockquote>\n\nSee the documentation about [how to use split and regex](../how-to-use-split-and-regex/).\n\nBroadly speaking, USV does not have a goal to be trivially splittable, because visual editing is much more important in practice, and because library parsing is more reliable.\n\nASCII Separated Values (ASV) should be trivially splittable by using a unit separator byte character and record separator byte character. But it turns out that many ASV files in the wild actually change from using the record separator byte character to a newline character. Before you split, you need to know these choices.\n\nComma Separated Values (CSV) should be trivially splittable by using a comma byte character and newline byte character. But it turns out that many CSV files in the wild actually change from using the comma byte character to a semicolon byte character or a pipe character. And some CSV files use escaping such as for quotes, or commas that are embedded in content, or escaped newlines that are embedded in content. Before you split, you need to know these choices. It's easy if you handle all data yourself; it's not easy if you're working with many worldwide organizations.\n\n\n## No need for an escape character\n\n<blockquote>\n\"I am not convinced about the need for an escape character.\"\n</blockquote>\n\nI tried USV without an escape character for a year to get real-world feedback. 
The feedback was that the escape was needed, because otherwise there could be data that couldn't be represented without an extra out-of-band reformatting/rewriting step.\n\n\n## Can't encode as a single byte\n\n<blockquote>\n\"ASCII Separated Values is better because it can encode each separator as a single byte.\"\n</blockquote>\n\nIf single byte encoding is very important, and you don't care about visible symbols, then yes ASCII Separated Values (ASV) is better for you. USV doesn't have a goal of single byte separators.\n\nYou can freely convert between ASV and USV and back again, if you like, by using these [converters](../converters/)\n\n\n## Better off advocating for editor support\n\n<blockquote>\n\"Just because a glyph is \"invisible\" doesn't mean it has to actually be invisible. The symbols for the separators are hard to read, like you're pointing out, which means someone would eventually replace them with some other graphical display, in which case you were just as well off with the actual separators themselves. They would have been better off advocating for editor support for actual separator display.\"\n</blockquote>\n\nYes you're correct. Programmers have been advocating for editor support for actual separator display since the 1980's ASCII Separated Values.\n\nSo far, the advocating has not succeeded. USV is a compromise for the present.\n\nIf the future offers editor support as you describe, then it will be great to use that instead of USV, and in fact USV will have been very useful for getting people using group separators, file separators, escapes, End of Transmissions, and other ASV features that are more extensive than CSV.\n\n\n## Cleverness for cleverness’s sake\n\n<blockquote>\n\"USV would have the disadvantage of using multi-byte characters as delimiters, so you have to decode the file in order to separate records. And you still can’t type the characters directly or be guaranteed to display them without font support. 
This honestly seems like cleverness for cleverness’s sake.\"\n</blockquote>\n\nYes you're correct directionally on your technical points. To decode one record, you have to read that one record until you reach its record separator; in other words, you can't just use split on one byte value as you can with CSV. That said, you can decode one unit at a time, or one record at a time, or one group at a time, or one file at a time; you don't have to decode the whole file.\n\nAs for cleverness, it's not especially clever. USV is essentially just ASCII Separated Values (ASV) plus visible symbols and some simple extras for escape, end of transmission, and spacers. The core ideas of ASV and USV are all from the 1970's.\n\n\n## This is kinda stupid\n\n<blockquote>\n\"I've long wanted a successor to CSV, but this is kinda stupid. People like CSVs because they look good, feel natural even in plaintext. This is the same reason that Markdown in successful. As for including commas in your data, it could just have been managed with a simple escape character like a \\, for when there's actually a comma in your data. That's it.\"\n</blockquote>\n\nIf you want a successor to CSV, do you have suggestions for what you want?\n\nWhat I learned is that when you escape with a backslash, then you have to also provide for escaping a backslash, such as two in a row, and then it causes issues for use cases such as Windows paths, regular expressions, backslash as used in a typical backslash-t for tab or backslash-n for newline, and so on. This is why I prefer to use the escape character as U+241B Symbol for Escape (ESC).\n\nMore broadly, CSV handles units and records (such as one spreadsheet sheet), but not groups (such as multiple spreadsheet sheets) or files (such as multiple spreadsheet folios). USV handles all of these.\n\n\n## Nobody needs USV, and nobody should use it.\n\n<blockquote>\n\"This is needlessly adding yet another standard to the mix. 
If you are in a position to choose what standard you use, just use:\n\n* Whatever is best for the data model and/or languages you use. JSON is a common modern choice, suitable for most things.\n\n* If you want something more tabular, closer to CSV (which is a valid choice for bulk data), use strict RFC 4180 compliant data.\n\n* If you want to specify your own binary super-compact data, use ASN.1. I am also given to understand that Protobuf is a popular modern choice.\n\nIf you aren’t in a position to choose your standards, just do whatever you need to do to parse whatever junk you are given, and emit as standards-compliant data as possible as output.\n\n* Again, RFC 4180 is a great way to standardize your own CSV output, as long as you stick to a subset which the receiving party can parse.\n\nNobody needs USV, and nobody should use it.\"\n</blockquote>\n\nThanks for your specific feedback and conclusion. :-)\n\nFor me, what's best for my data model is text (not binary), that handles many human languages using UTF-8 (not ASCII), that is easy to read and edit in many text editors (not a specialized row-column editor), and that works especially well with content that is paragraphs of natural language with commas, quotes, newlines, indentations, and the like. I also want capabilities for groups (such as spreadsheet sheets) and files (such as spreadsheet folios).\n\nFor comparison I've tried binary formats (e.g. ASN.1, Protobuf), row-column tabular formats (e.g. CSV, TDF), web data formats (e.g. JSON, YAML), web markup formats (e.g. HTML, XML). For me, USV is significantly easier to use, read, edit, and share.\n\n\n## Kill it with fire\n\n<blockquote>\n\"Y'know, I greatly dislike this. It's an actual emotional reaction. This should not be standardized. No one should use this. This is a bad idea and deserves to die in obscurity.\n\nI'll tell you why, it's pretty simple. The characters this... thing is stealing, exist to represent invisible control sequences. 
That is their use. The fact that they can be mentioned by direct input is inevitable, but not to be encouraged.\n\nI will be greatly disappointed if this is accepted as a standard. The fact that a USV file looks like a rendered ASV file is a show stopping bug, an anti-feature, an insult to life itself. Kill it with fire.\"\n</blockquote>\n\nThat's great feedback! The previous time that I heard that kind of feedback, it was about emoji being terrible and how no one should use them. Luckily representations evolve. 😀\n"
  },
  {
    "path": "doc/editors/emacs/index.md",
    "content": "# Emacs notes\n\nC-x = shows a summary about the character at point.\n\nC-u C-x = shows details about the character at point.\n\nThe rest of this page is from the emacs manual:\n\nhttps://www.gnu.org/software/emacs/manual/html_node/emacs/International-Chars.html\n\n\n## 23.1 Introduction to International Character Sets\n\nThe users of international character sets and scripts have established many more-or-less standard coding systems for storing files. These coding systems are typically multibyte, meaning that sequences of two or more bytes are used to represent individual non-ASCII characters.\n\nInternally, Emacs uses its own multibyte character encoding, which is a superset of the Unicode standard. This internal encoding allows characters from almost every known script to be intermixed in a single buffer or string. Emacs translates between the multibyte character encoding and various other coding systems when reading and writing files, and when exchanging data with subprocesses.\n\nThe command C-h h (view-hello-file) displays the file etc/HELLO, which illustrates various scripts by showing how to say “hello” in many languages. If some characters can’t be displayed on your terminal, they appear as ‘?’ or as hollow boxes (see Undisplayable Characters).\n\nKeyboards, even in the countries where these character sets are used, generally don’t have keys for all the characters in them. You can insert characters that your keyboard does not support, using C-x 8 RET (insert-char). See Inserting Text. Shorthands are available for some common characters; for example, you can insert a left single quotation mark ‘ by typing C-x 8 [, or in Electric Quote mode, usually by simply typing `. See Quotation Marks. Emacs also supports various input methods, typically one for each script or language, which make it easier to type characters in the script. 
See Input Methods.\n\nThe prefix key C-x RET is used for commands that pertain to multibyte characters, coding systems, and input methods.\n\nThe command C-x = (what-cursor-position) shows information about the character at point. In addition to the character position, which was described in Cursor Position Information, this command displays how the character is encoded. For instance, it displays the following line in the echo area for the character ‘c’:\n\n```\nChar: c (99, #o143, #x63) point=28062 of 36168 (78%) column=53\n```\n\nThe four values after ‘Char:’ describe the character that follows point, first by showing it and then by giving its character code in decimal, octal and hex. For a non-ASCII multibyte character, these are followed by ‘file’ and the character’s representation, in hex, in the buffer’s coding system, if that coding system encodes the character safely and with a single byte (see Coding Systems). If the character’s encoding is longer than one byte, Emacs shows ‘file ...’.\n\nOn rare occasions, Emacs encounters raw bytes: single bytes whose values are in the range 128 (0200 octal) through 255 (0377 octal), which Emacs cannot interpret as part of a known encoding of some non-ASCII character. Such raw bytes are treated as if they belonged to a special character set eight-bit; Emacs displays them as escaped octal codes (this can be customized; see Customization of Display). In this case, C-x = shows ‘raw-byte’ instead of ‘file’. 
In addition, C-x = shows the character codes of raw bytes as if they were in the range #x3FFF80..#x3FFFFF, which is where Emacs maps them to distinguish them from Unicode characters in the range #x0080..#x00FF.\n\nWith a prefix argument (C-u C-x =), this command additionally calls the command describe-char, which displays a detailed description of the character:\n\n* The character set name, and the codes that identify the character within that character set; ASCII characters are identified as belonging to the ascii character set.\n* The character’s script, syntax and categories.\n* What keys to type to input the character in the current input method (if it supports the character).\n* The character’s encodings, both internally in the buffer, and externally if you were to save the buffer to a file.\n* If you are running Emacs on a graphical display, the font name and glyph code for the character. If you are running Emacs on a text terminal, the code(s) sent to the terminal.\n* If the character was composed on display with any following characters to form one or more grapheme clusters, the composition information: the font glyphs if the frame is on a graphical display, and the characters that were composed.\n* The character’s text properties (see Text Properties in the Emacs Lisp Reference Manual), including any non-default faces used to display the character, and any overlays containing it (see Overlays in the same manual).\n\nHere’s an example, with some lines folded to fit into this manual:\n\n```\n             position: 1 of 1 (0%), column: 0\n            character: ê (displayed as ê) (codepoint 234, #o352, #xea)\n    preferred charset: unicode (Unicode (ISO10646))\ncode point in charset: 0xEA\n               script: latin\n               syntax: w        which means: word\n             category: .:Base, L:Left-to-right (strong), c:Chinese,\n                       j:Japanese, l:Latin, v:Viet\n             to input: type \"C-x 8 RET ea\" or\n                       
\"C-x 8 RET LATIN SMALL LETTER E WITH CIRCUMFLEX\"\n          buffer code: #xC3 #xAA\n            file code: #xC3 #xAA (encoded by coding system utf-8-unix)\n              display: by this font (glyph code)\n    xft:-PfEd-DejaVu Sans Mono-normal-normal-\n        normal-*-15-*-*-*-m-0-iso10646-1 (#xAC)\n\nCharacter code properties: customize what to show\n  name: LATIN SMALL LETTER E WITH CIRCUMFLEX\n  old-name: LATIN SMALL LETTER E CIRCUMFLEX\n  general-category: Ll (Letter, Lowercase)\n  decomposition: (101 770) ('e' '^')\n```\n"
  },
  {
    "path": "doc/editors/vi/index.md",
    "content": "# vim notes\n\nvim comes with most modern Linux and BSD distributions.\n\n## Digraph characters\n\nTo add digraphs for each USV character, add\n\n```\ndigraph us 9247 rs 9246 gs 9245 fs 9244 es 9243 eo 9220\n```\n\nto your `~/.vimrc`\n\nThen when you want to type, for instance, the record separator character, in insert mode, type `<ctrl-k>rs`\n\n## List hidden characters\n\nTo list hidden characters:\n\n```\n:set list\n```\n\nLater:\n\n```\n:set nolist\n```\n"
  },
  {
    "path": "doc/end-of-transmission/index.md",
    "content": "# End of Transmission (EOT)\n\nThe End of Transmission (EOT) mark tells any reader that it can stop reading.\n\n* EOT tells the data reader that data is done.\n\n* EOT has no effect on the output content.\n\nExample of a unit \"abc\" then EOT then extra data \"xxx\" that is ignored.\n\n```usv\nabc␞␄xxx\n```\n\nEOT can be useful for a variety of use cases:\n\n* Streaming data, such as to signal that the reader can close a connection.\n\n* Appending data, such as USV content, then extra information such as comments.\n    \n* Attaching data, such as a USV spreadsheet that has MIME attachments.\n"
  },
  {
    "path": "doc/escape/index.md",
    "content": "# Escape (ESC)\n\nThe Escape (ESC) symbol makes the subsequent character treated as a content character.\n\nExample: USV with a unit that contains an Escape + End of Transmission, which is treated as content.\n\n```usv\na␛␄b␟\n```\n\nIn the rare case that you need a separator then content that starts with a carriage return or newline:\n\n* You escape the carriage return or newline.\n\n* This is because separators may be optionally be followed by any number of carriage returns and/or newlines, which is to help with visual display.\n"
  },
  {
    "path": "doc/faq/index.md",
    "content": "# Frequently Asked Questions\n\nUSV is led by Joel Parker Henderson (joel@joelparkerhenderson.com).\n\nConstructive feedback is welcome. See also [criticisms](../criticisms/).\n\n- [Is USV easy?](#is-usv-easy)\n- [IS USV aiming to be a standard?](#is-usv-aiming-to-be-a-standard)\n- [Why choose USV over CSV or TSV?](#why-choose-usv-over-csv-or-tsv)\n- [Why choose USV over ASV?](#why-choose-usv-over-asv)\n- [Why choose USV over ASV for machine-only data?](#why-choose-usv-over-asv-for-machine-only-data)\n- [Why use control picture characters rather than the control characters themselves?](#why-use-control-picture-characters-rather-than-the-control-characters-themselves)\n- [Why are the symbols so small on my screen?](#why-are-the-symbols-so-small-on-my-screen)\n\n\n\n## Is USV easy?\n\nYes. If you know about comma separated values (CSV), or tab separated values\n(TSV), or ASCII separated values (ASV), or JavaScript Object Notation (JSON),\nthen you already know much about USV.\n\n\n## IS USV aiming to be a standard?\n\nYes, USV is aiming to become an IETF standard similar to <a\nhref=\"https://www.ietf.org/rfc/rfc4180.txt\">IETF RCF 4180 for CSV</a>.\nWe have submitted the IETF Internet Draft and it is a work in progress.\n\nYes, USV is aiming to become an IANA standard similar to <a\nhref=\"https://www.iana.org/assignments/media-types/text/tab-separated-values\">IANA\nTSV</a>. 
We have submitted the request for the \"text/usv\" media type.\n\n\n## Why choose USV over CSV or TSV?\n\nYou want your data content to be able to contain commas, or tabs, or newlines,\nwithout special escaping or different quoting rules than other data such as\nnumbers.\n\nYou want your data content to be able to use data groups, or database tables, or\nspreadsheet grids.\n\nYou want your data format to be able to use data files, or database schemas, or\nspreadsheet folios.\n\nYou want your data semantics to be able to use hierarchy levels, nesting, or\noutlines.\n\nYou want a consistent compatible standard format, which CSV can't always\nprovide.\n\nYou want a consistent compatible standardized file name extension, which\nCSV/TSV/TDF can't always provide.\n\nYou want to use End of Transmission (EOT), so you can guarantee a reader\nhas read data until the end.\n\n\n## Why choose USV over ASV?\n\nYou want your data content to be friendlier for human reading and human editing.\n\nUSV provides typically-visible letter-width characters (such as Unicode 241F),\nwhereas ASV provides typically-invisible zero-width characters (such as ASCII\n31).\n\nIt's true that some editors do render ASV characters using other visual\nrepresentations, such as using the corresponding USV visible characters;\nhowever in practice we haven't found much support for this approach.\n\n\n## Why choose USV over ASV for machine-only data?\n\nFor machine-only data, such as data that will never be used for human reading or\nhuman editing, USV and ASV are similar because both can handle units,\nrecords, groups, and files.\n\n\n## Why use control picture characters rather than the control characters themselves?\n\nWe tried using the control characters, and also tried configuring various editors to show the control characters by rendering the control picture characters.\n\nFirst, we encountered many difficulties with editor configurations, attempting to make each editor treat the invisible 
zero-width characters by rendering with the visible letter-width characters.\n\nSecond, we encountered problems with copy/paste functionality, where it often didn't work because the editor implementations and terminal implementations copied visible letter-width characters, not the underlying invisible zero-width characters.\n\nThird, users were unable to distinguish between the rendered control picture characters (e.g. the editor saw ASCII 31 and rendered Unicode Unit Separator) versus the control picture characters being in the data content (e.g. someone actually typed Unicode Unit Separator into the data content).\n\n\n## Why are the symbols so small on my screen?\n\nUSV renders on your system by using your local font. If your local font has small Unicode symbols for specific characters, then you'll see these. On many systems we've tried, the characters render with the letters \"US\", \"RS\", \"GS\", \"FS\", etc. We are open to suggestions for fonts that work especially well with USV, and we are open to funding the creation of specialized fonts for these specific characters.\n\n"
  },
  {
    "path": "doc/history-of-ascii-separated-values/index.md",
    "content": "# History of ASCII separated values (ASV)\n\n➤ <https://www.lammertbies.nl/comm/info/ascii-characters>\n\n\n## ASCII 28 = FS = File separator\n\nThe file separator FS is an interesting control code, as it gives us insight in the way that computer technology was organized in the sixties. We are now used to random access media like RAM and magnetic disks, but when the ASCII standard was defined, most data was serial. I am not only talking about serial communications, but also about serial storage like punch cards, paper tape and magnetic tapes. In such a situation it is clearly efficient to have a single control code to signal the separation of two files. The FS was defined for this purpose.\n\n\n## ASCII 29 = GS = Group separator\n\nData storage was one of the main reasons for some control codes to get in the ASCII definition. Databases are most of the time setup with tables, containing records. All records in one table have the same type, but records of different tables can be different. The group separator GS is defined to separate tables in a serial data storage system. Note that the word table wasn't used at that moment and the ASCII people called it a group.\n\n\n## ASCII 30 = RS = Record separator\n\nWithin a group (or table) the records are separated with RS or record separator.\n\n\n## ASCII 31 = US = Unit separator\n\nThe smallest data item to be stored in a database is called a unit in the ASCII definition. The unit separator separates these fields in a serial data storage environment. The US control code allows all fields to have a variable length. If data storage space is limited—as in the sixties—this is a good way to preserve valuable space. 
On the other hand, serial storage is far less efficient than the table-driven RAM and disk implementations of modern times.\n\n\n## ASCII 14 = Shift Out & ASCII 15 = Shift In\n\nThe original purpose of these characters was to provide a way to shift a colored ribbon, split longitudinally usually with red and black, up and down to the other color in an electro-mechanical typewriter or teleprinter, such as the Teletype Model 38, to automate the same function of manual typewriters. Black was the conventional ambient default color and so was shifted \"in\" or \"out\" with the other color on the ribbon.\n\n➤ <https://wikipedia.org/wiki/Shift_Out_and_Shift_In_characters>\n\n"
  },
  {
    "path": "doc/how-to-type-unicode-characters/index.md",
    "content": "# How to type Unicode characters\n\nOn many systems, you can type Unicode characters this way:\n\n1. Press and hold the Alt key a.k.a. Option key.\n\n2. Type + and the Unicode character hexadecimal code, such as +241f for Unit Separator.\n\n3. Release the Alt key a.k.a. Option key.\n\nOn Apple macOS, you may need to do a one-time setup:\n\n1. Go to System Preferences -> Keyboard -> Input Sources.\n\n2. Click on + button, select \"Others\" -> \"Unicode Hex Input\" and press \"Add\". (End of one-time)\n\n3. Switch to the Unicode Hex Input in the menu bar.\n\n4. Hold down the Option key and type the hexadecimal unicode value, then release the Option key.\n"
  },
  {
    "path": "doc/how-to-use-split-and-regex/index.md",
    "content": "# How to use split and regex\n\nTo use split and regex, rather than a specific USV parsing tool or library, then you have choices.\n\nThe pseudocode here is the current best approximation of USV using split and regex.\n\nIf you are certain that your data never uses any escape characters:\n\n```regex\ntransmission = split input on \"[\\u0004\\u2404]\" first\n\nfiles = split transmission on \"[\\u001C\\u241C]\"\n\ngroups = split file on \"[\\u001D\\u241D]\n\nrecords = split group on \"[\\u001E\\u241E]\"\n\nunits = split unit on \"[\\u001F\\u241F]\"\n\nunit = trim(unit)\n```\n\nIf your data may use any escape characters, and also if your split and regex offer capabilities for negative lookbehind:\n\n```regex\ntransmission = split input on \"[\\u0004\\u2404]\" first\n\nfiles = split transmission on \"(?<![\\u001B\\u241B])\\u001C\\u241C\"\n\ngroups = split file on \"(?<![\\u001B\\u241B])[\\u001D\\u241D]␝\"\n\nrecords = split group on \"(?<![\\u001B\\u241B])[\\u001E\\u241E]\"\n\nunits = split unit on \"(?<![\\u001B\\u241B])[\\u001F\\u241F]\"\n\nunit = trim(unit)\n```\n"
  },
  {
    "path": "doc/layout/index.md",
    "content": "# Layout\n\nUSV styles can customize various kinds of output so it looks like you prefer.\n\n* Layout 0: Show each item with no line around it. This is no layout, in other words one long line.\n\n* Layout 1: Show each item with one line around it. This is like single-space lines for long form text.\n\n* Layout 2: Show each item with two lines around it. This is like double-space lines for long form text.\n\n* Layout units: Show each unit on one line. This can be helpful for line-oriented tools.\n\n* Layout records: Show each record on one line. This is like a typical spreadsheet sheet export.\n\n* Layout groups: Show each group on one line. This can be helpful for folio-oriented tools.\n\n* Layout files: Show one file on one line. This can be helpful for archive-oriented tools.\n"
  },
  {
    "path": "doc/markup/index.md",
    "content": "# USV markup\n\nUSV uses Unicode characters for data markup.\n\n* <tt>[U+001F](https://codepoints.net/U+001F)/[U+241F](https://codepoints.net/U+241F)</tt> Unit Separator. For a spreadsheet cell, database field, etc.\n\n* <tt>[U+001E](https://codepoints.net/U+001E)/[U+241E](https://codepoints.net/U+241E)</tt> Record Separator. For a spreadsheet line, database row, etc.\n\n* <tt>[U+001D](https://codepoints.net/U+001D)/[U+241D](https://codepoints.net/U+241D)</tt> Group Separator. For a spreadsheet sheet, database table, etc.\n\n* <tt>[U+001C](https://codepoints.net/U+001C)/[U+241C](https://codepoints.net/U+241C)</tt> File Separator. For a spreadsheet folio, database schema, etc.\n\n* <tt>[U+001B](https://codepoints.net/U+001B)/[U+241B](https://codepoints.net/U+241B)</tt> Escape. For protecting markup characters in content.\n\n* <tt>[U+0004](https://codepoints.net/U+0004)/[U+2404](https://codepoints.net/U+2404)</tt> End of Transmission. For concluding parsing.\n\n\n## Character details\n\n* [Escape (ESC)](../escape/)\n\n* [End of Transmission (EOT)](../end-of-transmission/)\n\n* [Spacers](../spacers/)\n"
  },
  {
    "path": "doc/purpose/index.md",
    "content": "# USV purpose\n\nThe USV purpose is to help people edit data, share data, and manage data.\n\n* Edit data by using plain text and any typical text editor.\n\n* Share data by using an international standard for markup.\n\n* Manage data by in ways that work well with spreadsheets and databases.\n\n## Edit data by using plain text and any typical text editor\n\nUSV is a plain text format that aims to be easy to read and edit.\n\n* Because USV is plain text, you can use any text editor to open a USV file, edit it, save it, print it, and so on.\n\n* Because USV enables line spacing wherever you want it, you can edit anything from simple unit-oriented data (such as for logs and metrics) all the way up to complex file-oriented data (such as for blog posts and content management).\n\n* Because USV can display marks using your choice of visible symbol characters or invisible control characters, you can edit using your preferred editors and preferred settings for displaying Unicode symbols and Unicode controls.\n\n## Share data by using an international standard for markup\n\nUSV has a formal specification on-track to become an international standard.\n\n* Because USV is for worldwide sharing, there is a specification that sets the same marks (such as delimiters) for everyone.\n\n* Because USV provides a formal IETF Internet-Draft, anyone may implement USV in any language, and know that it will work.\n\n* Because USV has a reference implementation that is free libre open source software, everyone can share the tooling as well.\n\n## Manage data by in ways that work well with spreadsheets and databases\n\nUSV can manage data collections such as spreadsheet sheets and folios, and database tables and schemas.\n\n* Because USV has units, records, groups, files, and end of transmission, it has more dimensions than CSV, and can even allow for attachments.\n\n* Because USV has more dimensions, it can replace ad hoc binders, such as ZIP files comprising CSV sheets, 
or XML files comprising Excel workbooks.\n\n* Because USV has jagged array capabilities, it can help save and restore system disk paths, spreadsheet folio tabs, database table names, and more.\n"
  },
  {
    "path": "doc/rfc/draft-unicode-separated-values-01.txt",
    "content": "\n\n\n\nInternet Engineering Task Force                        J. Henderson, Ed.\nInternet-Draft                                             16 March 2024\nIntended status: Experimental\nExpires: 17 September 2024\n\n\n                     Unicode Separated Values (USV)\n                   draft-unicode-separated-values-01\n\nAbstract\n\n   Unicode Separated Values (USV) is a data format that uses Unicode\n   characters to mark parts.  USV builds on ASCII separated values\n   (ASV), and provides pragmatic ways to edit data in text editors by\n   using visual symbols and layouts.\n\nStatus of This Memo\n\n   This Internet-Draft is submitted in full conformance with the\n   provisions of BCP 78 and BCP 79.\n\n   Internet-Drafts are working documents of the Internet Engineering\n   Task Force (IETF).  Note that other groups may also distribute\n   working documents as Internet-Drafts.  The list of current Internet-\n   Drafts is at https://datatracker.ietf.org/drafts/current/.\n\n   Internet-Drafts are draft documents valid for a maximum of six months\n   and may be updated, replaced, or obsoleted by other documents at any\n   time.  It is inappropriate to use Internet-Drafts as reference\n   material or to cite them other than as \"work in progress.\"\n\n   This Internet-Draft will expire on 17 September 2024.\n\nCopyright Notice\n\n   Copyright (c) 2024 IETF Trust and the persons identified as the\n   document authors.  All rights reserved.\n\n   This document is subject to BCP 78 and the IETF Trust's Legal\n   Provisions Relating to IETF Documents (https://trustee.ietf.org/\n   license-info) in effect on the date of publication of this document.\n   Please review these documents carefully, as they describe your rights\n   and restrictions with respect to this document.  
Code Components\n   extracted from this document must include Revised BSD License text as\n   described in Section 4.e of the Trust Legal Provisions and are\n   provided without warranty as described in the Revised BSD License.\n\n\n\n\n\nHenderson               Expires 17 September 2024               [Page 1]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\nTable of Contents\n\n   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3\n     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3\n     1.2.  Media Type Language . . . . . . . . . . . . . . . . . . .   3\n     1.3.  ABNF Language . . . . . . . . . . . . . . . . . . . . . .   3\n   2.  USV characters  . . . . . . . . . . . . . . . . . . . . . . .   3\n   3.  Definition of the USV Format  . . . . . . . . . . . . . . . .   4\n     3.1.  Data  . . . . . . . . . . . . . . . . . . . . . . . . . .   4\n     3.2.  Unit  . . . . . . . . . . . . . . . . . . . . . . . . . .   4\n     3.3.  Record  . . . . . . . . . . . . . . . . . . . . . . . . .   4\n     3.4.  Group . . . . . . . . . . . . . . . . . . . . . . . . . .   4\n     3.5.  File  . . . . . . . . . . . . . . . . . . . . . . . . . .   4\n     3.6.  Header  . . . . . . . . . . . . . . . . . . . . . . . . .   5\n     3.7.  Escape (ESC)  . . . . . . . . . . . . . . . . . . . . . .   5\n     3.8.  End of Transmission (EOT) . . . . . . . . . . . . . . . .   5\n   4.  ABNF grammar  . . . . . . . . . . . . . . . . . . . . . . . .   6\n     4.1.  Semantics . . . . . . . . . . . . . . . . . . . . . . . .   6\n     4.2.  Syntax  . . . . . . . . . . . . . . . . . . . . . . . . .   6\n     4.3.  Runs  . . . . . . . . . . . . . . . . . . . . . . . . . .   6\n     4.4.  Character classes . . . . . . . . . . . . . . . . . . . .   6\n     4.5.  Unicode symbols . . . . . . . . . . . . . . . . . . . . .   6\n   5.  Examples  . . . . . . . . . . . . . . . . . . . . . . . . . .   7\n     5.1.  Hello World . . . 
. . . . . . . . . . . . . . . . . . . .   7\n     5.2.  Hello World Goodnight Moon  . . . . . . . . . . . . . . .   7\n     5.3.  Units, Records, Groups, Files . . . . . . . . . . . . . .   8\n     5.4.  Articles  . . . . . . . . . . . . . . . . . . . . . . . .   9\n   6.  Source Code Examples  . . . . . . . . . . . . . . . . . . . .  10\n   7.  MIME media type registration for text/usv . . . . . . . . . .  11\n     7.1.  Optional parameters: charset, header  . . . . . . . . . .  11\n     7.2.  Encoding considerations . . . . . . . . . . . . . . . . .  11\n     7.3.  Security considerations . . . . . . . . . . . . . . . . .  12\n     7.4.  Interoperability considerations . . . . . . . . . . . . .  12\n     7.5.  Published specification . . . . . . . . . . . . . . . . .  12\n     7.6.  Applications that use this media type . . . . . . . . . .  12\n     7.7.  Additional information  . . . . . . . . . . . . . . . . .  12\n   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  12\n   9.  Security Considerations . . . . . . . . . . . . . . . . . . .  13\n   10. Converters  . . . . . . . . . . . . . . . . . . . . . . . . .  13\n   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .  13\n     11.1.  Normative References . . . . . . . . . . . . . . . . . .  13\n     11.2.  Informative References . . . . . . . . . . . . . . . . .  14\n   Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  15\n   Contributors  . . . . . . . . . . . . . . . . . . . . . . . . . .  15\n   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  15\n\n\n\n\n\n\nHenderson               Expires 17 September 2024               [Page 2]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n1.  Introduction\n\n   Unicode Separated Values (USV) is a data format useful for exchanging\n   and converting data between various spreadsheet programs, databases,\n   and streaming data services.  
This RFC explains USV.\n\n   Additionally, we propose a new media type \"text/usv\", to be\n   registered with IANA.\n\n   We provide information references for a USV git repository\n   [usv-git-repository], a programming implementation as a USV Rust\n   crate [usv-rust-crate], and converter tools.\n\n1.1.  Requirements Language\n\n   The key words \"MUST\", \"MUST NOT\", \"REQUIRED\", \"SHALL\", \"SHALL NOT\",\n   \"SHOULD\", \"SHOULD NOT\", \"RECOMMENDED\", \"NOT RECOMMENDED\", \"MAY\", and\n   \"OPTIONAL\" in this document are to be interpreted as described in BCP\n   14 [RFC2119] [RFC8174] when, and only when, they appear in all\n   capitals, as shown here.\n\n1.2.  Media Type Language\n\n   The media type normative references are RFC 6838 [RFC6838], RFC 2046\n   [RFC2046], and RFC 4289 [RFC4289].\n\n1.3.  ABNF Language\n\n   The ABNF normative reference is RFC 5234 [RFC5234].\n\n2.  USV characters\n\n   Separators:\n\n   *  File Separator (FS) is U+001C or U+241C\n\n   *  Group Separator (GS) is U+001D or U+241D\n\n   *  Record Separator (RS) is U+001E or U+241E\n\n   *  Unit Separator (US) is U+001F or U+241F\n\n   Modifiers:\n\n   *  Escape (ESC) is U+001B or U+241B\n\n   *  End of Transmission (EOT) is U+0004 or U+2404\n\n\n\n\nHenderson               Expires 17 September 2024               [Page 3]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   Spacers:\n\n   *  Carriage Return (CR) is U+000D\n\n   *  Line Feed (LF) is U+000A\n\n3.  Definition of the USV Format\n\n3.1.  Data\n\n   Data comprises units, records, groups, and files.\n\n3.2.  Unit\n\n   A unit comprises content characters.  It runs until a Unit Separator\n   (US):\n\n   Example unit and unit separator:\n\n   <CODE BEGINS> file \"unit-and-unit-separator.usv\"\n   aaa␟\n   <CODE ENDS>\n\n3.3.  Record\n\n   A record comprises units.  
It runs until a Record Separator (RS):\n\n   Example record and record separator:\n\n   <CODE BEGINS> file \"record-and-record-separator.usv\"\n   aaa␟bbb␟␞\n   <CODE ENDS>\n\n3.4.  Group\n\n   A group comprises records.  It runs until a Group Separator (GS):\n\n   Example group and group separator:\n\n   <CODE BEGINS> file \"group-and-group-separator.usv\"\n   aaa␟bbb␟␞ccc␟ddd␟␞␝\n   <CODE ENDS>\n\n3.5.  File\n\n   A file comprises groups.  It runs until a File Separator (FS):\n\n   Example file and file separator:\n\n\n\nHenderson               Expires 17 September 2024               [Page 4]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   <CODE BEGINS> file \"file-and-file-separator.usv\"\n   aaa␟bbb␟␞ccc␟ddd␟␞␝eee␟fff␟␞ggg␟hhh␟␞␝␜\n   <CODE ENDS>\n\n3.6.  Header\n\n   There may be an optional header appearing as the first item and with\n   the same format as normal items.  This header will contain names\n   corresponding to the fields in the data, and should contain the same\n   number of fields as the rest of data.  The presence or absence of the\n   header line should be indicated via the optional \"header\" parameter\n   of this media type.\n\n   For example:\n\n   <CODE BEGINS> file \"header.usv\"\n   name␟name␟␞aaa␟bbb␟␞\n   <CODE ENDS>\n\n3.7.  Escape (ESC)\n\n   Escape (ESC) makes the next character content.\n\n   Example: USV with a unit that contains an Escape + End of\n   Transmission; because of the Escape, the End of Transmission is\n   treated as content:\n\n   <CODE BEGINS> file \"header.usv\"\n     a␛␄b␟\n   <CODE ENDS>\n\n3.8.  End of Transmission (EOT)\n\n   End of Transmission (EOT) tells any reader that it can stop reading.\n   This can be useful for streaming data, such as to end a\n   connection.  
This can also be useful for providing data files that\n   contain USV data, then EOT, then additional non-USV information such\n   as comments, images, attachments, etc.\n\n   *  EOT tells the data reader that it can stop.\n\n   *  EOT has no effect on the output content.\n\n   Example of a unit then an End of Transmission:\n\n   <CODE BEGINS> file \"header.usv\"\n   abc␞␄ignorable\n   <CODE ENDS>\n\n\n\nHenderson               Expires 17 September 2024               [Page 5]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n4.  ABNF grammar\n\n4.1.  Semantics\n\n   usv = *files\n\n   file = *groups\n\n   group = *records\n\n   record = *units\n\n   unit = *content-characters\n\n4.2.  Syntax\n\n   usv = ( header-and-body / body ) '*' ; anything after the body is\n   chaff\n\n   header-and-body = 1*unit-run / 1*record-run / 1*group-run /\n   1*file-run\n\n   body = *unit-run / *record-run / *group-run / *file-run\n\n4.3.  Runs\n\n   file-run = *( *spacer-character file *spacer-character FS )\n\n   group-run = *( *spacer-character group *spacer-character GS )\n\n   record-run = *( *spacer-character record *spacer-character RS )\n\n   unit-run = *( *spacer-character unit *spacer-character US )\n\n4.4.  Character classes\n\n   content-character = typical-character / ESC '*'\n\n   typical-character = '*' - special-character\n\n   special-character = US / RS / GS / FS / ESC / EOT\n\n   spacer-character = CR / LF\n\n4.5.  
Unicode symbols\n\n   FS = U+001C File Separator / U+241C Symbol for File Separator\n\n\n\n\nHenderson               Expires 17 September 2024               [Page 6]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   GS = U+001D Group Separator / U+241D Symbol for Group Separator\n\n   RS = U+001E Record Separator / U+241E Symbol for Record Separator\n\n   US = U+001F Unit Separator / U+241F Symbol for Unit Separator\n\n   ESC = U+001B Escape / U+241B Symbol for Escape\n\n   EOT = U+0004 End of Transmission / U+2404 Symbol for End of\n   Transmission\n\n   CR = U+000D Carriage Return\n\n   LF = U+000A Line Feed\n\n5.  Examples\n\n5.1.  Hello World\n\n   This kind of data ...\n\n   <CODE BEGINS> file \"hello-world.txt\"\n   hello, world\n   <CODE ENDS>\n\n   ... is represented in USV as two units:\n\n   <CODE BEGINS> file \"hello-world.usv\"\n   hello␟world␟\n   <CODE ENDS>\n\n   If you prefer to see one unit per line, then you can add carriage\n   returns and/or newlines:\n\n   <CODE BEGINS> file \"hello-world-with-lines.usv\"\n   hello␟\n   world␟\n   <CODE ENDS>\n\n5.2.  Hello World Goodnight Moon\n\n   This kind of data ...\n\n   <CODE BEGINS> file \"hello-world-goodnight-moon.txt\"\n   [ hello, world ], [ goodnight, moon ]\n   <CODE ENDS>\n\n   ... is represented in USV as two records, each with two units:\n\n\n\nHenderson               Expires 17 September 2024               [Page 7]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   <CODE BEGINS> file \"hello-world-goodnight-moon.usv\"\n   hello␟world␟␞goodnight␟moon␟␞\n   <CODE ENDS>\n\n   If you prefer to see one record per line, then you can add carriage\n   returns and/or newlines:\n\n   <CODE BEGINS> file \"hello-world-goodnight-moon-with-lines.usv\"\n   hello␟world␟␞\n   goodnight␟moon␟␞\n   <CODE ENDS>\n\n5.3.  
Units, Records, Groups, Files\n\n   USV with 2 units by 2 records by 2 groups by 2 files:\n\n   <CODE BEGINS> file \"units-records-groups-files.usv\"\n   a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜\n   <CODE ENDS>\n\n   If you prefer to see one record per line, then you can add carriage\n   returns and/or newlines:\n\n   <CODE BEGINS> file \"units-records-groups-files-with-lines.usv\"\n   a␟b␟␞\n   c␟d␟␞\n   ␝\n   e␟f␟␞\n   g␟h␟␞\n   ␝\n   ␜\n   i␟j␟␞\n   k␟l␟␞\n   ␝\n   m␟n␟␞\n   o␟p␟␞\n   ␝\n   ␜\n   <CODE ENDS>\n\n   If you prefer to see one unit per line, then you can add carriage\n   returns and/or newlines:\n\n\n\n\n\n\n\n\n\nHenderson               Expires 17 September 2024               [Page 8]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   <CODE BEGINS> file \"units-records-groups-files-with-lines.usv\"\n   a␟\n   b␟\n   ␞\n   c␟\n   d␟\n   ␞\n   ␝\n   e␟\n   f␟\n   ␞\n   g␟\n   h␟\n   ␞\n   ␝\n   ␜\n   i␟\n   j␟\n   ␞\n   k␟\n   l␟\n   ␞\n   ␝\n   m␟\n   n␟\n   ␞\n   o␟\n   p␟\n   ␞\n   ␝\n   ␜\n   <CODE ENDS>\n\n5.4.  Articles\n\n   USV can format paragraphs, such as in this example data stream of\n   articles; note the units contain leading spacers and trailing spacers.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHenderson               Expires 17 September 2024               [Page 9]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   <CODE BEGINS> file \"articles.usv\"\n   Title One\n   ␟\n   Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod\n   tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim\n   veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip.\n   ␟␞\n   Title Two\n   ␟\n   Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\n   eu fugiat nulla pariatur. 
Excepteur sint occaecat cupidatat non proident,\n   sunt in culpa qui officia deserunt mollit anim id est laborum.\n   ␟␞\n   Title Three\n   ␟\n   Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium\n   doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore\n   veritatis et quasi architecto beatae vitae dicta sunt explicabo.\n   ␟␞\n   <CODE ENDS>\n\n6.  Source Code Examples\n\n   These source code examples demonstrate the Rust programming language\n   and the USV Rust crate.\n\n   Units:\n\n   <CODE BEGINS> file \"usv-rust-crate-units.rs\"\n   use usv::*;\n   let str = \"a␟b␟\";\n   let units: Units = str.units().collect();\n   <CODE ENDS>\n\n   Records:\n\n   <CODE BEGINS> file \"usv-rust-crate-records.rs\"\n   use usv::*;\n   let str = \"a␟b␟␞c␟d␟␞\";\n   let records: Records = str.records().collect();\n   <CODE ENDS>\n\n   Groups:\n\n\n\n\n\n\n\n\nHenderson               Expires 17 September 2024              [Page 10]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   <CODE BEGINS> file \"usv-rust-crate-groups.rs\"\n   use usv::*;\n   let str = \"a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝\";\n   let groups: Groups = str.groups().collect();\n   <CODE ENDS>\n\n   Files:\n\n   <CODE BEGINS> file \"usv-rust-crate-groups.rs\"\n   use usv::*;\n   let str = \"a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜\";\n   let files: Files = str.files().collect();\n   <CODE ENDS>\n\n7.  MIME media type registration for text/usv\n\n   This section provides the MIME media type registration application\n   information.\n\n   To: ietf-types@iana.org\n\n   Subject: Registration of MIME media type text/usv\n\n   MIME media type name: text\n\n   MIME subtype name: usv\n\n   Required parameters: none\n\n7.1.  
Optional parameters: charset, header\n\n   Common usage of USV is UTF-8, but other character sets defined by\n   IANA for the \"text\" tree may be used in conjunction with the\n   \"charset\" parameter.\n\n   The \"header\" parameter indicates the presence or absence of the\n   header line.  Valid values are \"present\" or \"absent\".  Implementors\n   choosing not to use this parameter must make their own decisions as\n   to whether the header line is present or absent.\n\n7.2.  Encoding considerations\n\n   This media type uses LF to denote line breaks.  However, implementors\n   should be aware that some implementations may not conform i.e. may\n   incorrectly use other values.\n\n\n\n\n\n\nHenderson               Expires 17 September 2024              [Page 11]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n7.3.  Security considerations\n\n   USV files contain passive text data that should not pose any risks.\n   However, it is possible in theory that malicious binary data may be\n   included in order to exploit potential buffer overruns in the program\n   processing USV data.  Additionally, private data may be shared via\n   this format (which of course applies to any text data).\n\n7.4.  Interoperability considerations\n\n   Implementors should \"be conservative in what you do, be liberal in\n   what you accept from others\" (RFC 793 [8]) when processing USV data.\n\n   Implementations deciding not to use the optional \"header\" parameter\n   must make their own decision as to whether the header is absent or\n   present.\n\n7.5.  Published specification\n\n   https://github.com/sixarm/usv\n\n7.6.  Applications that use this media type\n\n   Spreadsheet programs, such as with import/export.  Database programs,\n   such as with loading/saving text.  Data conversion utilities.\n\n7.7.  
Additional information\n\n   Magic number(s): none\n\n   File extension(s): usv\n\n   Apple macOS File Type Code(s): TEXT\n\n   Intended usage: COMMON\n\n   Author/Change controller: IESG\n\n   Contact: Joel Parker Henderson <joel@joelparkerhenderson.com>\n\n8.  IANA Considerations\n\n   We are requesting IANA to create a standard MIME media type \"text/\n   usv\".\n\n   We have filed an IANA request for this, with same contact\n   information.\n\n\n\n\nHenderson               Expires 17 September 2024              [Page 12]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n9.  Security Considerations\n\n   This document should not affect the security of the Internet.\n\n10.  Converters\n\n   We implement converters to/from USV and various popular data formats,\n   including ASCII Separated Values (ASV), Comma Separated Values (CSV),\n   JavaScript Object Notation (JSON), Microsoft Excel XML (XLSX).\n\n   *  asv-to-usv[asv-to-usv-rust-crate], usv-to-\n      asv[usv-to-asv-rust-crate]\n\n   *  csv-to-usv[csv-to-usv-rust-crate], usv-to-\n      csv[usv-to-csv-rust-crate]\n\n   *  json-to-usv[json-to-usv-rust-crate], usv-to-\n      json[usv-to-json-rust-crate]\n\n   *  xlsx-to-usv[xlsx-to-usv-rust-crate], usv-to-\n      xlsx[usv-to-xlsx-rust-crate]\n\n   The converters are provided for informational purposes.  The\n   converters are not part of the specification.\n\n11.  References\n\n11.1.  Normative References\n\n   [RFC8174]  Leiba, B., \"Ambiguity of Uppercase vs Lowercase in RFC\n              2119 Key Words\", BCP 14, RFC 8174, DOI 10.17487/RFC8174,\n              May 2017, <https://www.rfc-editor.org/info/rfc8174>.\n\n   [RFC5234]  Crocker, D., Ed. and P. Overell, \"Augmented BNF for Syntax\n              Specifications: ABNF\", STD 68, RFC 5234,\n              DOI 10.17487/RFC5234, January 2008,\n              <https://www.rfc-editor.org/info/rfc5234>.\n\n   [RFC6838]  Freed, N., Klensin, J., and T. 
Hansen, \"Media Type\n              Specifications and Registration Procedures\", BCP 13,\n              RFC 6838, DOI 10.17487/RFC6838, January 2013,\n              <https://www.rfc-editor.org/info/rfc6838>.\n\n   [RFC2046]  Freed, N. and N. Borenstein, \"Multipurpose Internet Mail\n              Extensions (MIME) Part Two: Media Types\", RFC 2046,\n              DOI 10.17487/RFC2046, November 1996,\n              <https://www.rfc-editor.org/info/rfc2046>.\n\n\n\n\nHenderson               Expires 17 September 2024              [Page 13]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   [RFC4289]  Freed, N. and J. Klensin, \"Multipurpose Internet Mail\n              Extensions (MIME) Part Four: Registration Procedures\",\n              BCP 13, RFC 4289, DOI 10.17487/RFC4289, December 2005,\n              <https://www.rfc-editor.org/info/rfc4289>.\n\n11.2.  Informative References\n\n   [usv-git-repository]\n              Henderson, J., \"USV git repository at\n              https://github.com/sixarm/usv\", 2022.\n\n   [usv-rust-crate]\n              Henderson, J., \"USV rust crate at\n              https://crates.io/crates/usv\", 2024.\n\n   [asv-to-usv-rust-crate]\n              Henderson, J., \"ASV to USV rust crate at\n              https://crates.io/crates/asv-to-usv\", 2024.\n\n   [usv-to-asv-rust-crate]\n              Henderson, J., \"USV to ASV rust crate at\n              https://crates.io/crates/usv-to-asv\", 2024.\n\n   [csv-to-usv-rust-crate]\n              Henderson, J., \"CSV to USV rust crate at\n              https://crates.io/crates/csv-to-usv\", 2024.\n\n   [usv-to-csv-rust-crate]\n              Henderson, J., \"USV to CSV rust crate at\n              https://crates.io/crates/usv-to-csv\", 2024.\n\n   [json-to-usv-rust-crate]\n              Henderson, J., \"JSON to USV rust crate at\n              https://crates.io/crates/json-to-usv\", 2024.\n\n   [usv-to-json-rust-crate]\n              Henderson, J., \"USV to 
JSON rust crate at\n              https://crates.io/crates/usv-to-json\", 2024.\n\n   [xlsx-to-usv-rust-crate]\n              Henderson, J., \"XLSX to USV rust crate at\n              https://crates.io/crates/xlsx-to-usv\", 2024.\n\n   [usv-to-xlsx-rust-crate]\n              Henderson, J., \"USV to XLSX rust crate at\n              https://crates.io/crates/usv-to-xlsx\", 2024.\n\n\n\n\n\nHenderson               Expires 17 September 2024              [Page 14]\n\f\nInternet-Draft       Unicode Separated Values (USV)           March 2024\n\n\n   [RFC2119]  Bradner, S., \"Key words for use in RFCs to Indicate\n              Requirement Levels\", BCP 14, RFC 2119,\n              DOI 10.17487/RFC2119, March 1997,\n              <https://www.rfc-editor.org/info/rfc2119>.\n\nAcknowledgements\n\n   The author would like to thank Y.  Shafranovich, author of the CSV\n   RFC, which provided guidance for this USV RFC.\n\n   A special thank you goes to P.X.V.\n\nContributors\n\n   Thanks to all of the contributors.\n\n   Joel Parker Henderson\n   Email: joel@joelparkerhenderson.com\n\n\nAuthor's Address\n\n   Joel Parker Henderson (editor)\n   601 Van Ness Ave #E3-359\n   San Francisco, CA 94102\n   United States of America\n   Phone: 1-415-317-2700\n   Email: joel@joelparkerhenderson.com\n   URI:   https://linkedin.com/in/joelparkerhenderson\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHenderson               Expires 17 September 2024              [Page 15]\n"
  },
  {
    "path": "doc/rfc/draft-unicode-separated-values-01.xml",
    "content": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<!--\n\n  draft-unicode-separated-values-01\n\n  Based on draft-rfcxml-general-template-annotated-00\n\n  This template includes examples of most of the features of RFCXML with comments explaining\n  how to customise them, and examples of how to achieve specific formatting.\n\n  Documentation:\n  https://authors.ietf.org/en/templates-and-schemas\n\n  To parse this XML, such as to create a PDF:\n  https://author-tools.ietf.org/\n\n  RFCXML vocabulary:\n  https://authors.ietf.org/rfcxml-vocabulary\n\n  Output:\n\n  * URL: https://www.ietf.org/archive/id/draft-unicode-separated-values-01.txt\n  * Status: https://datatracker.ietf.org/doc/draft-unicode-separated-values/\n  * HTML: https://www.ietf.org/archive/id/draft-unicode-separated-values-01.html\n  * HTMLized: https://datatracker.ietf.org/doc/html/draft-unicode-separated-values\n\n-->\n<?xml-model href=\"rfc7991bis.rnc\"?>  <!-- Required for schema validation and schema-aware editing -->\n<!-- <?xml-stylesheet type=\"text/xsl\" href=\"rfc2629.xslt\" ?> -->\n<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->\n\n<!DOCTYPE rfc [\n  <!ENTITY nbsp    \"&#160;\">\n  <!ENTITY zwsp   \"&#8203;\">\n  <!ENTITY nbhy   \"&#8209;\">\n  <!ENTITY wj     \"&#8288;\">\n]>\n<!-- If further character entities are required then they should be added to the DOCTYPE above.\n     Use of an external entity file is not recommended. 
-->\n\n<rfc\n  xmlns:xi=\"http://www.w3.org/2001/XInclude\"\n  category=\"exp\"\n  docName=\"draft-unicode-separated-values-01\"\n  ipr=\"trust200902\"\n  obsoletes=\"\"\n  updates=\"\"\n  submissionType=\"IETF\"\n  xml:lang=\"en\"\n  version=\"3\">\n<!--\n    * docName should be the name of your draft\n    * category should be one of std, bcp, info, exp, historic\n    * ipr should be one of trust200902, noModificationTrust200902, noDerivativesTrust200902, pre5378Trust200902\n    * updates can be an RFC number as NNNN\n    * obsoletes can be an RFC number as NNNN\n-->\n\n  <front>\n    <title abbrev=\"Unicode Separated Values (USV)\">Unicode Separated Values (USV)</title> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#title-4 -->\n    <!--  The abbreviated title is required if the full title is longer than 39 characters -->\n\n    <seriesInfo name=\"Internet-Draft\" value=\"unicode-separated-values\"/> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#seriesinfo -->\n    <!-- Set value to the name of the draft  -->\n\n    <author fullname=\"Joel Parker Henderson\" initials=\"J\" role=\"editor\" surname=\"Henderson\"> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#author -->\n    <!-- initials should not include an initial for the surname -->\n    <!-- role=\"editor\" is optional -->\n    <!-- Can have more than one author -->\n\n    <!-- all of the following elements are optional -->\n      <address> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#address -->\n        <postal>\n          <!-- Reorder these if your country does things differently -->\n          <street>601 Van Ness Ave #E3-359</street>\n          <city>San Francisco</city>\n          <region>CA</region>\n          <code>94102</code>\n          <country>US</country>\n          <!-- Can use two letter country code -->\n        </postal>\n        <phone>1-415-317-2700</phone>\n        <email>joel@joelparkerhenderson.com</email>\n        <!-- Can have more than one <email> element -->\n      
  <uri>https://linkedin.com/in/joelparkerhenderson</uri>\n      </address>\n    </author>\n\n    <date year=\"2024\" month=\"3\" day=\"16\"/> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#date -->\n    <!-- On draft submission:\n         * If only the current year is specified, the current day and month will be used.\n         * If the month and year are both specified and are the current ones, the current day will\n           be used\n         * If the year is not the current one, it is necessary to specify at least a month and day=\"1\" will be used.\n    -->\n\n    <area>General</area>\n    <workgroup>Internet Engineering Task Force</workgroup>\n    <!-- \"Internet Engineering Task Force\" is fine for individual submissions.  If this element is\n          not present, the default is \"Network Working Group\", which is used by the RFC Editor as\n          a nod to the history of the RFC Series. -->\n\n    <keyword>usv</keyword>\n    <keyword>data</keyword>\n    <keyword>format</keyword>\n    <keyword>markup</keyword>\n    <!-- Multiple keywords are allowed.  Keywords are incorporated into HTML output files for\n         use by search engines. -->\n\n    <abstract>\n      <t>\n        Unicode Separated Values (USV) is a data format that uses Unicode\n        characters to mark parts. USV builds on ASCII separated values (ASV),\n        and provides pragmatic ways to edit data in text editors by using visual\n        symbols and layouts.\n      </t>\n    </abstract>\n\n  </front>\n\n  <middle>\n\n    <section>\n    <!-- The default attributes for <section> are numbered=\"true\" and toc=\"default\" -->\n      <name>Introduction</name>\n\n      <t>\n        Unicode Separated Values (USV) is a data format useful for exchanging\n        and converting data between various spreadsheet programs, databases,\n        and streaming data services. 
This RFC explains USV.\n      </t>\n\n      <t>\n        Additionally, we propose a new media type \"text/usv\", to be registered\n        with IANA.\n      </t>\n      <t>\n        We provide informative references for a USV git repository <xref\n        target=\"usv-git-repository\"/>, a programming implementation as a USV\n        Rust crate <xref target=\"usv-rust-crate\"/>, and converter tools.\n      </t>\n      <section anchor=\"requirements-language\">\n        <name>Requirements Language</name>\n        <t>\n          The key words \"MUST\", \"MUST NOT\", \"REQUIRED\", \"SHALL\", \"SHALL NOT\",\n          \"SHOULD\", \"SHOULD NOT\", \"RECOMMENDED\", \"NOT RECOMMENDED\", \"MAY\", and\n          \"OPTIONAL\" in this document are to be interpreted as described in BCP\n          14 <xref target=\"RFC2119\"/>\n          <xref target=\"RFC8174\"/> when, and only when, they appear in all\n          capitals, as shown here.\n        </t>\n      </section>\n      <section anchor=\"media-type-language\">\n        <name>Media Type Language</name>\n        <t>\n          The media type normative references are RFC 6838 <xref\n          target=\"RFC6838\"/>, RFC 2046 <xref target=\"RFC2046\"/>, and RFC 4289\n          <xref target=\"RFC4289\"/>.\n        </t>\n      </section>\n      <section anchor=\"abnf-language\">\n        <name>ABNF Language</name>\n        <t>\n          The ABNF normative reference is RFC 5234 <xref target=\"RFC5234\"/>.\n        </t>\n      </section>\n    </section>\n\n    <section>\n      <name>USV characters</name>\n\n      <t>Separators:</t>\n\n      <ul>\n        <li>File Separator (FS) is U+001C or U+241C</li>\n        <li>Group Separator (GS) is U+001D or U+241D</li>\n        <li>Record Separator (RS) is U+001E or U+241E</li>\n        <li>Unit Separator (US) is U+001F or U+241F</li>\n      </ul>\n\n      <t>Modifiers:</t>\n\n      <ul>\n        <li>Escape (ESC) is U+001B or U+241B</li>\n        <li>End of Transmission (EOT) is U+0004 or 
U+2404</li>\n      </ul>\n\n    </section>\n\n    <section>\n      <name>Definition of the USV Format</name>\n\n      <section>\n        <name>Data</name>\n        <t>\n          Data comprises units, records, groups, and files.\n        </t>\n      </section>\n\n      <section>\n        <name>Unit</name>\n        <t>\n          A unit comprises content characters.\n          It runs until a Unit Separator (US):\n        </t>\n        <t>\n          Example unit and unit separator:\n        </t>\n        <sourcecode name=\"unit-and-unit-separator.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\naaa␟\n]]>\n        </sourcecode>\n      </section>\n\n      <section>\n        <name>Record</name>\n        <t>\n          A record comprises units.\n          It runs until a Record Separator (RS):\n        </t>\n        <t>\n          Example record and record separator:\n        </t>\n        <sourcecode name=\"record-and-record-separator.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\naaa␟bbb␟␞\n]]>\n        </sourcecode>\n\n      </section>\n\n      <section>\n        <name>Group</name>\n        <t>\n          A group comprises records.\n          It runs until a Group Separator (GS):\n        </t>\n        <t>\n          Example group and group separator:\n        </t>\n        <sourcecode name=\"group-and-group-separator.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\naaa␟bbb␟␞ccc␟ddd␟␞␝\n]]>\n        </sourcecode>\n      </section>\n\n      <section>\n        <name>File</name>\n        <t>\n          A file comprises groups.\n          It runs until a file separator.\n        </t>\n        <t>\n          Example file and file separator:\n        </t>\n        <sourcecode name=\"file-and-file-separator.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\naaa␟bbb␟␞ccc␟ddd␟␞␝eee␟fff␟␞ggg␟hhh␟␞␝␜\n]]>\n        </sourcecode>\n      </section>\n\n      <section>\n        <name>Header</name>\n        <t>\n          There may be an optional header appearing as the first item 
and with\n          the same format as normal items.  This header will contain names\n          corresponding to the fields in the data, and should contain the same\n          number of fields as the rest of the data. The presence or absence of the\n          header line should be indicated via the optional \"header\" parameter\n          of this media type.\n        </t>\n        <t>\n          For example:\n        </t>\n        <sourcecode name=\"header.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\nname␟name␟␞aaa␟bbb␟␞\n]]>\n        </sourcecode>\n      </section>\n\n      <section>\n        <name>Escape (ESC)</name>\n        <t>\n          Escape (ESC) makes the next character content.\n        </t>\n\n        <t>\n          Example: USV with a unit that contains an Escape + End of\n          Transmission; because of the Escape, the End of Transmission is\n          treated as content:\n        </t>\n        <sourcecode name=\"header.usv\" type=\"usv\" markers=\"true\">\n  <![CDATA[\n  a␛␄b␟\n  ]]>\n        </sourcecode>\n      </section>\n\n      <section>\n        <name>End of Transmission (EOT)</name>\n        <t>\n          End of Transmission (EOT) tells any reader that it can stop reading.\n          This can be useful for streaming data, such as to end a connection.\n          This can also be useful for providing data files that contain USV\n          data, then EOT, then additional non-USV information such as comments,\n          images, attachments, etc.\n        </t>\n        <ul>\n          <li>\n            EOT tells the data reader that it can stop.\n          </li>\n          <li>\n            EOT has no effect on the output content.\n          </li>\n        </ul>\n        <t>\n          Example of a unit then an End of Transmission:\n        </t>\n        <sourcecode name=\"header.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\nabc␞␄ignorable\n]]>\n        </sourcecode>\n      </section>\n\n    </section>\n\n    <section>\n      <name>ABNF 
grammar</name>\n\n      <section>\n\n        <name>Semantics</name>\n\n        <t>usv = *files</t>\n\n        <t>file = *groups</t>\n\n        <t>group = *records</t>\n\n        <t>record = *units</t>\n\n        <t>unit = *content-characters</t>\n\n      </section>\n\n      <section>\n\n        <name>Syntax</name>\n\n        <t>usv = ( header-and-body / body ) '*' ; anything after the body is chaff</t>\n\n        <t>header-and-body = 1*unit-run / 1*record-run / 1*group-run / 1*file-run</t>\n\n        <t>body = *unit-run / *record-run / *group-run / *file-run</t>\n\n      </section>\n\n      <section>\n\n        <name>Runs</name>\n\n        <t>file-run = *( *spacer-character file *spacer-character FS )</t>\n\n        <t>group-run = *( *spacer-character group *spacer-character GS )</t>\n\n        <t>record-run = *( *spacer-character record *spacer-character RS )</t>\n\n        <t>unit-run = *( *spacer-character unit *spacer-character US )</t>\n\n      </section>\n\n      <section>\n\n        <name>Character classes</name>\n\n        <t>content-character = typical-character / ESC '*'</t>\n\n        <t>typical-character = '*' - special-character</t>\n\n        <t>special-character = US / RS / GS / FS / ESC / EOT</t>\n\n        <t>spacer-character = Defined by Unicode Derived Core Property White_Space</t>\n\n      </section>\n\n      <section>\n\n        <name>Unicode symbols</name>\n\n        <t>FS = U+001C File Separator / U+241C Symbol for File Separator</t>\n\n        <t>GS = U+001D Group Separator / U+241D Symbol for Group Separator</t>\n\n        <t>RS = U+001E Record Separator / U+241E Symbol for Record Separator</t>\n\n        <t>US = U+001F Unit Separator / U+241F Symbol for Unit Separator</t>\n\n        <t>ESC = U+001B Escape / U+241B Symbol for Escape</t>\n\n        <t>EOT = U+0004 End of Transmission / U+2404 Symbol for End of Transmission</t>\n\n      </section>\n\n    </section>\n\n    <section>\n      <name>Examples</name>\n\n      <section>\n        
<name>Hello World</name>\n        <t>\n          This kind of data ...\n        </t>\n        <sourcecode name=\"hello-world.txt\" type=\"txt\" markers=\"true\">\n<![CDATA[\nhello, world\n]]>\n        </sourcecode>\n        <t>\n          ... is represented in USV as two units:\n        </t>\n        <sourcecode name=\"hello-world.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\nhello␟world␟\n]]>\n        </sourcecode>\n        <t>\n          If you prefer to see one unit per line, then you can add whitespace,\n          such as newlines:\n        </t>\n        <sourcecode name=\"hello-world-with-lines.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\nhello␟\nworld␟\n]]>\n        </sourcecode>\n      </section>\n\n      <section>\n        <name>Hello World Goodnight Moon</name>\n        <t>\n          This kind of data ...\n        </t>\n        <sourcecode name=\"hello-world-goodnight-moon.txt\" type=\"txt\" markers=\"true\">\n<![CDATA[\n[ hello, world ], [ goodnight, moon ]\n]]>\n        </sourcecode>\n        <t>\n          ... 
is represented in USV as two records, each with two units:\n        </t>\n        <sourcecode name=\"hello-world-goodnight-moon.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\nhello␟world␟␞goodnight␟moon␟␞\n]]>\n        </sourcecode>\n        <t>\n          If you prefer to see one record per line, then you can add whitespace,\n          such as newlines:\n        </t>\n        <sourcecode name=\"hello-world-goodnight-moon-with-lines.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\nhello␟world␟␞\ngoodnight␟moon␟␞\n]]>\n        </sourcecode>\n      </section>\n\n      <section>\n        <name>Units, Records, Groups, Files</name>\n        <t>\n          USV with 2 units by 2 records by 2 groups by 2 files:\n        </t>\n        <sourcecode name=\"units-records-groups-files.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\na␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜\n]]>\n        </sourcecode>\n        <t>\n          If you prefer to see one record per line, then you can add whitespace,\n          such as newlines:\n        </t>\n        <sourcecode name=\"units-records-groups-files-with-lines.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\na␟b␟␞\nc␟d␟␞\n␝\ne␟f␟␞\ng␟h␟␞\n␝\n␜\ni␟j␟␞\nk␟l␟␞\n␝\nm␟n␟␞\no␟p␟␞\n␝\n␜\n]]>\n        </sourcecode>\n        <t>\n          If you prefer to see one unit per line, then you can add whitespace,\n          such as newlines:\n        </t>\n        <sourcecode name=\"units-records-groups-files-with-lines.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\na␟\nb␟\n␞\nc␟\nd␟\n␞\n␝\ne␟\nf␟\n␞\ng␟\nh␟\n␞\n␝\n␜\ni␟\nj␟\n␞\nk␟\nl␟\n␞\n␝\nm␟\nn␟\n␞\no␟\np␟\n␞\n␝\n␜\n]]>\n        </sourcecode>\n      </section>\n\n      <section>\n        <name>Articles</name>\n        <t>\n          USV can format paragraphs, such as in this example data stream of\n          articles; note the units contain leading spacers and trailing spacers.\n        </t>\n        <sourcecode name=\"articles.usv\" type=\"usv\" markers=\"true\">\n<![CDATA[\nTitle 
One\n␟\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim\nveniam, quis nostrud exercitation ullamco laboris nisi ut aliquip.\n␟␞\nTitle Two\n␟\nDuis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore\neu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,\nsunt in culpa qui officia deserunt mollit anim id est laborum.\n␟␞\nTitle Three\n␟\nSed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium\ndoloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore\nveritatis et quasi architecto beatae vitae dicta sunt explicabo.\n␟␞\n]]>\n        </sourcecode>\n      </section>\n    </section>\n\n    <section>\n      <name>Source Code Examples</name>\n      <t>These source code examples demonstrate the Rust programming language and the USV Rust crate.</t>\n\n      <t>Units:</t>\n      <sourcecode name=\"usv-rust-crate-units.rs\" type=\"rust\" markers=\"true\">\n<![CDATA[\nuse usv::*;\nlet str = \"a␟b␟\";\nlet units: Units = str.units().collect();\n]]>\n      </sourcecode>\n\n      <t>Records:</t>\n      <sourcecode name=\"usv-rust-crate-records.rs\" type=\"rust\" markers=\"true\">\n<![CDATA[\nuse usv::*;\nlet str = \"a␟b␟␞c␟d␟␞\";\nlet records: Records = str.records().collect();\n]]>\n      </sourcecode>\n\n      <t>Groups:</t>\n      <sourcecode name=\"usv-rust-crate-groups.rs\" type=\"rust\" markers=\"true\">\n<![CDATA[\nuse usv::*;\nlet str = \"a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝\";\nlet groups: Groups = str.groups().collect();\n]]>\n      </sourcecode>\n\n      <t>Files:</t>\n      <sourcecode name=\"usv-rust-crate-groups.rs\" type=\"rust\" markers=\"true\">\n<![CDATA[\nuse usv::*;\nlet str = \"a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜\";\nlet files: Files = str.files().collect();\n]]>\n      </sourcecode>\n    </section>\n\n    <section>\n    <!-- All drafts are required to have an IANA considerations section. 
See RFC 8126 for a guide.-->\n      <name>MIME media type registration for text/usv</name>\n      <t>\n        This section provides the MIME media type registration application information.\n      </t>\n      <t>\n        To: ietf-types@iana.org\n      </t>\n      <t>\n        Subject: Registration of MIME media type text/usv\n      </t>\n      <t>\n        MIME media type name: text\n      </t>\n      <t>\n        MIME subtype name: usv\n      </t>\n      <t>\n        Required parameters: none\n      </t>\n\n      <section>\n        <name>Optional parameters: charset, header</name>\n        <t>\n          Common usage of USV is UTF-8, but other character sets defined by IANA\n          for the \"text\" tree may be used in conjunction with the \"charset\"\n          parameter.\n        </t>\n        <t>\n          The \"header\" parameter indicates the presence or absence of the\n          header line.  Valid values are \"present\" or \"absent\".\n          Implementors choosing not to use this parameter must make their\n          own decisions as to whether the header line is present or absent.\n        </t>\n      </section>\n      <section>\n        <name>Encoding considerations</name>\n        <t>\n          This media type uses LF to denote line breaks.  However, implementors\n          should be aware that some implementations may not conform i.e. may\n          incorrectly use other values.\n        </t>\n      </section>\n      <section>\n        <name>Security considerations</name>\n        <t>\n          USV files contain passive text data that should not pose any\n          risks.  However, it is possible in theory that malicious binary\n          data may be included in order to exploit potential buffer overruns\n          in the program processing USV data.  
Additionally, private data\n          may be shared via this format (which of course applies to any text\n          data).\n        </t>\n      </section>\n      <section>\n        <name>Interoperability considerations</name>\n        <t>\n          Implementors should \"be conservative in what you do, be liberal in\n          what you accept from others\" (RFC 793) when processing USV data.\n        </t>\n        <t>\n          Implementations deciding not to use the optional \"header\"\n          parameter must make their own decision as to whether the header is\n          absent or present.\n        </t>\n      </section>\n      <section>\n        <name>Published specification</name>\n        <t>\n          https://github.com/sixarm/usv\n        </t>\n      </section>\n      <section>\n        <name>Applications that use this media type</name>\n        <t>\n          Spreadsheet programs, such as with import/export.\n          Database programs, such as with loading/saving text.\n          Data conversion utilities.\n        </t>\n      </section>\n      <section>\n        <name>Additional information</name>\n        <t>\n          Magic number(s): none\n        </t>\n        <t>\n          File extension(s): usv\n        </t>\n        <t>\n          Apple macOS File Type Code(s): TEXT\n        </t>\n        <t>\n          Intended usage: COMMON\n        </t>\n        <t>\n          Author/Change controller: IESG\n        </t>\n        <t>Contact: Joel Parker Henderson &lt;joel@joelparkerhenderson.com&gt;\n        </t>\n      </section>\n    </section>\n\n    <section anchor=\"IANA\">\n    <!-- All drafts are required to have an IANA considerations section. 
See RFC 8126 for a guide.-->\n      <name>IANA Considerations</name>\n      <t>We are requesting IANA to create a standard MIME media type \"text/usv\".</t>\n      <t>We have filed an IANA request for this, with same contact information.</t>\n\n    </section>\n\n    <section anchor=\"Security\">\n      <!-- All drafts are required to have a security considerations section. See RFC 3552 for a guide. -->\n      <name>Security Considerations</name>\n      <t>This document should not affect the security of the Internet.</t>\n    </section>\n\n    <!-- NOTE: The Acknowledgements and Contributors sections are at the end of this template -->\n\n    <section anchor=\"Converters\">\n      <!-- All drafts are required to have an IANA considerations section. See RFC 8126 for a guide.-->\n        <name>Converters</name>\n        <t>\n          We implement converters to/from USV and various popular data formats,\n          including ASCII Separated Values (ASV), Comma Separated Values (CSV),\n          JavaScript Object Notation (JSON), Microsoft Excel XML (XLSX).\n        </t>\n        <ul>\n          <li>asv-to-usv<xref target=\"asv-to-usv-rust-crate\"/>, usv-to-asv<xref target=\"usv-to-asv-rust-crate\"/></li>\n          <li>csv-to-usv<xref target=\"csv-to-usv-rust-crate\"/>, usv-to-csv<xref target=\"usv-to-csv-rust-crate\"/></li>\n          <li>json-to-usv<xref target=\"json-to-usv-rust-crate\"/>, usv-to-json<xref target=\"usv-to-json-rust-crate\"/></li>\n          <li>xlsx-to-usv<xref target=\"xlsx-to-usv-rust-crate\"/>, usv-to-xlsx<xref target=\"usv-to-xlsx-rust-crate\"/></li>\n        </ul>\n        <t>\n          The converters are provided for informational purposes. 
The converters\n          are not part of the specification.\n        </t>\n      </section>\n\n  </middle>\n\n  <back>\n    <references>\n      <name>References</name>\n      <references>\n        <name>Normative References</name>\n\n        <!-- \"Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words\" -->\n        <xi:include href=\"https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml\"/>\n\n        <!-- \"Augmented BNF for Syntax Specifications: ABNF\"-->\n        <xi:include href=\"https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5234.xml\"/>\n\n        <!-- \"Media Type Specifications and Registration Procedures\" -->\n        <xi:include href=\"https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6838.xml\"/>\n\n        <!-- \"\"Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types\" -->\n        <xi:include href=\"https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2046.xml\"/>\n\n        <!-- \"Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures\" -->\n        <xi:include href=\"https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4289.xml\"/>\n\n      </references>\n\n      <references>\n        <name>Informative References</name>\n\n        <reference anchor=\"usv-git-repository\">\n        <!-- Example minimum reference -->\n          <front>\n            <title>USV git repository at https://github.com/sixarm/usv</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2022\"/>\n          </front>\n        </reference>\n\n        <reference anchor=\"usv-rust-crate\">\n        <!-- Example minimum reference -->\n          <front>\n            <title>USV rust crate at https://crates.io/crates/usv</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n        
<reference anchor=\"asv-to-usv-rust-crate\">\n          <!-- Example minimum reference -->\n          <front>\n            <title>ASV to USV rust crate at https://crates.io/crates/asv-to-usv</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n        <reference anchor=\"usv-to-asv-rust-crate\">\n          <!-- Example minimum reference -->\n          <front>\n            <title>USV to ASV rust crate at https://crates.io/crates/usv-to-asv</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n        <reference anchor=\"csv-to-usv-rust-crate\">\n          <!-- Example minimum reference -->\n          <front>\n            <title>CSV to USV rust crate at https://crates.io/crates/csv-to-usv</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n        <reference anchor=\"usv-to-csv-rust-crate\">\n          <!-- Example minimum reference -->\n          <front>\n            <title>USV to CSV rust crate at https://crates.io/crates/usv-to-csv</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n        <reference anchor=\"json-to-usv-rust-crate\">\n          <!-- Example minimum reference -->\n          <front>\n            <title>JSON to USV rust crate at https://crates.io/crates/json-to-usv</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n     
   <reference anchor=\"usv-to-json-rust-crate\">\n          <!-- Example minimum reference -->\n          <front>\n            <title>USV to JSON rust crate at https://crates.io/crates/usv-to-json</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n        <reference anchor=\"xlsx-to-usv-rust-crate\">\n          <!-- Example minimum reference -->\n          <front>\n            <title>XLSX to USV rust crate at https://crates.io/crates/xlsx-to-usv</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n        <reference anchor=\"usv-to-xlsx-rust-crate\">\n          <!-- Example minimum reference -->\n          <front>\n            <title>USV to XLSX rust crate at https://crates.io/crates/usv-to-xlsx</title>\n            <author initials=\"J\" surname=\"Henderson\">\n              <organization/>\n            </author>\n            <date year=\"2024\"/>\n          </front>\n        </reference>\n\n        <reference anchor=\"RFC2119\" target=\"https://www.rfc-editor.org/info/rfc2119\">\n          <!-- Manually added reference -->\n          <front>\n            <title>Key words for use in RFCs to Indicate Requirement Levels</title>\n            <author initials=\"S.\" surname=\"Bradner\" fullname=\"S. Bradner\">\n              <organization/>\n            </author>\n            <date year=\"1997\" month=\"March\"/>\n            <abstract>\n              <t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. 
This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.\n              </t>\n            </abstract>\n          </front>\n          <seriesInfo name=\"BCP\" value=\"14\"/>\n          <seriesInfo name=\"RFC\" value=\"2119\"/>\n          <seriesInfo name=\"DOI\" value=\"10.17487/RFC2119\"/>\n        </reference>\n\n      </references>\n    </references>\n\n    <section anchor=\"Acknowledgements\" numbered=\"false\">\n      <!-- an Acknowledgements section is optional -->\n      <name>Acknowledgements</name>\n      <t>\n        The author would like to thank Y. Shafranovich, author of the CSV RFC,\n        which provided guidance for this USV RFC.\n      </t>\n      <t>\n        A special thank you goes to P.X.V.\n      </t>\n    </section>\n\n    <section anchor=\"Contributors\" numbered=\"false\">\n      <!-- a Contributors section is optional -->\n      <name>Contributors</name>\n      <t>Thanks to all of the contributors.</t>\n      <contact fullname=\"Joel Parker Henderson\" initials=\"J\" surname=\"Henderson\"><!-- https://authors.ietf.org/en/rfcxml-vocabulary#contact-->\n        <!-- including contact information for contributors is optional -->\n        <address>\n          <email>joel@joelparkerhenderson.com</email>\n        </address>\n      </contact>\n    </section>\n\n </back>\n</rfc>"
  },
  {
    "path": "doc/rfc/index.md",
    "content": "# Request For Comments (RFC)\n\nUSV is aiming to be an international standard with the IETF and IANA.\n\nWork in progress:\n\n* [https://datatracker.ietf.org/doc/draft-unicode-separated-values/01/](https://datatracker.ietf.org/doc/draft-unicode-separated-values/01/)\n\nFiles:\n\n* [draft-unicode-separated-values-01.xml](draft-unicode-separated-values-01.xml) - this is the official IETF RFCXML.\n\n* [draft-unicode-separated-values-01.pdf](draft-unicode-separated-values-01.pdf) - autogenerated from IETF RFCXML.\n\n* [draft-unicode-separated-values-01.txt](draft-unicode-separated-values-01.txt) - autogenerated from IETF RFCXML.\n"
  },
  {
    "path": "doc/spacers/index.md",
    "content": "# Spacers\n\nSpacers are characters that have the Unicode Derived Core Property White_Space.\n\nExamples:\n\n* U+0020 Space (SP)\n\n* U+0009 Tab (TAB) aka Horizontal Tab (HT)\n\n* U+000A Line Feed (LF) aka New Line (NL) aka End Of Line (EOL)\n\n* U+000D Carriage Return (CR)\n\nUSV supports spacers around content and markers, because this greatly helps typical display uses.\n\n\n## Line Feed character\n\nUSV with no spacers looks like this:\n\n```usv\na␟b␟␞c␟d␟␞\n```\n\nIf you want to see each record on its own line, then you can use newline characters:\n\n```usv\na␟b␟␞\nc␟d␟␞\n```\n\nIf you want to see each unit on its own line, then you can use newline characters:\n\n```usv\na␟\nb␟\n␞\nc␟\nd␟\n␞\n```\n\nIf you want to see each token on its own line, then you can use newline characters:\n\n```usv\na\n␟\nb\n␟\n␞\nc\n␟\nd\n␟\n␞\n```\n\n## Space character\n\nUSV with no spacers looks like this:\n\n```usv\na␟bbb␟ccccc␟\n```\n\nIf you want to see a column with left alignment, then you can use newline characters and space characters:\n\n```usv\na    ␟\nbbb  ␟\nccccc␟\n```\n\nIf you want to see a column with right alignment, then you can use newline characters and space characters:\n\n```usv\n    a␟\n  bbb␟\nccccc␟\n```\n\nIf you want to see a column with center alignment, then you can use newline characters and space characters:\n\n```usv\n  a  ␟\n bbb ␟\nccccc␟\n```\n"
  },
  {
    "path": "doc/styles/index.md",
    "content": "# Styles\n\nUSV styles can customize various kinds of output so it looks like you prefer.\n\n* Symbols: characters are visible symbols, such as \"␟\" for Unit Separator.\n\n* Controls: characters are invisible controls, such as \"\\u001F\" for Unit Separator.\n\n* Braces: instead of characters, use pretty-print braces, such as \"{US}\" for Unit Separator.\n"
  },
  {
    "path": "doc/todo/index.md",
    "content": "# TODO list\n\nWe welcome help with this todo list.\n\n\n## Add formats\n\nAdd USV formats to productivity applications:\n\n* [ ] LibreOffice Calc\n\n* [ ] Microsoft Excel\n\n* [ ] Google Sheets\n\n* Etc.\n\n\n## Create libraries\n\nCreate USV libraries for programming languages:\n\n* [x] Rust crate\n\n* [ ] Python pip package\n\n* [ ] Node npm package\n\n* [ ] Ruby gem\n\n* Etc.\n\n\n## Add handling\n\nAdd USV handling to statistics systems:\n\n* [ ] R\n\n* [ ] Julia\n\n* [ ] MatLab\n\n* [ ] Mathematica\n\n* [ ] Python fasspec\n\n* [ ] Python Pandas\n\n* [ ] Python Polars\n\n* [ ] Python Dask\n\n* Etc.\n\n\n## Extend CLI tools\n\nExtend USV capabilities for command line interface tools:\n\n* [ ] Miller <https://github.com/johnkerl/miller/issues/245>\n\n* [ ] TextQL <https://github.com/dinedal/textql/issues/115>\n\n* [ ] Q <https://github.com/harelba/q/issues/201>\n\n* [ ] jq\n\n* [ ] xsv by BurntSushi\n\n\n* Etc.\n\n\n## Add comparisons\n\nAdd comparisons to other data formats:\n\n* [ ] [Why isn’t there a decent file format for tabular data?](https://news.ycombinator.com/item?id=31220841)\n\n* [ ] [Whitespace Separated Values (WSV)](https://dev.stenway.com/WSV/)\n\n* [ ] [SimpleML](https://dev.stenway.com/SML/SimpleML.html)\n\n* [ ] [KYLI](https://shkspr.mobi/blog/2017/03/kyli-because-it-is-superior-to-json/)\n\n* [ ] [Rows of String Values (RSV)](https://github.com/Stenway/RSV-Specification)\n\n\n## Improve converters\n\nImprove converters: csv-to-usv and usv-to-csv\n\n* [ ] Add support for CSV delimiters, especially semi-colon instead of comma.\n\n* [ ] Add CLAP option for USV output with RS+ESC+LF.\n"
  },
  {
    "path": "examples/blog-posts.csv",
    "content": "\"Title One\",\"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\"\n\"Title Two\",\"Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\"\n\"Title Three\",\"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.\"\n"
  },
  {
    "path": "examples/blog-posts.usv",
    "content": "Title One\n␟\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor\nincididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis\nnostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n␟␞\nTitle Two\n␟\nDuis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu\nfugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in\nculpa qui officia deserunt mollit anim id est laborum.\n␟␞\nTitle Three\n␟\nSed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium\ndoloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore\nveritatis et quasi architecto beatae vitae dicta sunt explicabo.\n␟␞\n"
  },
  {
    "path": "examples/end-of-transmission.usv",
    "content": "a␟b␟c␟␄\n\nThe End of Transmission (EOT) stops parsing.\nFor example, this text comes after the EOT character.\n"
  },
  {
    "path": "examples/hello-goodnight.csv",
    "content": "\"I say \"\"hello, world\"\"\"\n\"You say \"\"goodnight, moon\"\"\"\n"
  },
  {
    "path": "examples/hello-goodnight.usv",
    "content": "I say \"hello, world\"␟␞\nYou say \"goodnight, moon\"␟␞\n"
  },
  {
    "path": "examples/stream.usv",
    "content": "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜"
  },
  {
    "path": "examples/zen-koans.csv",
    "content": "\"Truth Koan\",\"A monk asked, \"\"Without words or silence, will you tell me the truth?\"\"\"\n\"Lotus Koan\",\"A child asked, \"\"Before the lotus blossom emerges, what is it?\"\"\"\n\"World Koan\",\"A student asked, \"\"How does an enlightened one return to the world?\"\"\"\n"
  },
  {
    "path": "examples/zen-koans.usv",
    "content": "Truth Koan␟A monk asked, \"Without words or silence, will you tell me the truth?\"␟␞\nLotus Koan␟A child asked, \"Before the lotus blossom emerges, what is it?\"␟␞\nWorld Koan␟A student asked, \"How does an enlightened one return to the world?\"␟␞\n"
  },
  {
    "path": "tests/1-dimensional-as-line/expect.json",
    "content": "[\"a\",\"b\"]"
  },
  {
    "path": "tests/1-dimensional-as-line/input.usv",
    "content": "a␟b␟"
  },
  {
    "path": "tests/1-dimensional-as-lines/expect.json",
    "content": "[\"a\",\"b\"]"
  },
  {
    "path": "tests/1-dimensional-as-lines/input.usv",
    "content": "a\n␟\nb\n␟\n"
  },
  {
    "path": "tests/2-dimensional-as-line/expect.json",
    "content": "[[\"a\",\"b\"],[\"c\",\"d\"]]"
  },
  {
    "path": "tests/2-dimensional-as-line/input.usv",
    "content": "a␟b␟␞c␟d␟␞"
  },
  {
    "path": "tests/2-dimensional-as-lines/expect.json",
    "content": "[[\"a\",\"b\"],[\"c\",\"d\"]]"
  },
  {
    "path": "tests/2-dimensional-as-lines/input.usv",
    "content": "a\n␟\nb\n␟\n␞\nc\n␟\nd\n␟\n␞\n"
  },
  {
    "path": "tests/3-dimensional-as-line/expect.json",
    "content": "[[[\"a\",\"b\"],[\"c\",\"d\"]],[[\"e\",\"f\"],[\"g\",\"h\"]]]"
  },
  {
    "path": "tests/3-dimensional-as-line/input.usv",
    "content": "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝"
  },
  {
    "path": "tests/3-dimensional-as-lines/expect.json",
    "content": "[[[\"a\",\"b\"],[\"c\",\"d\"]],[[\"e\",\"f\"],[\"g\",\"h\"]]]"
  },
  {
    "path": "tests/3-dimensional-as-lines/input.usv",
    "content": "a\n␟\nb\n␟\n␞\nc\n␟\nd\n␟\n␞\n␝\ne\n␟\nf\n␟\n␞\ng\n␟\nh\n␟\n␞\n␝\n"
  },
  {
    "path": "tests/4-dimensional-as-line/expect.json",
    "content": "[[[[\"a\",\"b\"],[\"c\",\"d\"]],[[\"e\",\"f\"],[\"g\",\"h\"]]],[[[\"i\",\"j\"],[\"k\",\"l\"]],[[\"m\",\"n\"],[\"o\",\"p\"]]]]"
  },
  {
    "path": "tests/4-dimensional-as-line/input.usv",
    "content": "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜"
  },
  {
    "path": "tests/4-dimensional-as-lines/expect.json",
    "content": "[[[[\"a\",\"b\"],[\"c\",\"d\"]],[[\"e\",\"f\"],[\"g\",\"h\"]]],[[[\"i\",\"j\"],[\"k\",\"l\"]],[[\"m\",\"n\"],[\"o\",\"p\"]]]]"
  },
  {
    "path": "tests/4-dimensional-as-lines/input.usv",
    "content": "a\n␟\nb\n␟\n␞\nc\n␟\nd\n␟\n␞\n␝\ne\n␟\nf\n␟\n␞\ng\n␟\nh\n␟\n␞\n␝\n␜\ni\n␟\nj\n␟\n␞\nk\n␟\nl\n␟\n␞\n␝\nm\n␟\nn\n␟\n␞\no\n␟\np\n␟\n␞\n␝\n␜\n"
  },
  {
    "path": "tests/blog-posts/output-actual.txt",
    "content": "Title One\n\nunit separator\n\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor\nincididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis\nnostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n\nrecord separator\n\nTitle Two\n\nunit separator\n\nDuis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu\nfugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in\nculpa qui officia deserunt mollit anim id est laborum.\n\nrecord separator\n\nTitle Three\n\nunit separator\n\nSed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium\ndoloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore\nveritatis et quasi architecto beatae vitae dicta sunt explicabo.\n\n"
  },
  {
    "path": "tests/blog-posts/output-expect.txt",
    "content": "Title One\n\nunit separator\n\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor\nincididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis\nnostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n\nrecord separator\n\nTitle Two\n\nunit separator\n\nDuis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu\nfugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in\nculpa qui officia deserunt mollit anim id est laborum.\n\nrecord separator\n\nTitle Three\n\nunit separator\n\nSed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium\ndoloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore\nveritatis et quasi architecto beatae vitae dicta sunt explicabo.\n\n"
  },
  {
    "path": "tests/blog-posts/test.sh",
    "content": "#!/bin/sh\nset -euf\ntop=\"$(git rev-parse --show-toplevel)\"\ncat \"$top/examples/blog-posts.usv\" | \"$top/bin/usv-to-debug.bash\" > output-actual.txt\ndiff output-actual.txt output-expect.txt\n"
  },
  {
    "path": "tests/end-of-transmission-block/output-actual.txt",
    "content": "a\nunit separator\nb\nunit separator\nc\nEnd of Transmission\n\n"
  },
  {
    "path": "tests/end-of-transmission-block/output-expect.txt",
    "content": "a\nunit separator\nb\nunit separator\nc\nEnd of Transmission\n\n"
  },
  {
    "path": "tests/end-of-transmission-block/test.sh",
    "content": "#!/bin/sh\nset -euf\ntop=\"$(git rev-parse --show-toplevel)\"\ncat \"$top/examples/end-of-transmission.usv\" | \"$top/bin/usv-to-debug.bash\" > output-actual.txt\ndiff output-actual.txt output-expect.txt\n"
  },
  {
    "path": "tests/microsoft-excel/example1.xls",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<?mso-application progid=\"Excel.Sheet\"?><Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:c=\"urn:schemas-microsoft-com:office:component:spreadsheet\" xmlns:html=\"http://www.w3.org/TR/REC-html40\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:x2=\"http://schemas.microsoft.com/office/excel/2003/xml\" xmlns:x=\"urn:schemas-microsoft-com:office:excel\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><OfficeDocumentSettings xmlns=\"urn:schemas-microsoft-com:office:office\"><Colors><Color><Index>3</Index><RGB>#000000</RGB></Color><Color><Index>4</Index><RGB>#0000ee</RGB></Color><Color><Index>5</Index><RGB>#006600</RGB></Color><Color><Index>6</Index><RGB>#333333</RGB></Color><Color><Index>7</Index><RGB>#808080</RGB></Color><Color><Index>8</Index><RGB>#996600</RGB></Color><Color><Index>9</Index><RGB>#c0c0c0</RGB></Color><Color><Index>10</Index><RGB>#cc0000</RGB></Color><Color><Index>11</Index><RGB>#ccffcc</RGB></Color><Color><Index>12</Index><RGB>#dddddd</RGB></Color><Color><Index>13</Index><RGB>#ffcccc</RGB></Color><Color><Index>14</Index><RGB>#ffffcc</RGB></Color><Color><Index>15</Index><RGB>#ffffff</RGB></Color></Colors></OfficeDocumentSettings><ExcelWorkbook xmlns=\"urn:schemas-microsoft-com:office:excel\"><WindowHeight>9000</WindowHeight><WindowWidth>13860</WindowWidth><WindowTopX>240</WindowTopX><WindowTopY>75</WindowTopY><ProtectStructure>False</ProtectStructure><ProtectWindows>False</ProtectWindows></ExcelWorkbook><Styles><Style ss:ID=\"Default\" ss:Name=\"Default\"/><Style ss:ID=\"Note\" ss:Name=\"Note\"><Font ss:FontName=\"Liberation Sans\" ss:Size=\"10\"/></Style><Style ss:ID=\"Default\" ss:Name=\"Default\"/><Style ss:ID=\"Heading\" ss:Name=\"Heading\"><Alignment/><Font ss:Bold=\"1\" ss:Color=\"#000000\" ss:Size=\"24\"/></Style><Style ss:ID=\"Heading_20_1\" ss:Name=\"Heading 1\"><Alignment/><Font 
ss:Bold=\"1\" ss:Color=\"#000000\" ss:Size=\"18\"/></Style><Style ss:ID=\"Heading_20_2\" ss:Name=\"Heading 2\"><Alignment/><Font ss:Bold=\"1\" ss:Color=\"#000000\" ss:Size=\"12\"/></Style><Style ss:ID=\"Text\" ss:Name=\"Text\"><Alignment/></Style><Style ss:ID=\"Note\" ss:Name=\"Note\"><Alignment/><Borders><Border ss:Position=\"Bottom\" ss:LineStyle=\"Continuous\" ss:Weight=\"1\" ss:Color=\"#808080\"/><Border ss:Position=\"Left\" ss:LineStyle=\"Continuous\" ss:Weight=\"1\" ss:Color=\"#808080\"/><Border ss:Position=\"Right\" ss:LineStyle=\"Continuous\" ss:Weight=\"1\" ss:Color=\"#808080\"/><Border ss:Position=\"Top\" ss:LineStyle=\"Continuous\" ss:Weight=\"1\" ss:Color=\"#808080\"/></Borders><Interior ss:Color=\"#ffffcc\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Footnote\" ss:Name=\"Footnote\"><Alignment/></Style><Style ss:ID=\"Hyperlink\" ss:Name=\"Hyperlink\"><Alignment/></Style><Style ss:ID=\"Status\" ss:Name=\"Status\"><Alignment/></Style><Style ss:ID=\"Good\" ss:Name=\"Good\"><Alignment/><Interior ss:Color=\"#ccffcc\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Neutral\" ss:Name=\"Neutral\"><Alignment/><Interior ss:Color=\"#ffffcc\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Bad\" ss:Name=\"Bad\"><Alignment/><Interior ss:Color=\"#ffcccc\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Warning\" ss:Name=\"Warning\"><Alignment/></Style><Style ss:ID=\"Error\" ss:Name=\"Error\"><Alignment/><Interior ss:Color=\"#cc0000\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Accent\" ss:Name=\"Accent\"><Alignment/></Style><Style ss:ID=\"Accent_20_1\" ss:Name=\"Accent 1\"><Alignment/><Font ss:Bold=\"1\" ss:Color=\"#ffffff\"/><Interior ss:Color=\"#000000\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Accent_20_2\" ss:Name=\"Accent 2\"><Alignment/><Font ss:Bold=\"1\" ss:Color=\"#ffffff\"/><Interior ss:Color=\"#808080\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Accent_20_3\" ss:Name=\"Accent 3\"><Alignment/><Interior ss:Color=\"#dddddd\" 
ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Result\" ss:Name=\"Result\"><Alignment/><Font ss:Bold=\"1\" ss:Italic=\"1\" ss:Underline=\"Single\"/></Style><Style ss:ID=\"co1\"/><Style ss:ID=\"ta1\"/></Styles><ss:Worksheet ss:Name=\"Sheet1\"><Table ss:StyleID=\"ta1\"><Column ss:Span=\"1\" ss:Width=\"64.008\"/><Row ss:Height=\"12.816\"><Cell><Data ss:Type=\"String\">a</Data></Cell><Cell><Data ss:Type=\"String\">b</Data></Cell></Row><Row ss:Height=\"12.816\"><Cell><Data ss:Type=\"String\">c</Data></Cell><Cell><Data ss:Type=\"String\">d</Data></Cell></Row></Table><x:WorksheetOptions/></ss:Worksheet><ss:Worksheet ss:Name=\"Sheet2\"><Table ss:StyleID=\"ta1\"><Column ss:Span=\"1\" ss:Width=\"64.008\"/><Row ss:Height=\"12.816\"><Cell><Data ss:Type=\"String\">e</Data></Cell><Cell><Data ss:Type=\"String\">f</Data></Cell></Row><Row ss:Height=\"12.816\"><Cell><Data ss:Type=\"String\">g</Data></Cell><Cell><Data ss:Type=\"String\">h</Data></Cell></Row></Table><x:WorksheetOptions/></ss:Worksheet></Workbook>"
  },
  {
    "path": "tests/microsoft-excel/example2.xls",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<?mso-application progid=\"Excel.Sheet\"?><Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:c=\"urn:schemas-microsoft-com:office:component:spreadsheet\" xmlns:html=\"http://www.w3.org/TR/REC-html40\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:x2=\"http://schemas.microsoft.com/office/excel/2003/xml\" xmlns:x=\"urn:schemas-microsoft-com:office:excel\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><OfficeDocumentSettings xmlns=\"urn:schemas-microsoft-com:office:office\"><Colors><Color><Index>3</Index><RGB>#000000</RGB></Color><Color><Index>4</Index><RGB>#0000ee</RGB></Color><Color><Index>5</Index><RGB>#006600</RGB></Color><Color><Index>6</Index><RGB>#333333</RGB></Color><Color><Index>7</Index><RGB>#808080</RGB></Color><Color><Index>8</Index><RGB>#996600</RGB></Color><Color><Index>9</Index><RGB>#c0c0c0</RGB></Color><Color><Index>10</Index><RGB>#cc0000</RGB></Color><Color><Index>11</Index><RGB>#ccffcc</RGB></Color><Color><Index>12</Index><RGB>#dddddd</RGB></Color><Color><Index>13</Index><RGB>#ffcccc</RGB></Color><Color><Index>14</Index><RGB>#ffffcc</RGB></Color><Color><Index>15</Index><RGB>#ffffff</RGB></Color></Colors></OfficeDocumentSettings><ExcelWorkbook xmlns=\"urn:schemas-microsoft-com:office:excel\"><WindowHeight>9000</WindowHeight><WindowWidth>13860</WindowWidth><WindowTopX>240</WindowTopX><WindowTopY>75</WindowTopY><ProtectStructure>False</ProtectStructure><ProtectWindows>False</ProtectWindows></ExcelWorkbook><Styles><Style ss:ID=\"Default\" ss:Name=\"Default\"/><Style ss:ID=\"Note\" ss:Name=\"Note\"><Font ss:FontName=\"Liberation Sans\" ss:Size=\"10\"/></Style><Style ss:ID=\"Default\" ss:Name=\"Default\"/><Style ss:ID=\"Heading\" ss:Name=\"Heading\"><Alignment/><Font ss:Bold=\"1\" ss:Color=\"#000000\" ss:Size=\"24\"/></Style><Style ss:ID=\"Heading_20_1\" ss:Name=\"Heading 1\"><Alignment/><Font 
ss:Bold=\"1\" ss:Color=\"#000000\" ss:Size=\"18\"/></Style><Style ss:ID=\"Heading_20_2\" ss:Name=\"Heading 2\"><Alignment/><Font ss:Bold=\"1\" ss:Color=\"#000000\" ss:Size=\"12\"/></Style><Style ss:ID=\"Text\" ss:Name=\"Text\"><Alignment/></Style><Style ss:ID=\"Note\" ss:Name=\"Note\"><Alignment/><Borders><Border ss:Position=\"Bottom\" ss:LineStyle=\"Continuous\" ss:Weight=\"1\" ss:Color=\"#808080\"/><Border ss:Position=\"Left\" ss:LineStyle=\"Continuous\" ss:Weight=\"1\" ss:Color=\"#808080\"/><Border ss:Position=\"Right\" ss:LineStyle=\"Continuous\" ss:Weight=\"1\" ss:Color=\"#808080\"/><Border ss:Position=\"Top\" ss:LineStyle=\"Continuous\" ss:Weight=\"1\" ss:Color=\"#808080\"/></Borders><Interior ss:Color=\"#ffffcc\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Footnote\" ss:Name=\"Footnote\"><Alignment/></Style><Style ss:ID=\"Hyperlink\" ss:Name=\"Hyperlink\"><Alignment/></Style><Style ss:ID=\"Status\" ss:Name=\"Status\"><Alignment/></Style><Style ss:ID=\"Good\" ss:Name=\"Good\"><Alignment/><Interior ss:Color=\"#ccffcc\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Neutral\" ss:Name=\"Neutral\"><Alignment/><Interior ss:Color=\"#ffffcc\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Bad\" ss:Name=\"Bad\"><Alignment/><Interior ss:Color=\"#ffcccc\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Warning\" ss:Name=\"Warning\"><Alignment/></Style><Style ss:ID=\"Error\" ss:Name=\"Error\"><Alignment/><Interior ss:Color=\"#cc0000\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Accent\" ss:Name=\"Accent\"><Alignment/></Style><Style ss:ID=\"Accent_20_1\" ss:Name=\"Accent 1\"><Alignment/><Font ss:Bold=\"1\" ss:Color=\"#ffffff\"/><Interior ss:Color=\"#000000\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Accent_20_2\" ss:Name=\"Accent 2\"><Alignment/><Font ss:Bold=\"1\" ss:Color=\"#ffffff\"/><Interior ss:Color=\"#808080\" ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Accent_20_3\" ss:Name=\"Accent 3\"><Alignment/><Interior ss:Color=\"#dddddd\" 
ss:Pattern=\"Solid\"/></Style><Style ss:ID=\"Result\" ss:Name=\"Result\"><Alignment/><Font ss:Bold=\"1\" ss:Italic=\"1\" ss:Underline=\"Single\"/></Style><Style ss:ID=\"co1\"/><Style ss:ID=\"ta1\"/></Styles><ss:Worksheet ss:Name=\"Sheet1\"><Table ss:StyleID=\"ta1\"><Column ss:Span=\"1\" ss:Width=\"64.008\"/><Row ss:Height=\"12.816\"><Cell><Data ss:Type=\"String\">I</Data></Cell><Cell><Data ss:Type=\"String\">j</Data></Cell></Row><Row ss:Height=\"12.816\"><Cell><Data ss:Type=\"String\">k</Data></Cell><Cell><Data ss:Type=\"String\">l</Data></Cell></Row></Table><x:WorksheetOptions/></ss:Worksheet><ss:Worksheet ss:Name=\"Sheet2\"><Table ss:StyleID=\"ta1\"><Column ss:Span=\"1\" ss:Width=\"64.008\"/><Row ss:Height=\"12.816\"><Cell><Data ss:Type=\"String\">m</Data></Cell><Cell><Data ss:Type=\"String\">n</Data></Cell></Row><Row ss:Height=\"12.816\"><Cell><Data ss:Type=\"String\">o</Data></Cell><Cell><Data ss:Type=\"String\">p</Data></Cell></Row></Table><x:WorksheetOptions/></ss:Worksheet></Workbook>"
  },
  {
    "path": "tests/stream/output-actual.txt",
    "content": "a\nunit separator\nb\nrecord separator\nc\nunit separator\nd\ngroup separator\ne\nunit separator\nf\nrecord separator\ng\nunit separator\nh\nfile separator\ni\nunit separator\nj\nrecord separator\nk\nunit separator\nl\ngroup separator\nm\nunit separator\nn\nrecord separator\no\nunit separator\np\n"
  },
  {
    "path": "tests/stream/output-expect.txt",
    "content": "a\nunit separator\nb\nrecord separator\nc\nunit separator\nd\ngroup separator\ne\nunit separator\nf\nrecord separator\ng\nunit separator\nh\nfile separator\ni\nunit separator\nj\nrecord separator\nk\nunit separator\nl\ngroup separator\nm\nunit separator\nn\nrecord separator\no\nunit separator\np\n"
  },
  {
    "path": "tests/stream/test.sh",
    "content": "#!/bin/sh\nset -euf\ntop=\"$(git rev-parse --show-toplevel)\"\ncat \"$top/examples/stream.usv\" | \"$top/bin/usv-to-debug.bash\" > output-actual.txt\ndiff output-actual.txt output-expect.txt\n"
  },
  {
    "path": "todo.md",
    "content": "# TODO\n\n## Shift\n\nFor Hierarchy Levels:\n\n* ␏ U+240F Symbol for Shift In (SI).<br>\n  Use it to shift inward a level, for nesting, blocks, outlines, etc.\n\n* ␎ U+240E Symbol for Shift Out (SO).<br>\n  Use it to shift outward a level, for nesting, blocks, outlines, etc.\n\n\n## What is a hierarchy?\n\nSome data projects need more flexibility. For example, some data projects don't fit neatly into units, records, groups, files, because the data contains more kinds of clusters, or has nested clusters, etc.\n\nFor these needs, USV enables you to create your own hierarchy. If you know about data representations such as JSON, YAML, TOML, then you already understand how hierarchy works.\n\nExample JSON hierarchy:\n\n```\n{\n    \"colors\": [\n        \"red\",\n        \"green\",\n        \"blue\"\n    ]\n}\n```\n\nUSV uses two hierarchy characters:\n\n* \"shift-in\" goes inward a.k.a. begins a deeper hierarchy level.\n  \n* \"shift-out\" goes outward a.k.a. ends a deeper hierarchy level.\n\nUSV with a shift-in and a shift-out:\n\n```usv\ncolor␏red␎\n```\n\nPretty print renders shift-in as a left brace, and shift-out as brace, and with indentation:\n\n```txt\ncolor\n{ \n  red\n}\n```\n\nUSV with 2 shift ins and 2 shift outs:\n\n```usv\ncolors␏red␏scarlet␎green␏emerald␎blue␏cerulean␎␎\n```\n\nPretty print renders with even more indentation:\n\n```sh\ncolors\n{\n    red\n    {\n        scarlet\n    }\n    green\n    {\n        emerald\n    }\n    blue\n    {\n        cerulean\n    }\n}\n```\n\n\n#!/usr/bin/env bash\nset -euf -o pipefail\n\n# USV example shell script that demonstrates the use of USV characters.\n# This script reads STDIN one character at a time, and prints text.\n\nescape=false\nindent=\"\"\n\nwhile IFS= read -n1 -r c; do\n    if [ \"$escape\" = true ]; then\n        printf %s \"$c\"\n        escape=false\n        continue\n    fi\n    case  \"$c\" in\n    \"␛\")\n        escape=true\n        ;;\n    \"␟\")\n        printf \",\"\n        
;;\n    \"␞\")\n        printf \"\\n%s\" \"$indent\"\n        ;;\n    \"␝\")\n        printf \"\\n%s-\\n%s\" \"$indent\" \"$indent\"\n        ;;\n    \"␜\")\n        printf \"\\n%s=\\n%s\" \"$indent\" \"$indent\"\n        ;;\n    \"␏\")\n        printf \"\\n%s{\" \"$indent\"\n        indent=\"$indent    \"\n        printf \"\\n%s\" \"$indent\"\n        ;;\n    \"␎\")\n        indent=${indent%????}\n        printf \"\\n%s}\\n%s\" \"$indent\" \"$indent\"\n        ;;\n    \"␗\")\n        break\n        ;;\n    *)\n        printf %s \"$c\"\n        ;;\n    esac\ndone\nprintf \"\\n\"\n"
  }
]