Repository: SixArm/usv
Branch: main
Commit: dd09cb6a8351
Files: 77
Total size: 140.5 KB
Directory structure:
gitextract_r3n42vl8/
├── CODE_OF_CONDUCT.md
├── README.md
├── bin/
│ ├── bash/
│ │ ├── usv-to-csv.bash
│ │ ├── usv-to-debug.bash
│ │ └── usv-to-display.bash
│ └── python/
│ ├── usv-to-csv.py
│ ├── usv-to-debug.py
│ └── usv-to-display.py
├── doc/
│ ├── abnf/
│ │ └── index.md
│ ├── clap/
│ │ └── index.md
│ ├── code/
│ │ └── index.md
│ ├── comparisons/
│ │ ├── asv/
│ │ │ └── index.md
│ │ ├── csv/
│ │ │ └── index.md
│ │ ├── index.md
│ │ ├── json/
│ │ │ └── index.md
│ │ ├── rsv/
│ │ │ └── index.md
│ │ ├── tsv/
│ │ │ └── index.md
│ │ └── xlsx/
│ │ └── index.md
│ ├── converters/
│ │ └── index.md
│ ├── criticisms/
│ │ └── index.md
│ ├── editors/
│ │ ├── emacs/
│ │ │ └── index.md
│ │ └── vi/
│ │ └── index.md
│ ├── end-of-transmission/
│ │ └── index.md
│ ├── escape/
│ │ └── index.md
│ ├── faq/
│ │ └── index.md
│ ├── history-of-ascii-separated-values/
│ │ └── index.md
│ ├── how-to-type-unicode-characters/
│ │ └── index.md
│ ├── how-to-use-split-and-regex/
│ │ └── index.md
│ ├── layout/
│ │ └── index.md
│ ├── markup/
│ │ └── index.md
│ ├── purpose/
│ │ └── index.md
│ ├── rfc/
│ │ ├── draft-unicode-separated-values-01.txt
│ │ ├── draft-unicode-separated-values-01.xml
│ │ └── index.md
│ ├── spacers/
│ │ └── index.md
│ ├── styles/
│ │ └── index.md
│ └── todo/
│ └── index.md
├── examples/
│ ├── blog-posts.csv
│ ├── blog-posts.usv
│ ├── end-of-transmission.usv
│ ├── hello-goodnight.csv
│ ├── hello-goodnight.usv
│ ├── stream.usv
│ ├── zen-koans.csv
│ └── zen-koans.usv
├── tests/
│ ├── 1-dimensional-as-line/
│ │ ├── expect.json
│ │ └── input.usv
│ ├── 1-dimensional-as-lines/
│ │ ├── expect.json
│ │ └── input.usv
│ ├── 2-dimensional-as-line/
│ │ ├── expect.json
│ │ └── input.usv
│ ├── 2-dimensional-as-lines/
│ │ ├── expect.json
│ │ └── input.usv
│ ├── 3-dimensional-as-line/
│ │ ├── expect.json
│ │ └── input.usv
│ ├── 3-dimensional-as-lines/
│ │ ├── expect.json
│ │ └── input.usv
│ ├── 4-dimensional-as-line/
│ │ ├── expect.json
│ │ └── input.usv
│ ├── 4-dimensional-as-lines/
│ │ ├── expect.json
│ │ └── input.usv
│ ├── blog-posts/
│ │ ├── output-actual.txt
│ │ ├── output-expect.txt
│ │ └── test.sh
│ ├── end-of-transmission-block/
│ │ ├── output-actual.txt
│ │ ├── output-expect.txt
│ │ └── test.sh
│ ├── libreoffice-calc/
│ │ ├── example1.ods
│ │ └── example2.ods
│ ├── microsoft-excel/
│ │ ├── example1.xls
│ │ ├── example1.xlsx
│ │ ├── example2.xls
│ │ └── example2.xlsx
│ └── stream/
│ ├── output-actual.txt
│ ├── output-expect.txt
│ └── test.sh
└── todo.md
================================================
FILE CONTENTS
================================================
================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
[INSERT CONTACT METHOD].
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations
================================================
FILE: README.md
================================================
# Unicode Separated Values (USV) ™
Unicode Separated Values (USV) ™ is a data format that uses Unicode characters for markup.
[FAQ](doc/faq/) •
[RFC](doc/rfc/) •
[Code](doc/code/) •
[Comparisons](doc/comparisons/) •
[TODO](doc/todo/) •
[XKCD](https://xkcd.com/927/)
## Introduction
Unicode Separated Values (USV) enables new ways of working with data as plain text.
* USV builds on ASCII Separated Values (ASV) and adds capabilities for visible markup.
* USV contrasts with Comma Separated Values (CSV) because USV is more specific and powerful.
* USV is similar in spirit to Markdown (MD) because the purpose is easy freeform text editing.
### USV markup
USV uses Unicode characters for data markup.
* <tt>[U+001F](https://codepoints.net/U+001F)/[U+241F](https://codepoints.net/U+241F)</tt> Unit Separator.
* <tt>[U+001E](https://codepoints.net/U+001E)/[U+241E](https://codepoints.net/U+241E)</tt> Record Separator.
* <tt>[U+001D](https://codepoints.net/U+001D)/[U+241D](https://codepoints.net/U+241D)</tt> Group Separator.
* <tt>[U+001C](https://codepoints.net/U+001C)/[U+241C](https://codepoints.net/U+241C)</tt> File Separator.
* <tt>[U+001B](https://codepoints.net/U+001B)/[U+241B](https://codepoints.net/U+241B)</tt> Escape.
* <tt>[U+0004](https://codepoints.net/U+0004)/[U+2404](https://codepoints.net/U+2404)</tt> End of Transmission.
### USV examples
USV looks like this for 1-dimensional data made of units, such as a log. Each unit ends with a Unit Separator character and an optional newline character.
```usv
a␟
b␟
c␟
d␟
```
USV looks like this for 2-dimensional data made of units and records, such as a spreadsheet table. Each record ends with a Record Separator character and an optional newline character.
```usv
a␟b␟␞
c␟d␟␞
```
USV looks like this for 3-dimensional data made of units and records and groups, such as a spreadsheet folio. Each group ends with a Group Separator character and an optional newline character.
```usv
Sheet1␟␞
a␟b␟␞
c␟d␟␞
␝
Sheet2␟␞
e␟f␟␞
g␟h␟␞
␝
```
USV looks like this for 4-dimensional data made of units and records and groups and files, such as a collection of spreadsheet folios. Each file ends with a File Separator character and an optional newline character.
```usv
Folio1␟␞
Sheet1␟␞
a␟b␟␞
c␟d␟␞
␝
Sheet2␟␞
e␟f␟␞
g␟h␟␞
␝␜
Folio2␟␞
Sheet3␟␞
a␟b␟␞
c␟d␟␞
␝
Sheet4␟␞
e␟f␟␞
g␟h␟␞
␝␜
```
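The nesting shown above can be sketched in Python with plain string splitting. This is a minimal illustration, not the project's parser: it assumes the symbol style only, ignores Escape and End of Transmission, and strips whitespace around each unit on the assumption that it is layout rather than content. The names `split_terminated` and `parse_usv` are hypothetical.

```python
# Minimal sketch: parse symbol-style USV into nested lists.
# Assumptions: no Escape or End of Transmission handling, and any
# whitespace around a unit is treated as layout and stripped.
US, RS, GS, FS = "␟", "␞", "␝", "␜"


def split_terminated(text, mark):
    """Split on a terminator mark and drop whatever trails the last mark."""
    return text.split(mark)[:-1]


def parse_usv(text):
    """Return files -> groups -> records -> units as nested lists of strings."""
    return [
        [
            [
                [unit.strip() for unit in split_terminated(record, US)]
                for record in split_terminated(group, RS)
            ]
            for group in split_terminated(file, GS)
        ]
        for file in split_terminated(text, FS)
    ]
```

Because each separator is a terminator, the piece after the last mark at each level is discarded rather than treated as an extra empty item.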
### USV style
USV uses style options to display marks in various ways.
* Style Symbols: use visible symbol characters such as `␟`
* Style Controls: use invisible control characters such as `\u001F`
* Style Braces: use curly-braces with abbreviations such as: `{US}`
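As a rough illustration of the three styles, the sketch below converts text from one style to another with simple string replacement. Only `{US}` appears in this document; the other brace abbreviations (`{RS}`, `{GS}`, `{FS}`, `{ESC}`, `{EOT}`) are assumptions for the example, as is the helper name `restyle`. The sketch does not account for escaped characters.

```python
# Minimal sketch: convert USV marks between display styles.
# Assumption: brace abbreviations other than {US} are illustrative guesses.
SYMBOLS, CONTROLS, BRACES = 0, 1, 2

MARKS = [
    # (symbol, control, brace)
    ("␟", "\u001F", "{US}"),
    ("␞", "\u001E", "{RS}"),
    ("␝", "\u001D", "{GS}"),
    ("␜", "\u001C", "{FS}"),
    ("␛", "\u001B", "{ESC}"),
    ("␄", "\u0004", "{EOT}"),
]


def restyle(text, source, target):
    """Replace every mark written in the source style with the target style."""
    for mark in MARKS:
        text = text.replace(mark[source], mark[target])
    return text
```

For example, `restyle("hello␟world␟", SYMBOLS, BRACES)` gives a form that plain text readers can follow without rendering the symbol characters.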
### USV layout
USV uses layout options to format data in various ways.
* Layout Default: format the data so it looks good on a typical terminal screen.
* Layout Lines: format each mark with 0 or 1 or 2 surrounding newlines.
* Layout by Units or Records or Groups or Files: format a chunk to display on one line.
## Documentation
Core:
* [Markup with separators and modifiers](doc/markup/)
* [Style with symbols, controls, braces](doc/styles/)
* [Layout with units, records, groups, files, spacers](doc/layout/)
Community:
* [Frequently Asked Questions (FAQ)](doc/faq/)
* [Criticisms and replies](doc/criticisms/)
* [TODO list](doc/todo/)
Specification:
* [Request For Comments (RFC)](doc/rfc/)
* [Augmented Backus–Naur Form (ABNF)](doc/abnf/)
Code:
* [Code examples and production crates](doc/code/)
* [Command line argument parsing](doc/clap/)
How to:
* [How to type Unicode characters](doc/how-to-type-unicode-characters/)
* [How to use split and regex](doc/how-to-use-split-and-regex/)
Context:
* [Converters for ASV, CSV, JSON, XLSX](doc/converters/)
* [Comparisons with ASV, CSV, TSV, RSV, JSON](doc/comparisons/)
* [History of ASCII separated values (ASV)](doc/history-of-ascii-separated-values/)
Editor notes:
* [vim notes](doc/editors/vi/)
* [emacs notes](doc/editors/emacs/)
Example files:
* [hello-goodnight.usv](examples/hello-goodnight.usv) versus [hello-goodnight.csv](examples/hello-goodnight.csv)
* [zen-koans.usv](examples/zen-koans.usv) versus [zen-koans.csv](examples/zen-koans.csv)
* [blog-posts.usv](examples/blog-posts.usv) versus [blog-posts.csv](examples/blog-posts.csv)
* [end-of-transmission.usv](examples/end-of-transmission.usv)
## Hello World
Suppose you want USV text with two units: "hello" and "world".
The USV text with USV symbol characters for unit separators:
```usv
hello␟world␟
```
The USV text with USV control characters for unit separators:
```usv
hello\u001Fworld\u001F
```
## Comparisons to spreadsheets and databases
USV semantics are units, records, groups, files.
Spreadsheet semantics are cells, lines, sheets, folios.
Database semantics are fields, rows, tables, schemas.
## Examples
USV with 2 units by 2 records by 2 groups by 2 files, and the style as sheets:
```usv
a␟b␟␞
c␟d␟␞
␝
e␟f␟␞
g␟h␟␞
␝
␜
i␟j␟␞
k␟l␟␞
␝
m␟n␟␞
o␟p␟␞
␝
␜
```
Parsing example with the USV Rust crate and its iterators:
```rust
use usv::*;
let text = "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜";
let files = text.files();
for file in files {
    for group in file {
        for record in group {
            for unit in record {
                // println! needs a format string as its first argument.
                println!("{}", unit);
            }
        }
    }
}
```
## Why use USV?
USV can handle data that contains commas, semicolons, quotes, tabs, newlines, and other special characters, all without escaping.
USV can format units/columns/cells and records/rows/lines and groups/tables/grids and files/schemas/folios.
USV aims to be an international standard, and has an official IETF RFCXML Internet Draft.
USV uses Unicode characters that are semantically meaningful.
USV works well with any typical modern editor, font, terminal, shell, search, and language.
USV uses visible letter-width characters, and these are easy to view, select, copy, paste, search.
## USV is easy and friendly
USV is intended to be easy to use and friendly to try.
USV works with many kinds of data, and many kinds of editors. Any editor that can render the USV characters will work. We use vim, emacs, helix, Zed, VS Code, JetBrains IDEs, Nova, TextMate, Sublime, Notepad++, etc.
USV works with many kinds of tools. Any tool that can parse the USV characters will work. We use awk, sed, grep, rg, miller, etc.
USV works with many kinds of languages. Any language that can handle UTF-8 character encoding and rendering should work. We use C, C++, C#, Elixir, Erlang, Go, Java, JavaScript, Julia, Kotlin, Perl, PHP, Python, R, Ruby, Rust, Swift, TypeScript, etc.
## Legal protection for standardization
The USV project aims to become a free open source IETF standard and IANA standard, much like the standards for CSV and TDF.
Until the standardization happens, the terms "Unicode Separated Values" and "USV" are both trademarks of this project. This repository is copyright 2022-2024. The trademarks and copyrights are by Joel Parker Henderson, me, an individual, not a company.
When IETF and IANA approve the submissions as a standard, then the trademarks and copyright will go to a free libre open source software advocacy foundation. We welcome advice about how to do this well.
## Conclusion
USV is helping us with data projects. We hope USV may help you too.
We welcome constructive feedback about USV, as well as git issues, pull requests, and standardization help.
[FAQ](doc/faq/) •
[RFC](doc/rfc/) •
[Code](doc/code/) •
[Comparisons](doc/comparisons/) •
[TODO](doc/todo/) •
[XKCD](https://xkcd.com/927/)
================================================
FILE: bin/bash/usv-to-csv.bash
================================================
#!/usr/bin/env bash
set -euf -o pipefail
# USV example shell script that converts USV to CSV.
#
# Note this script is a simple demo, and does not attempt to escape CSV output,
# such as create a double-quoted unit to protect an embedded comma or newline.
escape=false
comma=''
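while IFS= read -N1 -r c; do
    if [ "$escape" = true ]; then
        escape=false
        # Emit any pending comma before the escaped character.
        printf %s%s "$comma" "$c"
        comma=''
    else
        case "$c" in
            # ANSI-C quoting $'...' expands \uXXXX to the actual character;
            # a plain "\u001B" would be matched as a literal backslash string.
            $'\u001B' | "␛")
                escape=true
                ;;
            $'\u001F' | "␟")
                comma=','
                ;;
            $'\u001E' | "␞")
                printf "\n"
                comma=''
                ;;
            $'\u001D' | "␝")
                >&2 printf "\nerror: group separator\n"
                ;;
            $'\u001C' | "␜")
                >&2 printf "\nerror: file separator\n"
                ;;
            $'\u0004' | "␄")
                break
                ;;
            *)
                printf %s%s "$comma" "$c"
                comma=''
                ;;
        esac
    fi
done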
================================================
FILE: bin/bash/usv-to-debug.bash
================================================
#!/usr/bin/env bash
set -euf -o pipefail
# USV example shell script that demonstrates the use of USV characters.
# This script reads STDIN one character at a time, and prints text.
escape=false
while IFS= read -N1 -r c; do
    if [ "$escape" = true ]; then
        escape=false
        # Put the text in the format string so that "\n" is interpreted.
        printf "\nescape character: %s\n" "$c"
    else
        case "$c" in
            # ANSI-C quoting $'...' expands \uXXXX to the actual character;
            # a plain "\u001B" would be matched as a literal backslash string.
            $'\u001B' | "␛")
                printf "\nescape\n"
                escape=true
                ;;
            $'\u001F' | "␟")
                printf "\nunit separator\n"
                ;;
            $'\u001E' | "␞")
                printf "\nrecord separator\n"
                ;;
            $'\u001D' | "␝")
                printf "\ngroup separator\n"
                ;;
            $'\u001C' | "␜")
                printf "\nfile separator\n"
                ;;
            $'\u0004' | "␄")
                printf "\nend of transmission\n"
                break
                ;;
            *)
                printf %s "$c"
                ;;
        esac
    fi
done
printf "\n"
================================================
FILE: bin/bash/usv-to-display.bash
================================================
#!/usr/bin/env bash
set -euf -o pipefail
# USV example shell script that demonstrates the use of USV characters.
# This script reads STDIN one character at a time, and prints text.
escape=false
while IFS= read -N1 -r c; do
    if [ "$escape" = true ]; then
        escape=false
        printf %s "$c"
    else
        case "$c" in
            # ANSI-C quoting $'...' expands \uXXXX to the actual character;
            # a plain "\u001B" would be matched as a literal backslash string.
            $'\u001B' | "␛")
                escape=true
                ;;
            $'\u001F' | "␟")
                printf ","
                ;;
            $'\u001E' | "␞")
                printf "\n"
                ;;
            $'\u001D' | "␝")
                printf "\n-\n"
                ;;
            $'\u001C' | "␜")
                printf "\n=\n"
                ;;
            $'\u0004' | "␄")
                break
                ;;
            *)
                printf %s "$c"
                ;;
        esac
    fi
done
printf "\n"
================================================
FILE: bin/python/usv-to-csv.py
================================================
#!/usr/bin/env python3
# USV example Python script that converts USV to CSV.
#
# Note this script is a simple demo, and does not attempt to escape CSV output,
# such as create a double-quoted unit to protect an embedded comma or newline.
import io
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
escape = False
comma = ''
while True:
    c = sys.stdin.read(1)
    if c == '':
        break
    if escape:
        escape = False
        # Emit any pending comma before the escaped character.
        print(f"{comma}{c}", end='', flush=True)
        comma = ''
    else:
        match c:
            case "\u001B" | "␛":
                escape = True
            case "\u001F" | "␟":
                comma = ','
            case "\u001E" | "␞":
                print("\n", end='', flush=True)
                comma = ''
            case "\u001D" | "␝":
                raise Exception("error: group separator")
            case "\u001C" | "␜":
                raise Exception("error: file separator")
            case "\u0004" | "␄":
                break
            case _:
                # Any other character is content.
                print(f"{comma}{c}", end='', flush=True)
                comma = ''
================================================
FILE: bin/python/usv-to-debug.py
================================================
#!/usr/bin/env python3
# USV example script that demonstrates the use of USV characters.
# This script reads STDIN one character at a time, and prints text.
import io
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
escape = False
while True:
    c = sys.stdin.read(1)
    if c == '':
        break
    if escape:
        escape = False
        print(f"\nescape character: {c}\n", end='', flush=True)
    else:
        match c:
            case "\u001B" | "␛":
                print("\nescape\n", end='', flush=True)
                escape = True
            case "\u001F" | "␟":
                print("\nunit separator\n", end='', flush=True)
            case "\u001E" | "␞":
                print("\nrecord separator\n", end='', flush=True)
            case "\u001D" | "␝":
                print("\ngroup separator\n", end='', flush=True)
            case "\u001C" | "␜":
                print("\nfile separator\n", end='', flush=True)
            case "\u0004" | "␄":
                print("\nend of transmission\n", end='', flush=True)
                break
            case _:
                # Any other character is content.
                print(f"{c}", end='', flush=True)
print()
================================================
FILE: bin/python/usv-to-display.py
================================================
#!/usr/bin/env python3
# USV example script that demonstrates the use of USV characters.
# This script reads STDIN one character at a time, and prints text.
import io
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
escape = False
while True:
    c = sys.stdin.read(1)
    if c == '':
        break
    if escape:
        escape = False
        print(f"{c}", end='', flush=True)
    else:
        match c:
            case "\u001B" | "␛":
                escape = True
            case "\u001F" | "␟":
                print(",", end='', flush=True)
            case "\u001E" | "␞":
                print("\n", end='', flush=True)
            case "\u001D" | "␝":
                print("\n-\n", end='', flush=True)
            case "\u001C" | "␜":
                print("\n=\n", end='', flush=True)
            case "\u0004" | "␄":
                break
            case _:
                # Any other character is content.
                print(f"{c}", end='', flush=True)
print()
================================================
FILE: doc/abnf/index.md
================================================
# Augmented Backus–Naur Form (ABNF)
Augmented Backus–Naur Form (ABNF) grammar (work in progress).
## Semantics
* usv = *files
* file = *groups
* group = *records
* record = *units
* unit = *content-characters
## Syntax
Sections:
* usv = ( header-and-body / body ) '*' ; anything after the body is chaff
* header-and-body = 1*unit-run / 1*record-run / 1*group-run / 1*file-run
* body = *unit-run / *record-run / *group-run / *file-run
Runs:
* file-run = *( *spacer-character file *spacer-character FS )
* group-run = *( *spacer-character group *spacer-character GS )
* record-run = *( *spacer-character record *spacer-character RS )
* unit-run = *( *spacer-character unit *spacer-character US )
Character classes:
* content-character = typical-character / escape-character
* typical-character = '*' - special-character - escape-character
* special-character = US / RS / GS / FS / ESC / EOT
* escape-character = ESC ( special-character / typical-character )
* spacer-character = Defined by Unicode Derived Core Property White_Space
## Unicode characters
Markers:
* US = U+001F Unit Separator / U+241F Symbol for Unit Separator
* RS = U+001E Record Separator / U+241E Symbol for Record Separator
* GS = U+001D Group Separator / U+241D Symbol for Group Separator
* FS = U+001C File Separator / U+241C Symbol for File Separator
Modifiers:
* ESC = U+001B Escape / U+241B Symbol for Escape
* EOT = U+0004 End Of Transmission / U+2404 Symbol for End Of Transmission
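The character classes above can be illustrated with a small tokenizer. This is a sketch under stated assumptions, not a conforming implementation of the grammar: it accepts both the control and symbol forms of each mark, treats the character after ESC as content, stops at EOT, and does not strip spacer characters. The name `tokenize` and the token shapes are hypothetical.

```python
# Minimal sketch of a USV tokenizer based on the character classes above.
# Assumptions: spacer characters are left in the content, and content
# that trails the last separator is still reported as a content token.
SEPARATORS = {
    "\u001F": "US", "␟": "US",
    "\u001E": "RS", "␞": "RS",
    "\u001D": "GS", "␝": "GS",
    "\u001C": "FS", "␜": "FS",
}
ESC = {"\u001B", "␛"}
EOT = {"\u0004", "␄"}


def tokenize(text):
    """Yield ("content", text) tokens and (separator name, character) tokens."""
    content = []
    chars = iter(text)
    for c in chars:
        if c in ESC:
            # The character after ESC is content, even if it is a mark.
            escaped = next(chars, None)
            if escaped is not None:
                content.append(escaped)
        elif c in EOT:
            break  # Everything after End of Transmission is ignored.
        elif c in SEPARATORS:
            if content:
                yield ("content", "".join(content))
                content = []
            yield (SEPARATORS[c], c)
        else:
            content.append(c)
    if content:
        yield ("content", "".join(content))
```

A consumer can then build units, records, groups, and files by watching for the `US`, `RS`, `GS`, and `FS` tokens.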
================================================
FILE: doc/clap/index.md
================================================
# Command line argument parsing (CLAP)
USV tools should enable users to choose their preferred output style.
USV tools for terminals should enable options with these settings.
Options for USV separators and modifiers:
* -u, --unit-separator : Set the unit separator string.
* -r, --record-separator : Set the record separator string.
* -g, --group-separator : Set the group separator string.
* -f, --file-separator : Set the file separator string.
* -e, --escape : Set the escape string.
* -z, --end-of-transmission : Set the end-of-transmission string.
Options for USV marks:
* --style-symbols : Show marks as symbols, such as "␟" for Unit Separator.
* --style-controls : Show marks as controls, such as "\u001F" for Unit Separator. This is most like ASCII Separated Values (ASV).
* --style-braces : Show marks as braces, such as "{US}" for Unit Separator. This is to help plain text readers, and is not USV output.
Options for USV layout:
* --layout-0: Show each item with no line around it. This is no layout, in other words one long line.
* --layout-1: Show each item with one line around it. This is like single-space lines for long form text.
* --layout-2: Show each item with two lines around it. This is like double-space lines for long form text.
* --layout-units: Show each unit on one line. This can be helpful for line-oriented tools.
* --layout-records: Show each record on one line. This is like a typical spreadsheet sheet export.
* --layout-groups: Show each group on one line. This can be helpful for folio-oriented tools.
* --layout-files: Show one file on one line. This can be helpful for archive-oriented tools.
Options for command line tools:
* -h, --help : Print help
* -V, --version : Print version
* -v, --verbose... : Set the verbosity level: 0=none, 1=error, 2=warn, 3=info, 4=debug, 5=trace. Example: --verbose …
* --test : Print test output for debugging, verifying, tracing, and the like. Example: --test
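One way to wire up the options listed above is Python's standard `argparse` module. This is a sketch, not the interface of any existing USV tool: the program name `usv-tool`, the version string, the symbol-character defaults, and the `"default"` layout value are all assumptions for illustration.

```python
# Minimal sketch: the options above expressed with argparse.
# Assumptions: program name, version string, and all default values.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="usv-tool")

    # Separators and modifiers.
    parser.add_argument("-u", "--unit-separator", default="␟")
    parser.add_argument("-r", "--record-separator", default="␞")
    parser.add_argument("-g", "--group-separator", default="␝")
    parser.add_argument("-f", "--file-separator", default="␜")
    parser.add_argument("-e", "--escape", default="␛")
    parser.add_argument("-z", "--end-of-transmission", default="␄")

    # Style flags share one destination and exclude each other.
    style = parser.add_mutually_exclusive_group()
    for name in ("symbols", "controls", "braces"):
        style.add_argument(f"--style-{name}", dest="style",
                           action="store_const", const=name)

    # Layout flags share one destination and exclude each other.
    layout = parser.add_mutually_exclusive_group()
    for name in ("0", "1", "2", "units", "records", "groups", "files"):
        layout.add_argument(f"--layout-{name}", dest="layout",
                            action="store_const", const=name)

    # General command line options; -h/--help is added by argparse itself.
    parser.add_argument("-V", "--version", action="version",
                        version="%(prog)s 0.0.0")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("--test", action="store_true")

    parser.set_defaults(style="symbols", layout="default")
    return parser
```

With this sketch, `build_parser().parse_args(["--style-braces", "-vv"])` yields a namespace whose `style` is `"braces"` and whose `verbose` is `2`.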
================================================
FILE: doc/code/index.md
================================================
# Code
USV has source code examples and also has production-ready library code.
## Script examples with Bash and Python
This repository includes USV code examples that demonstrate parsing.
Bash examples:
* [usv-to-display.bash](../../bin/bash/usv-to-display.bash)
* [usv-to-debug.bash](../../bin/bash/usv-to-debug.bash)
* [usv-to-csv.bash](../../bin/bash/usv-to-csv.bash)
Python examples:
* [usv-to-display.py](../../bin/python/usv-to-display.py)
* [usv-to-debug.py](../../bin/python/usv-to-debug.py)
* [usv-to-csv.py](../../bin/python/usv-to-csv.py)
## Production code with Rust
Rust has a crate in its own repo suitable for production use:
* `cargo install usv`
* [https://crates.io/crates/usv](https://crates.io/crates/usv)
* [https://github.com/sixarm/usv-rust-crate](https://github.com/sixarm/usv-rust-crate)
Command line converters:
* [asv-to-usv](https://crates.io/crates/asv-to-usv) and [usv-to-asv](https://crates.io/crates/usv-to-asv)
* [csv-to-usv](https://crates.io/crates/csv-to-usv) and [usv-to-csv](https://crates.io/crates/usv-to-csv)
* [json-to-usv](https://crates.io/crates/json-to-usv) and [usv-to-json](https://crates.io/crates/usv-to-json)
The Rust code includes tests and benchmarks. We welcome improvements.
================================================
FILE: doc/comparisons/asv/index.md
================================================
# ASCII Separated Values (ASV) a.k.a. DEL (Delimited ASCII)
ASCII Separated Values (ASV) uses these invisible zero-width control character separators:
* ASCII character 28 as file separator
* ASCII character 29 as group separator
* ASCII character 30 as record separator
* ASCII character 31 as unit separator.
These separators are identical in concept to those in USV.
ASV also:
* Forbids the ASCII control characters in content. In other words, there is no escaping.
* In practice, has many incompatible implementations and users that expect the record separator to be a newline character, because the implementations and users prefer to display the data on a screen.
## In our experience
In our experience, these ASCII characters tend to be hard to edit manually.
* Because many editors treat the characters as invisible zero-width characters.
* Because major character pickers show the visible character then insert the visible character, which is the corresponding USV Symbol.
In our experience, more than 90% of the ASV files we found in our research used the character "\n" or the character combination "\r\n" as the record delimiter, rather than the correct character 30.
================================================
FILE: doc/comparisons/csv/index.md
================================================
# Comma Separated Values (CSV)
Comma Separated Values (CSV) uses a comma character to separate values, and a newline character to separate records.
* Has fields, which are equivalent to USV units.
* Has records, which are equivalent to USV records.
* Does not have a greater hierarchy, such as USV groups and files, or spreadsheet sheets and folios, or database tables or schemas, etc.
* Forbids the tab character in content.
* Forbids the newline character in content.
* Some implementations forbid the comma character in content; other implementations allow it if and only if the field is surrounded by quotation marks.
* Some implementations forbid the newline character in content; other implementations allow it if and only if the field is surrounded by quotation marks.
## Custom delimiter character
Some CSV implementations and users enable a custom delimiter character.
* For example, some users prefer to use the semicolon character. This is prevalent in some European regions, where the comma character is frequently used within numbers as the decimal separator, such as "1,5".
* For example, some users prefer to use the vertical pipe character. This is prevalent among some developers of natural language content, when the developers are aware that content may contain commas or semicolons, yet is unlikely to contain a pipe character.
There is no standardization to know what the delimiter character is, ahead of time.
* In practice, some CSV implementations use a heuristic to guess the delimiter character by inspecting the data.
* In practice, some CSV users send along out-of-band instructions that explain the delimiter character.
### Commas
CSV implementations may fail when there is a comma that is supposed to be in content, or may require quoting:
This data is typically parsed as two CSV fields:
```csv
hello, world
```
To get the data as one field, some CSV implementations support surrounding quotation marks:
```csv
"hello, world"
```
USV honors commas, such as in this one unit that contains a comma:
```usv
hello, world
```
### Quotes
CSV implementations may fail when there is a quotation mark that is supposed to be in content, or may require implementation-specific triple double-quotes.
This data is typically parsed as a CSV error:
```csv
I say "hello, world"
```
To get the data as one field, some CSV implementations support surrounding quotation marks and escaping via double double-quotes:
```csv
"I say ""hello, world"""
```
USV honors quotes, such as in this one unit that contains quotation marks:
```usv
I say "hello, world"
```
### Newlines
CSV implementations may fail when there is a newline that is supposed to be in content, or may require implementation-specific escaping.
This data is typically parsed as a CSV error:
```csv
"first line\nsecond line"
```
To get the data as one field, some CSV implementations support escaping by using backslash quotation marks like this:
```csv
"\"first line\rsecond line\""
```
USV honors newlines, such as in this one unit that contains a newline:
```usv
first line
second line
```
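The quoting behavior described in the sections above can be observed with Python's standard `csv` module, which doubles embedded quotation marks and wraps fields that contain commas, quotes, or newlines. The sketch below contrasts that with writing the same values as symbol-style USV, where the content is written unchanged. The helper names `to_csv` and `to_usv` are hypothetical, and `to_usv` assumes the content contains no USV marks.

```python
# Minimal sketch: the same values written as CSV and as symbol-style USV.
# Assumption: the content contains no USV marks, so no escaping is needed.
import csv
import io


def to_csv(rows):
    """Write rows with the csv module's default (minimal) quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def to_usv(rows):
    """End each unit with a Unit Separator and each record with a Record Separator."""
    return "".join(
        "".join(unit + "␟" for unit in row) + "␞\n" for row in rows
    )


rows = [["hello, world", 'I say "hello, world"', "first line\nsecond line"]]
csv_text = to_csv(rows)
usv_text = to_usv(rows)
```

In `csv_text` every field is wrapped in quotation marks and the inner quotes are doubled, while `usv_text` carries the three values exactly as given.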
## In our experience
In our experience, the CSV format has various kinds of implementations, some incompatible, some with escaping and some without.
In our experience, some software programs use the file name extension ".csv" to mean other ways of separating data with other characters, such as using tabs, or semi-colons, or spaces.
### CSV files
We work with spreadsheets that are folios, that each contain sheets, that each contain grids.
Suppose we work with 3 spreadsheets, and each spreadsheet contains 3 sheets. When we export the data, the export process needs multiple filesystem files, and needs some kind of ad hoc naming convention to show what's what:
```txt
my-folio-1-sheet-1.csv
my-folio-1-sheet-2.csv
my-folio-1-sheet-3.csv
my-folio-2-sheet-1.csv
my-folio-2-sheet-2.csv
my-folio-2-sheet-3.csv
my-folio-3-sheet-1.csv
my-folio-3-sheet-2.csv
my-folio-3-sheet-3.csv
```
To send all the data to another team, we have tried a variety of combiner tools, such as `tar` and `zip`.
For comparison, USV can contain all the data, because a USV file is equivalent to a spreadsheet folio, and a USV group is equivalent to a spreadsheet sheet.
Thus our export uses one filesystem file:
```txt
my.usv
```
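As a rough sketch of that export, the function below writes one folio, given as a list of sheets, into a single USV text with one group per sheet. It is an illustration only: the name `folio_to_usv` and the newline-after-each-mark layout are assumptions, and content is assumed to contain no USV marks.

```python
# Minimal sketch: write one folio (a list of sheets) as one USV file.
# Assumptions: symbol style, a newline after each record, group, and
# file mark for readability, and content that contains no USV marks.
US, RS, GS, FS = "␟", "␞", "␝", "␜"


def folio_to_usv(sheets):
    """sheets is a list of sheets; each sheet is a list of rows of strings."""
    parts = []
    for rows in sheets:
        for row in rows:
            parts.append("".join(unit + US for unit in row) + RS + "\n")
        parts.append(GS + "\n")  # One group per sheet.
    parts.append(FS + "\n")  # One file per folio.
    return "".join(parts)
```

Several folios can then be concatenated into the same text, since each one ends with its own File Separator.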
================================================
FILE: doc/comparisons/index.md
================================================
# Comparisons with ASV, CSV, TSV, RSV, JSON, XLSX
Unicode separated values (USV) is similar to these formats, plus offers more capabilities, editor-friendly markup, and standards-track syntax.
* [ASCII separated values (ASV) a.k.a. DEL (Delimited ASCII)](asv)
* [Comma Separated Values (CSV)](csv)
* [Tab Separated Values (TSV) a.k.a. Tab Delimited Format (TDF)](tsv)
* [Rows of String Values (RSV)](rsv)
* [JavaScript Object Notation (JSON)](json)
* [Microsoft Excel (XLSX)](xlsx)
## Summary table
| Capability | [USV](../../) | [ASV](asv) | [CSV](csv) | [TSV](tsv) | [RSV](rsv) | [JSON](json) | [XLSX](xlsx) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Units / cells / fields | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 | ✅ |
| Records / lines / rows | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 | ✅ |
| Groups / sheets / tables | ✅ | ✅ | ⛔ | ⛔ | ⛔ | 🟡 | ✅ |
| Files / folios / schemas | ✅ | ✅ | ⛔ | ⛔ | ⛔ | 🟡 | ✅ |
| Text, not binary | ✅ | ✅ | ✅ | ✅ | ⛔ | ✅ | ⛔ |
| All visible separators | ✅ | ⛔ | ✅ | 🟡 | ⛔ | ✅ | ⛔ |
| Easy for any text editor | ✅ | ⛔ | ✅ | ✅ | ⛔ | ⛔ | ⛔ |
| Separator line spacing | ✅ | ⛔ | 🟡 | 🟡 | ⛔ | 🟡 | ⛔ |
| IETF.org standards-track | ✅ | ⛔ | 🟡 | 🟡 | ⛔ | ✅ | 🟡 |
| Escaping | ✅ | ✅ | ✅ | ⛔ | ⛔ | 🟡 | 🟡 |
| End of Transmission | ✅ | ✅ | ⛔ | ⛔ | ⛔ | ⛔ | ⛔ |
| Variable units per record | ✅ | ⛔ | ⛔ | ⛔ | ✅ | ✅ | ⛔ |
| Separators are terminators | ✅ | ⛔ | ⛔ | ⛔ | ✅ | ⛔ | ⛔ |
| Unicode UTF-8 default | ✅ | ⛔ | ⛔ | ⛔ | ⛔ | ✅ | 🟡 |
## Example for ASCII Separated Values (ASV)
```asv
a\u001Fb\u001F\u001Ec\u001Fd\u001F\u001E
```
USV with symbols:
```usv
a␟b␟␞c␟d␟␞
```
USV with controls is identical to ASV:
```usv
a\u001Fb\u001F\u001Ec\u001Fd\u001F\u001E
```
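Because the control characters and the symbols map one to one, converting between the two forms can be sketched as a character translation. This Python sketch is illustrative only and is not the implementation of the asv-to-usv or usv-to-asv converters; the names `PAIRS`, `controls_to_symbols`, and `symbols_to_controls` are hypothetical, and escapes are not handled.

```python
# Pairs of (ASCII control character, Unicode control picture symbol).
PAIRS = [
    ("\x1F", "\u241F"),  # unit separator
    ("\x1E", "\u241E"),  # record separator
    ("\x1D", "\u241D"),  # group separator
    ("\x1C", "\u241C"),  # file separator
]
TO_SYMBOLS = str.maketrans({control: symbol for control, symbol in PAIRS})
TO_CONTROLS = str.maketrans({symbol: control for control, symbol in PAIRS})

def controls_to_symbols(text: str) -> str:
    return text.translate(TO_SYMBOLS)

def symbols_to_controls(text: str) -> str:
    return text.translate(TO_CONTROLS)

asv = "a\x1Fb\x1F\x1Ec\x1Fd\x1F\x1E"
usv = controls_to_symbols(asv)
print(usv)
```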
## Example for Comma Separated Values (CSV)
CSV example:
```csv
a,b
c,d
```
USV with symbols:
```usv
a␟b␟␞
c␟d␟␞
```
USV with controls:
```usv
a\u001Fb\u001F\u001E
c\u001Fd\u001F\u001E
```
## Example for Tab Separated Values (TSV)
TSV example:
```tsv
a	b
c	d
```
USV with symbols:
```usv
a␟b␟␞
c␟d␟␞
```
USV with controls:
```usv
a\u001Fb\u001F\u001E
c\u001Fd\u001F\u001E
```
## Example for Rows of String Values (RSV)
RSV example:
```rsv
a\b255b\b255\b253c\b255d\b255\b253
```
USV with symbols:
```usv
a␟b␟␞
c␟d␟␞
```
USV with controls:
```usv
a\u001Fb\u001F\u001E
c\u001Fd\u001F\u001E
```
## Example for Microsoft Excel (XLSX)
XLSX example:
```xlsx
Sheet 1
a,b
c,d
Sheet 2
e,f
g,h
```
USV with symbols:
```usv
Sheet 1␟␞
a␟b␟␞
c␟d␟␞
␝
Sheet 2␟␞
e␟f␟␞
g␟h␟␞
␝
```
USV with controls:
```usv
Sheet 1\u001F\u001E
a\u001Fb\u001F\u001E
c\u001Fd\u001F\u001E
\u001D
Sheet 2\u001F\u001E
e\u001Ff\u001F\u001E
g\u001Fh\u001F\u001E
\u001D
```
================================================
FILE: doc/comparisons/json/index.md
================================================
# JavaScript Object Notation (JSON)
JavaScript Object Notation (JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a commonly used data format with diverse uses in electronic data interchange, including that of web applications with servers. - Wikipedia ([Source](https://en.wikipedia.org/wiki/JSON))
JSON is more flexible and more powerful than USV because JSON can nest to arbitrary depth and has data types.
Example JSON:
```json
[
["a","b"],
["c","d"]
]
```
Equivalent USV:
```usv
a␟b␟␞
c␟d␟␞
```
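The mapping from a JSON array of arrays to USV records can be sketched in Python as follows. The function name `json_rows_to_usv` is hypothetical, and the sketch assumes the JSON is exactly two levels deep and that values need no escaping.

```python
import json

US, RS = "\u241F", "\u241E"  # ␟ unit separator, ␞ record separator

def json_rows_to_usv(json_text: str) -> str:
    """Convert a JSON array of arrays into USV records, one record per line."""
    rows = json.loads(json_text)
    return "".join(
        "".join(str(value) + US for value in row) + RS + "\n"
        for row in rows
    )

usv_text = json_rows_to_usv('[["a","b"],["c","d"]]')
print(usv_text)
```

Note that JSON data types are flattened to text by `str`, which reflects the point above that USV carries no type information.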
## In our experience
We use JSON in many web applications, API endpoints, data transformations, and the like. It works very well for these purposes.
In our experience JSON is harder to edit by hand than USV, and harder to teach to novices who want to view and edit data. USV tends to be easier for these use cases because USV is simpler.
================================================
FILE: doc/comparisons/rsv/index.md
================================================
# Rows of String Values (RSV)
https://github.com/Stenway/RSV-Specification
The RSV data file format is a simple binary alternative to CSV.
An RSV document represents an array of arrays of nullable string values, also called a jagged array.
Its main purpose is to store tabular data. But because it's a jagged array, it's not limited to that. So, rows can contain the same number of values, but don't have to.
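For a feel of the binary layout, here is a small Python sketch of an RSV encoder. It assumes the byte values as we understand them from the RSV specification: `0xFF` terminates a value, `0xFE` marks a null value, and `0xFD` terminates a row. The function name `encode_rsv` is hypothetical.

```python
from typing import Optional

VALUE_END, NULL_VALUE, ROW_END = 0xFF, 0xFE, 0xFD  # assumed from the RSV spec

def encode_rsv(rows: list[list[Optional[str]]]) -> bytes:
    """Encode a jagged array of nullable strings as RSV bytes."""
    out = bytearray()
    for row in rows:
        for value in row:
            if value is None:
                out.append(NULL_VALUE)
            else:
                out += value.encode("utf-8")
            out.append(VALUE_END)
        out.append(ROW_END)
    return bytes(out)

data = encode_rsv([["a", "b"], ["c", "d"]])
print(data)
```

These bytes never occur in valid UTF-8, which is why RSV needs no escaping but is also not a text format.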
================================================
FILE: doc/comparisons/tsv/index.md
================================================
# Tab Separated Values (TSV) a.k.a. Tab Delimited Format (TDF)
Tab Separated Values (TSV) uses a tab character to separate values, and a newline character to separate records.
* Has fields, which are equivalent to USV units.
* Has records, which are equivalent to USV records.
* Does not have a greater hierarchy, such as USV groups and files, or spreadsheet sheets and folios, or database tables or schemas, etc.
* Forbids the tab character in content.
* Forbids the newline character in content.
## In our experience
In our experience, TSV can be difficult to edit with some editors, because the tab character can be invisible, or can take up a varying number of character widths such as 2 spaces or 4 spaces or 8 spaces or as many spaces as it takes to get to the next tab stop.
In our experience, some software programs use the file name extension ".tsv", others use the extension ".tdf", and others use the extension ".csv" even though the file actually uses tabs and doesn't use commas.
================================================
FILE: doc/comparisons/xlsx/index.md
================================================
# Microsoft Excel (XLSX)
Microsoft Excel is among the world's most popular spreadsheet programs. It uses a data format called "XLSX", which in turn uses XML files inside a ZIP-compressed package.
* Has spreadsheet sheets. Each sheet is called a "Worksheet", and can contain columns and rows.
* Has spreadsheet folios. Each folio is called a "Workbook", and can contain one or more sheets.
* Does not have a greater hierarchy, such as a collection of folios.
* Can import/export data in many formats, such as CSV and TSV, but not yet USV.
## Custom delimiters
Microsoft Excel enables the user to import/export using a wide range of custom delimiters, such as column separators and row separators.
## In our experience
In our experience, XLSX is great primarily for reading and editing with Microsoft Excel or a compatible spreadsheet program. We had some success using decompression software and then an XML editor, but this process and the XML tooling are harder for end users.
### Workbooks and Worksheets
We work with spreadsheets that are folios a.k.a. workbooks, that each contain multiple sheets a.k.a. worksheets.
```txt
my-workbook-1.xlsx
my-workbook-2.xlsx
my-workbook-3.xlsx
```
Or if we export data to CSV or similar format then we have even more files:
```txt
my-workbook-1-worksheet-1.csv
my-workbook-1-worksheet-2.csv
my-workbook-1-worksheet-3.csv
my-workbook-2-worksheet-1.csv
my-workbook-2-worksheet-2.csv
my-workbook-2-worksheet-3.csv
my-workbook-3-worksheet-1.csv
my-workbook-3-worksheet-2.csv
my-workbook-3-worksheet-3.csv
```
To send all the data to another team, we have tried a variety of combiner tools, such as `tar` and `zip`.
For comparison, USV can contain all the data, because a USV file is equivalent to a spreadsheet folio, and a USV group is equivalent to a spreadsheet sheet.
Thus our export uses one filesystem file:
```txt
my.usv
```
================================================
FILE: doc/converters/index.md
================================================
# Converters for ASV, CSV, JSON, XLSX
ASCII Separated Values (ASV):
* [asv-to-usv](https://crates.io/crates/asv-to-usv)
* [usv-to-asv](https://crates.io/crates/usv-to-asv)
Comma Separated Values (CSV):
* [csv-to-usv](https://crates.io/crates/csv-to-usv)
* [usv-to-csv](https://crates.io/crates/usv-to-csv)
JavaScript Object Notation (JSON):
* [json-to-usv](https://crates.io/crates/json-to-usv)
* [usv-to-json](https://crates.io/crates/usv-to-json)
Microsoft Excel XML (XLSX):
* [xlsx-to-usv](https://crates.io/crates/xlsx-to-usv)
* [usv-to-xlsx](https://crates.io/crates/usv-to-xlsx)
================================================
FILE: doc/criticisms/index.md
================================================
# Criticisms
USV is led by Joel Parker Henderson (joel@joelparkerhenderson.com).
Constructive feedback is welcome. See also [frequently asked questions](../faq/).
- [XKCD one universal standard](#xkcd-one-universal-standard)
- [Fundamentally wrong](#fundamentally-wrong)
- [You cannot edit it](#you-cannot-edit-it)
- [No efficient storage](#no-efficient-storage)
- [There is no wide library support](#there-is-no-wide-library-support)
- [Not all data is representable](#not-all-data-is-representable)
- [Editors work with invisible characters](#editors-work-with-invisible-characters)
- [Doesn't work with Excel](#doesnt-work-with-excel)
- [Not trivially splittable](#not-trivially-splittable)
- [No need for an escape character](#no-need-for-an-escape-character)
- [Can't encode as a single byte](#cant-encode-as-a-single-byte)
- [Better off advocating for editor support](#better-off-advocating-for-editor-support)
- [Cleverness for cleverness’s sake](#cleverness-for-clevernesss-sake)
- [This is kinda stupid](#this-is-kinda-stupid)
- [Nobody needs USV, and nobody should use it.](#nobody-needs-usv-and-nobody-should-use-it)
- [Kill it with fire](#kill-it-with-fire)
## XKCD one universal standard
<blockquote>
"This is like the <a href=https://xkcd.com/927/>XKCD cartoon</a> about one universal standard."
</blockquote>
Ha! That's funny. It turns out USV isn't trying to be one universal standard. CSV works really well for many use cases, and is well-supported everywhere, so by all means keep using CSV where you want and where it works well.
USV aims just for use cases that CSV doesn't seem to handle well, such as text that contains paragraphs of natural language, or displays better with newlines between units, or data that involves spreadsheet collections (e.g. folios comprising sheets comprising rows and columns) and database collections (e.g. schemas comprising tables comprising records and fields), or data that needs an End of Transmission.
## Fundamentally wrong
<blockquote>
"Using Unicode graphic characters as metasyntactic escape characters is fundamentally wrong. Those Unicode characters are for displaying the symbols for Unit Separator, Record Separator, etc. and not for actually being separators! ASCII already has those! Included in Unicode!"
</blockquote>
USV accepts ASCII control characters and the corresponding Unicode symbol characters as equivalent.
If you prefer to use exclusively ASCII control characters, then do that. I tried that approach first, and the ASCII control characters didn't work well in practice for visual display and for text editors. This is because the ASCII control characters are rendered as invisible for many of the displays and editors I tried, and also didn't copy correctly in many of the tools.
Also, there are command-line tools for converting from ASCII Separated Values (ASV) to Unicode Separated Values (USV) and vice versa: [asv-to-usv](https://crates.io/crates/asv-to-usv), [usv-to-asv](https://crates.io/crates/usv-to-asv).
## You cannot edit it
<blockquote>
"You cannot edit it in regular editor, like csv/tsv/jsonlines."
</blockquote>
I edit it in regular editors, every day. I use vi, emacs, VS Code, JetBrains IDEs, and more. I've also tried USV on many more editors, and so far it works 100% of the time. If you have a specific editor that doesn't seem to be working well with USV, can you please contact me?
## No efficient storage
<blockquote>
"There is no efficient storage, like binary formats."
</blockquote>
USV is a text format, on purpose, because it's aiming to be human-readable and human-editable. USV storage goals are similar in magnitude to CSV.
If you want efficient storage like a binary format, one way is to use compression on the text data. USV, CSV, and similar text formats can work well with compression, especially if the content has compression-friendly aspects such as repetitions, sequences, patterns, and so forth.
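As a quick illustration of the compression point, this Python sketch gzips some deliberately repetitive USV text using the standard library. The sample content is made up, and real-world ratios depend entirely on the data.

```python
import gzip

US, RS = "\u241F", "\u241E"  # ␟ unit separator, ␞ record separator

# Made-up, highly repetitive sample data: 1000 identical records.
usv_text = ("alpha" + US + "bravo" + US + RS + "\n") * 1000
raw = usv_text.encode("utf-8")
packed = gzip.compress(raw)

# Decompression restores the original text exactly.
restored = gzip.decompress(packed).decode("utf-8")
print(len(raw), len(packed))
```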
## There is no wide library support
<blockquote>
"There is no wide library support."
</blockquote>
Currently there's library support using the [USV Rust crate](https://crates.io/crates/usv) and there are command line [converters](../converters/).
I welcome help creating library support from anyone who wants to help. The Rust crate is relatively easy to understand, and should be portable to similar family languages such as C, C++, C#, Java, JavaScript, Python, Ruby, etc.
## Not all data is representable
<blockquote>
"Not all data is representable."
</blockquote>
Can you provide an example of data that is not representable, or an explanation of what the data could be?
USV aims for all data to be representable. Specifically, USV aims to be able to represent all UTF-8 encoded text. USV provides an escape character, so you can escape any of the USV special characters as you wish.
## Editors work with invisible characters
<blockquote>
"We already have editors that can work with invisible characters. It’s not hard."
</blockquote>
It turns out it is hard, in practice. I tried using invisible characters first, and found ongoing hard problems such as with copy/paste, search/replace, import/export, pattern matching, font display, and zero-width rendering.
In fact, the difficulties with invisible characters seem to be the reason that programmers mostly abandoned ASCII Separated Values (ASV) in favor of Comma Separated Values (CSV). USV aims to build on ASV to add capabilities for visible characters and better visible displays.
## Doesn't work with Excel
<blockquote>
"The adoptability challenge remains here to be Excel support."
</blockquote>
Yes you're right. USV is brand-new on the standards track in 2024. Excel support is a long-term goal. Submitting to the IETF is to help programs like Excel to start supporting it.
If you have experience with writing Excel import/export capabilities, I welcome your help.
## Not trivially splittable
<blockquote>
"This format is not trivially splittable with a regular expression. I'd avoid most of the escaping they show, especially for line endings, and just make RS '\n' the record separator, or possibly RS '\n'*."
</blockquote>
See the documentation about [how to use split and regex](../how-to-use-split-and-regex/).
Broadly speaking, USV does not have a goal to be trivially splittable, because visual editing is much more important in practice, and because library parsing is more reliable.
ASCII Separated Values (ASV) should be trivially splittable by using a unit separator byte character and record separator byte character. But it turns out that many ASV files in the wild actually change from using the record separator byte character to a newline character. Before you split, you need to know these choices.
Comma Separated Values (CSV) should be trivially splittable by using a comma byte character and newline byte character. But it turns out that many CSV files in the wild actually change from using the comma byte character to a semicolon byte character or a pipe character. And some CSV files use escaping such as for quotes, or commas that are embedded in content, or escaped newlines that are embedded in content. Before you split, you need to know these choices. It's easy if you handle all data yourself; it's not easy if you're working with many worldwide organizations.
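For simple data, a regex split can still get reasonably far. The Python sketch below is only an approximation under stated assumptions: it accepts either the symbol or the control character for each separator, skips newlines that follow a separator, and ignores escapes, groups, files, and End of Transmission. The names `UNIT`, `RECORD`, and `split_usv` are hypothetical.

```python
import re

# Each separator may be the Unicode symbol or the ASCII control character,
# optionally followed by carriage returns and/or newlines for layout.
UNIT = re.compile(r"[\u241F\x1F][\r\n]*")
RECORD = re.compile(r"[\u241E\x1E][\r\n]*")

def split_usv(text: str) -> list[list[str]]:
    """Split escape-free USV text into records of units."""
    # Separators are terminators, so the final split piece is dropped.
    records = RECORD.split(text)[:-1]
    return [UNIT.split(record)[:-1] for record in records]

rows = split_usv("a\u241Fb\u241F\u241E\nc\u241Fd\u241F\u241E\n")
print(rows)
```

Data that uses the escape character needs a real parser, which is the reason library parsing is recommended above.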
## No need for an escape character
<blockquote>
"I am not convinced about the need for an escape character."
</blockquote>
I tried USV without an escape character for a year to get real-world feedback. The feedback was that the escape was needed, because otherwise there could be data that couldn't be represented without an extra out-of-band reformatting/rewriting step.
## Can't encode as a single byte
<blockquote>
"ASCII Separated Values is better because it can encode each separator as a single byte."
</blockquote>
If single byte encoding is very important, and you don't care about visible symbols, then yes ASCII Separated Values (ASV) is better for you. USV doesn't have a goal of single byte separators.
You can freely convert between ASV and USV and back again, if you like, by using these [converters](../converters/).
## Better off advocating for editor support
<blockquote>
"Just because a glyph is "invisible" doesn't mean it has to actually be invisible. The symbols for the separators are hard to read, like you're pointing out, which means someone would eventually replace them with some other graphical display, in which case you were just as well off with the actual separators themselves. They would have been better off advocating for editor support for actual separator display."
</blockquote>
Yes you're correct. Programmers have been advocating for editor support for actual separator display since the 1980's ASCII Separated Values.
So far, the advocating has not succeeded. USV is a compromise for the present.
If the future offers editor support as you describe, then it will be great to use that instead of USV, and in fact USV will have been very useful for getting people using group separators, file separators, escapes, End of Transmissions, and other ASV features that are more extensive than CSV.
## Cleverness for cleverness’s sake
<blockquote>
"USV would have the disadvantage of using multi-byte characters as delimiters, so you have to decode the file in order to separate records. And you still can’t type the characters directly or be guaranteed to display them without font support. This honestly seems like cleverness for cleverness’s sake."
</blockquote>
Yes you're correct directionally on your technical points. To decode one record, you have to read that one record until you reach its record separator; in other words, you can't just use split on one byte value as you can with CSV. That said, you can decode one unit at a time, or one record at a time, or one group at a time, or one file at a time; you don't have to decode the whole file.
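Decoding one record at a time can be sketched as a Python generator that reads a text stream character by character and yields each record as soon as its separator arrives. The name `iter_records` is hypothetical, and the sketch ignores escapes, groups, and files.

```python
import io
from typing import Iterator, TextIO

US, RS = "\u241F", "\u241E"  # ␟ unit separator, ␞ record separator

def iter_records(stream: TextIO) -> Iterator[list[str]]:
    """Yield one record (a list of units) at a time from a text stream."""
    units, chars, after_separator = [], [], False
    while True:
        ch = stream.read(1)
        if ch == "":
            return  # end of stream; unterminated data is dropped in this sketch
        if after_separator and ch in "\r\n":
            continue  # layout newlines after a separator are not content
        after_separator = ch in (US, RS)
        if ch == US:
            units.append("".join(chars))
            chars = []
        elif ch == RS:
            yield units
            units = []
        else:
            chars.append(ch)

stream = io.StringIO("a\u241Fb\u241F\u241E\nc\u241Fd\u241F\u241E\n")
first = next(iter_records(stream))
print(first)
```

The caller can stop after the first record, so the rest of the stream is never decoded.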
As for cleverness, it's not especially clever. USV is essentially just ASCII Separated Values (ASV) plus visible symbols and some simple extras for escape, end of transmission, and spacers. The core ideas of ASV and USV are all from the 1970's.
## This is kinda stupid
<blockquote>
"I've long wanted a successor to CSV, but this is kinda stupid. People like CSVs because they look good, feel natural even in plaintext. This is the same reason that Markdown in successful. As for including commas in your data, it could just have been managed with a simple escape character like a \, for when there's actually a comma in your data. That's it."
</blockquote>
If you want a successor to CSV, do you have suggestions for what you want?
What I learned is that when you escape with a backslash, then you have to also provide for escaping a backslash, such as two in a row, and then it causes issues for use cases such as Windows paths, regular expressions, backslash as used in a typical backslash-t for tab or backslash-n for newline, and so on. This is why I prefer to use the escape character as U+241B Symbol for Escape (ESC).
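To illustrate, here is a hedged Python sketch of escaping content with the ESC symbol. The function name `escape_content` and the exact set of characters treated as special are assumptions for this example. Backslashes pass through untouched, so a Windows path needs no doubling.

```python
ESC = "\u241B"  # ␛ Symbol for Escape

# Assumption: the symbols a reader would otherwise treat as markup.
SPECIAL = set("\u241F\u241E\u241D\u241C\u241B\u2404")

def escape_content(content: str) -> str:
    """Prefix each special character with ESC so it is read as content."""
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in content)

windows_path = r"C:\temp\new"
print(escape_content(windows_path))  # backslashes are ordinary content
print(escape_content("a\u241Fb"))    # the unit separator symbol gets escaped
```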
More broadly, CSV handles units and records (such as one spreadsheet sheet), but not groups (such as multiple spreadsheet sheets) or files (such as multiple spreadsheet folios). USV handles all of these.
## Nobody needs USV, and nobody should use it.
<blockquote>
"This is needlessly adding yet another standard to the mix. If you are in a position to choose what standard you use, just use:
* Whatever is best for the data model and/or languages you use. JSON is a common modern choice, suitable for most things.
* If you want something more tabular, closer to CSV (which is a valid choice for bulk data), use strict RFC 4180 compliant data.
* If you want to specify your own binary super-compact data, use ASN.1. I am also given to understand that Protobuf is a popular modern choice.
If you aren’t in a position to choose your standards, just do whatever you need to do to parse whatever junk you are given, and emit as standards-compliant data as possible as output.
* Again, RFC 4180 is a great way to standardize your own CSV output, as long as you stick to a subset which the receiving party can parse.
Nobody needs USV, and nobody should use it."
</blockquote>
Thanks for your specific feedback and conclusion. :-)
For me, what's best for my data model is text (not binary), that handles many human languages using UTF-8 (not ASCII), that is easy to read and edit in many text editors (not a specialized row-column editor), and that works especially well with content that is paragraphs of natural language with commas, quotes, newlines, indentations, and the like. I also want capabilities for groups (such as spreadsheet sheets) and files (such as spreadsheet folios).
For comparison I've tried binary formats (e.g. ASN.1, Protobuf), row-column tabular formats (e.g. CSV, TDF), web data formats (e.g. JSON, YAML), web markup formats (e.g. HTML, XML). For me, USV is significantly easier to use, read, edit, and share.
## Kill it with fire
<blockquote>
"Y'know, I greatly dislike this. It's an actual emotional reaction. This should not be standardized. No one should use this. This is a bad idea and deserves to die in obscurity.
I'll tell you why, it's pretty simple. The characters this... thing is stealing, exist to represent invisible control sequences. That is their use. The fact that they can be mentioned by direct input is inevitable, but not to be encouraged.
I will be greatly disappointed if this is accepted as a standard. The fact that a USV file looks like a rendered ASV file is a show stopping bug, an anti-feature, an insult to life itself. Kill it with fire."
</blockquote>
That's great feedback! The previous time that I heard that kind of feedback, it was about emoji being terrible and how no one should use them. Luckily representations evolve. 😀
================================================
FILE: doc/editors/emacs/index.md
================================================
# Emacs notes
C-x = shows a summary about the character at point.
C-u C-x = shows details about the character at point.
The rest of this page is from the emacs manual:
https://www.gnu.org/software/emacs/manual/html_node/emacs/International-Chars.html
## 23.1 Introduction to International Character Sets
The users of international character sets and scripts have established many more-or-less standard coding systems for storing files. These coding systems are typically multibyte, meaning that sequences of two or more bytes are used to represent individual non-ASCII characters.
Internally, Emacs uses its own multibyte character encoding, which is a superset of the Unicode standard. This internal encoding allows characters from almost every known script to be intermixed in a single buffer or string. Emacs translates between the multibyte character encoding and various other coding systems when reading and writing files, and when exchanging data with subprocesses.
The command C-h h (view-hello-file) displays the file etc/HELLO, which illustrates various scripts by showing how to say “hello” in many languages. If some characters can’t be displayed on your terminal, they appear as ‘?’ or as hollow boxes (see Undisplayable Characters).
Keyboards, even in the countries where these character sets are used, generally don’t have keys for all the characters in them. You can insert characters that your keyboard does not support, using C-x 8 RET (insert-char). See Inserting Text. Shorthands are available for some common characters; for example, you can insert a left single quotation mark ‘ by typing C-x 8 [, or in Electric Quote mode, usually by simply typing `. See Quotation Marks. Emacs also supports various input methods, typically one for each script or language, which make it easier to type characters in the script. See Input Methods.
The prefix key C-x RET is used for commands that pertain to multibyte characters, coding systems, and input methods.
The command C-x = (what-cursor-position) shows information about the character at point. In addition to the character position, which was described in Cursor Position Information, this command displays how the character is encoded. For instance, it displays the following line in the echo area for the character ‘c’:
```
Char: c (99, #o143, #x63) point=28062 of 36168 (78%) column=53
```
The four values after ‘Char:’ describe the character that follows point, first by showing it and then by giving its character code in decimal, octal and hex. For a non-ASCII multibyte character, these are followed by ‘file’ and the character’s representation, in hex, in the buffer’s coding system, if that coding system encodes the character safely and with a single byte (see Coding Systems). If the character’s encoding is longer than one byte, Emacs shows ‘file ...’.
On rare occasions, Emacs encounters raw bytes: single bytes whose values are in the range 128 (0200 octal) through 255 (0377 octal), which Emacs cannot interpret as part of a known encoding of some non-ASCII character. Such raw bytes are treated as if they belonged to a special character set eight-bit; Emacs displays them as escaped octal codes (this can be customized; see Customization of Display). In this case, C-x = shows ‘raw-byte’ instead of ‘file’. In addition, C-x = shows the character codes of raw bytes as if they were in the range #x3FFF80..#x3FFFFF, which is where Emacs maps them to distinguish them from Unicode characters in the range #x0080..#x00FF.
With a prefix argument (C-u C-x =), this command additionally calls the command describe-char, which displays a detailed description of the character:
* The character set name, and the codes that identify the character within that character set; ASCII characters are identified as belonging to the ascii character set.
* The character’s script, syntax and categories.
* What keys to type to input the character in the current input method (if it supports the character).
* The character’s encodings, both internally in the buffer, and externally if you were to save the buffer to a file.
* If you are running Emacs on a graphical display, the font name and glyph code for the character. If you are running Emacs on a text terminal, the code(s) sent to the terminal.
* If the character was composed on display with any following characters to form one or more grapheme clusters, the composition information: the font glyphs if the frame is on a graphical display, and the characters that were composed.
* The character’s text properties (see Text Properties in the Emacs Lisp Reference Manual), including any non-default faces used to display the character, and any overlays containing it (see Overlays in the same manual).
Here’s an example, with some lines folded to fit into this manual:
```
position: 1 of 1 (0%), column: 0
character: ê (displayed as ê) (codepoint 234, #o352, #xea)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0xEA
script: latin
syntax: w which means: word
category: .:Base, L:Left-to-right (strong), c:Chinese,
j:Japanese, l:Latin, v:Viet
to input: type "C-x 8 RET ea" or
"C-x 8 RET LATIN SMALL LETTER E WITH CIRCUMFLEX"
buffer code: #xC3 #xAA
file code: #xC3 #xAA (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-PfEd-DejaVu Sans Mono-normal-normal-
normal-*-15-*-*-*-m-0-iso10646-1 (#xAC)
Character code properties: customize what to show
name: LATIN SMALL LETTER E WITH CIRCUMFLEX
old-name: LATIN SMALL LETTER E CIRCUMFLEX
general-category: Ll (Letter, Lowercase)
decomposition: (101 770) ('e' '^')
```
================================================
FILE: doc/editors/vi/index.md
================================================
# vim notes
vim comes with most modern Linux and BSD distributions.
## Digraph characters
To add digraphs for each USV character, add
```
digraph us 9247 rs 9246 gs 9245 fs 9244 es 9243 eo 9220
```
to your `~/.vimrc`
Then when you want to type, for instance, the record separator character, in insert mode, type `<ctrl-k>rs`
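The decimal numbers in the digraph line are the Unicode code points of the symbols. As a small cross-check, this Python sketch rebuilds the line from the characters themselves; the `DIGRAPHS` table is a hypothetical helper for this example only.

```python
# Digraph name -> the Unicode symbol it should insert.
DIGRAPHS = {
    "us": "\u241F",  # ␟ unit separator
    "rs": "\u241E",  # ␞ record separator
    "gs": "\u241D",  # ␝ group separator
    "fs": "\u241C",  # ␜ file separator
    "es": "\u241B",  # ␛ escape
    "eo": "\u2404",  # ␄ end of transmission
}

# ord() gives the decimal code point that the digraph command expects.
line = "digraph " + " ".join(
    f"{name} {ord(symbol)}" for name, symbol in DIGRAPHS.items()
)
print(line)
```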
## List hidden characters
To list hidden characters:
```
:set list
```
Later:
```
:set nolist
```
================================================
FILE: doc/end-of-transmission/index.md
================================================
# End of Transmission (EOT)
The End of Transmission (EOT) mark tells any reader that it can stop reading.
* EOT tells the data reader that data is done.
* EOT has no effect on the output content.
Example of a unit "abc" then EOT then extra data "xxx" that is ignored.
```usv
abc␞␄xxx
```
EOT can be useful for a variety of use cases:
* Streaming data, such as to signal that the reader can close a connection.
* Appending data, such as USV content, then extra information such as comments.
* Attaching data, such as a USV spreadsheet that has MIME attachments.
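A reader's handling of EOT can be sketched in Python as cutting the input at the first EOT mark. The function name `before_eot` is hypothetical, and this sketch does not account for an escaped EOT, which would be content.

```python
import re

# EOT may appear as the Unicode symbol ␄ or as the ASCII control character.
EOT_PATTERN = re.compile("[\u2404\x04]")

def before_eot(text: str) -> str:
    """Return the text a reader should process: everything before the first EOT."""
    return EOT_PATTERN.split(text, maxsplit=1)[0]

kept = before_eot("abc\u241E\u2404xxx")
print(kept)
```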
================================================
FILE: doc/escape/index.md
================================================
# Escape (ESC)
The Escape (ESC) symbol makes the subsequent character treated as a content character.
Example: USV with a unit that contains an Escape + End of Transmission, which is treated as content.
```usv
a␛␄b␟
```
In the rare case that you need a separator then content that starts with a carriage return or newline:
* You escape the carriage return or newline.
* This is because separators may optionally be followed by any number of carriage returns and/or newlines, which is to help with visual display.
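The escape rule can be sketched as a small Python reader that treats whatever character follows ESC as content. The function name `split_units_with_escape` is hypothetical, and the sketch handles only units, not records, groups, or layout newlines.

```python
US, ESC = "\u241F", "\u241B"  # ␟ unit separator, ␛ escape

def split_units_with_escape(text: str) -> list[str]:
    """Split text into units, treating the character after ESC as content."""
    units, chars = [], []
    it = iter(text)
    for ch in it:
        if ch == ESC:
            # The next character is content, even if it is a special character.
            chars.append(next(it, ""))
        elif ch == US:
            units.append("".join(chars))
            chars = []
        else:
            chars.append(ch)
    return units

units = split_units_with_escape("a\u241B\u2404b\u241F")
print(units)
```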
================================================
FILE: doc/faq/index.md
================================================
# Frequently Asked Questions
USV is led by Joel Parker Henderson (joel@joelparkerhenderson.com).
Constructive feedback is welcome. See also [criticisms](../criticisms/).
- [Is USV easy?](#is-usv-easy)
- [Is USV aiming to be a standard?](#is-usv-aiming-to-be-a-standard)
- [Why choose USV over CSV or TSV?](#why-choose-usv-over-csv-or-tsv)
- [Why choose USV over ASV?](#why-choose-usv-over-asv)
- [Why choose USV over ASV for machine-only data?](#why-choose-usv-over-asv-for-machine-only-data)
- [Why use control picture characters rather than the control characters themselves?](#why-use-control-picture-characters-rather-than-the-control-characters-themselves)
- [Why are the symbols so small on my screen?](#why-are-the-symbols-so-small-on-my-screen)
## Is USV easy?
Yes. If you know about comma separated values (CSV), or tab separated values
(TSV), or ASCII separated values (ASV), or JavaScript Object Notation (JSON),
then you already know much about USV.
## Is USV aiming to be a standard?
Yes, USV is aiming to become an IETF standard similar to <a
href="https://www.ietf.org/rfc/rfc4180.txt">IETF RFC 4180 for CSV</a>.
We have submitted the IETF Internet Draft and it is a work in progress.
Yes, USV is aiming to become an IANA standard similar to <a
href="https://www.iana.org/assignments/media-types/text/tab-separated-values">IANA
TSV</a>. We have submitted the request for the "text/usv" media type.
## Why choose USV over CSV or TSV?
You want your data content to be able to contain commas, or tabs, or newlines,
without special escaping or different quoting rules than other data such as
numbers.
You want your data content to be able to use data groups, or database tables, or
spreadsheet grids.
You want your data format to be able to use data files, or database schemas, or
spreadsheet folios.
You want your data semantics to be able to use hierarchy levels, nesting, or
outlines.
You want a consistent compatible standard format, which CSV can't always
provide.
You want a consistent compatible standardized file name extension, which
CSV/TSV/TDF can't always provide.
You want to use End of Transmission (EOT), so you can guarantee a reader
has read data until the end.
## Why choose USV over ASV?
You want your data content to be friendlier for human reading and human editing.
USV provides typically-visible letter-width characters (such as Unicode 241F),
whereas ASV provides typically-invisible zero-width characters (such as ASCII
31).
It's true that some editors do render ASV characters using other visual
representations, such as using the corresponding USV visible characters;
however in practice we haven't found much support for this approach.
## Why choose USV over ASV for machine-only data?
For machine-only data, such as data that will never be used for human reading or
human editing, USV and ASV are similar because both can handle units,
records, groups, and files.
## Why use control picture characters rather than the control characters themselves?
We tried using the control characters, and also tried configuring various editors to show the control characters by rendering the control picture characters.
First, we encountered many difficulties with editor configurations, attempting to make each editor treat the invisible zero-width characters by rendering with the visible letter-width characters.
Second, we encountered problems with copy/paste functionality, where it often didn't work because the editor implementations and terminal implementations copied visible letter-width characters, not the underlying invisible zero-width characters.
Third, users were unable to distinguish between the rendered control picture characters (e.g. the editor saw ASCII 31 and rendered Unicode Unit Separator) versus the control picture characters being in the data content (e.g. someone actually typed Unicode Unit Separator into the data content).
## Why are the symbols so small on my screen?
USV renders on your system by using your local font. If your local font has small Unicode symbols for specific characters, then you'll see these. On many systems we've tried, the characters render with the letters "US", "RS", "GS", "FS", etc. We are open to suggestions for fonts that work especially well with USV, and we are open to funding the creation of specialized fonts for these specific characters.
================================================
FILE: doc/history-of-ascii-separated-values/index.md
================================================
# History of ASCII separated values (ASV)
➤ <https://www.lammertbies.nl/comm/info/ascii-characters>
## ASCII 28 = FS = File separator
The file separator FS is an interesting control code, as it gives us insight in the way that computer technology was organized in the sixties. We are now used to random access media like RAM and magnetic disks, but when the ASCII standard was defined, most data was serial. I am not only talking about serial communications, but also about serial storage like punch cards, paper tape and magnetic tapes. In such a situation it is clearly efficient to have a single control code to signal the separation of two files. The FS was defined for this purpose.
## ASCII 29 = GS = Group separator
Data storage was one of the main reasons for some control codes to get in the ASCII definition. Databases are usually set up with tables, containing records. All records in one table have the same type, but records of different tables can be different. The group separator GS is defined to separate tables in a serial data storage system. Note that the word table wasn't used at that moment and the ASCII people called it a group.
## ASCII 30 = RS = Record separator
Within a group (or table) the records are separated with RS or record separator.
## ASCII 31 = US = Unit separator
The smallest data item to be stored in a database is called a unit in the ASCII definition. The unit separator separates these fields in a serial data storage environment. The US control code allows all fields to have a variable length. If data storage space is limited, as in the sixties, this is a good way to preserve valuable space. On the other hand, serial storage is far less efficient than the table-driven RAM and disk implementations of modern times.
## ASCII 14 = Shift Out & ASCII 15 = Shift In
The original purpose of these characters was to provide a way to shift a coloured ribbon, split longitudinally usually with red and black, up and down to the other color in an electro-mechanical typewriter or teleprinter, such as the Teletype Model 38, to automate the same function of manual typewriters. Black was the conventional ambient default color and so was shifted "in" or "out" with the other color on the ribbon.
➤ <https://wikipedia.org/wiki/Shift_Out_and_Shift_In_characters>
================================================
FILE: doc/how-to-type-unicode-characters/index.md
================================================
# How to type Unicode characters
On many systems, you can type Unicode characters this way:
1. Press and hold the Alt key a.k.a. Option key.
2. Type + and the Unicode character hexadecimal code, such as +241f for Unit Separator.
3. Release the Alt key a.k.a. Option key.
On Apple macOS, you may need to do a one-time setup:
1. Go to System Preferences -> Keyboard -> Input Sources.
2. Click the + button, select "Others" -> "Unicode Hex Input", and press "Add". This completes the one-time setup.
3. Switch to the Unicode Hex Input in the menu bar.
4. Hold down the Option key and type the hexadecimal Unicode value, then release the Option key.
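If typing the characters is awkward on your system, one alternative is to generate them from their hexadecimal code points and then copy and paste the result. The following is a minimal sketch; the helper name `char_from_hex` is hypothetical and only for illustration.

```python
import unicodedata


def char_from_hex(hex_code):
    """Return the character for a hexadecimal code point such as "241f"."""
    return chr(int(hex_code, 16))


# U+241F is the visible symbol for Unit Separator.
unit_separator = char_from_hex("241f")

# Print the character together with its official Unicode name.
print(unit_separator, unicodedata.name(unit_separator))
```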
================================================
FILE: doc/how-to-use-split-and-regex/index.md
================================================
# How to use split and regex
If you prefer to use split and regex, rather than a specific USV parsing tool or library, then you have choices.
The pseudocode here is the current best approximation of USV using split and regex.
If you are certain that your data never uses any escape characters:
```regex
transmission = split input on "[\u0004\u2404]" first
files = split transmission on "[\u001C\u241C]"
groups = split file on "[\u001D\u241D]"
records = split group on "[\u001E\u241E]"
units = split record on "[\u001F\u241F]"
unit = trim(unit)
```
If your data may use any escape characters, and also if your split and regex offer capabilities for negative lookbehind:
```regex
transmission = split input on "(?<![\u001B\u241B])[\u0004\u2404]" first
files = split transmission on "(?<![\u001B\u241B])[\u001C\u241C]"
groups = split file on "(?<![\u001B\u241B])[\u001D\u241D]"
records = split group on "(?<![\u001B\u241B])[\u001E\u241E]"
units = split record on "(?<![\u001B\u241B])[\u001F\u241F]"
unit = trim(unit)
```
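The pseudocode above can be sketched in Python with the standard `re` module, which supports negative lookbehind. This is an approximation under stated assumptions, not the reference parser: the names `split_run` and `parse_usv` are hypothetical, trimming removes only CR and LF, escape characters are left in the unit content, and an escaped Escape followed by a separator is not handled.

```python
import re

# Negative lookbehind: a mark counts only if it is not preceded by Escape.
NOT_ESCAPED = "(?<![\u001B\u241B])"

# Each class matches the control character or its visible symbol.
EOT = re.compile(NOT_ESCAPED + "[\u0004\u2404]")
FS = re.compile(NOT_ESCAPED + "[\u001C\u241C]")
GS = re.compile(NOT_ESCAPED + "[\u001D\u241D]")
RS = re.compile(NOT_ESCAPED + "[\u001E\u241E]")
US = re.compile(NOT_ESCAPED + "[\u001F\u241F]")


def split_run(pattern, text):
    """Split text on pattern and drop a trailing piece that holds only CR/LF.

    USV separators end each item, so a plain split leaves an empty
    piece after the last separator.
    """
    parts = pattern.split(text)
    if len(parts) > 1 and parts[-1].strip("\r\n") == "":
        parts.pop()
    return parts


def parse_usv(text):
    """Return nested lists: files > groups > records > units."""
    # Keep only the text before the first unescaped End of Transmission.
    transmission = EOT.split(text, maxsplit=1)[0]
    return [
        [
            [
                [unit.strip("\r\n") for unit in split_run(US, record)]
                for record in split_run(RS, group)
            ]
            for group in split_run(GS, file)
        ]
        for file in split_run(FS, transmission)
    ]
```

For example, `parse_usv("hello␟world␟␞goodnight␟moon␟␞")` yields one file containing one group containing two records of two units each.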
================================================
FILE: doc/layout/index.md
================================================
# Layout
USV layouts can customize various kinds of output so that it looks the way you prefer.
* Layout 0: Show each item with no line around it. This is no layout, in other words one long line.
* Layout 1: Show each item with one line around it. This is like single-space lines for long form text.
* Layout 2: Show each item with two lines around it. This is like double-space lines for long form text.
* Layout units: Show each unit on one line. This can be helpful for line-oriented tools.
* Layout records: Show each record on one line. This is like a typical spreadsheet sheet export.
* Layout groups: Show each group on one line. This can be helpful for folio-oriented tools.
* Layout files: Show one file on one line. This can be helpful for archive-oriented tools.
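As a rough illustration of the named layouts, the sketch below adds a line break after each separator at or above the chosen level. It rests on assumptions: the function name `apply_layout` is hypothetical, the input is expected to be one long line (layout 0), escapes are ignored, and the rule "break after this level and every higher level" is inferred rather than taken from a specification.

```python
import re

# Separators that end a line for each named layout.
# Each entry lists the control character and its visible symbol.
LEVELS = {
    "units": "\u001F\u241F\u001E\u241E\u001D\u241D\u001C\u241C",
    "records": "\u001E\u241E\u001D\u241D\u001C\u241C",
    "groups": "\u001D\u241D\u001C\u241C",
    "files": "\u001C\u241C",
}


def apply_layout(text, layout):
    """Insert a newline after every separator that belongs to the layout."""
    pattern = "([" + LEVELS[layout] + "])"
    # \1 re-emits the matched separator, followed by a line break.
    return re.sub(pattern, "\\1\n", text)
```

With the `records` layout, each record ends its own line, which is close to a typical spreadsheet export.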
================================================
FILE: doc/markup/index.md
================================================
# USV markup
USV uses Unicode characters for data markup.
* <tt>[U+001F](https://codepoints.net/U+001F)/[U+241F](https://codepoints.net/U+241F)</tt> Unit Separator. For a spreadsheet cell, database field, etc.
* <tt>[U+001E](https://codepoints.net/U+001E)/[U+241E](https://codepoints.net/U+241E)</tt> Record Separator. For a spreadsheet line, database row, etc.
* <tt>[U+001D](https://codepoints.net/U+001D)/[U+241D](https://codepoints.net/U+241D)</tt> Group Separator. For a spreadsheet sheet, database table, etc.
* <tt>[U+001C](https://codepoints.net/U+001C)/[U+241C](https://codepoints.net/U+241C)</tt> File Separator. For a spreadsheet folio, database schema, etc.
* <tt>[U+001B](https://codepoints.net/U+001B)/[U+241B](https://codepoints.net/U+241B)</tt> Escape. For protecting markup characters in content.
* <tt>[U+0004](https://codepoints.net/U+0004)/[U+2404](https://codepoints.net/U+2404)</tt> End of Transmission. For concluding parsing.
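Each visible symbol in the list above sits exactly 0x2400 code points after its control character, so converting between the two forms is a simple translation. The sketch below shows this; the names `to_symbols` and `to_controls` are hypothetical, and the conversion is applied to every occurrence, including characters that are escaped content.

```python
# Code points of the six USV markup characters in their control form.
CONTROLS = {
    "US": 0x001F,
    "RS": 0x001E,
    "GS": 0x001D,
    "FS": 0x001C,
    "ESC": 0x001B,
    "EOT": 0x0004,
}

# Distance from a control character to its visible symbol.
PICTURE_OFFSET = 0x2400


def to_symbols(text):
    """Replace USV control characters with their visible symbols."""
    table = {code: code + PICTURE_OFFSET for code in CONTROLS.values()}
    return text.translate(table)


def to_controls(text):
    """Replace visible USV symbols with their control characters."""
    table = {code + PICTURE_OFFSET: code for code in CONTROLS.values()}
    return text.translate(table)
```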
## Character details
* [Escape (ESC)](../escape/)
* [End of Transmission (EOT)](../end-of-transmission/)
* [Spacers](../spacers/)
================================================
FILE: doc/purpose/index.md
================================================
# USV purpose
The USV purpose is to help people edit data, share data, and manage data.
* Edit data by using plain text and any typical text editor.
* Share data by using an international standard for markup.
* Manage data in ways that work well with spreadsheets and databases.
## Edit data by using plain text and any typical text editor
USV is a plain text format that aims to be easy to read and edit.
* Because USV is plain text, you can use any text editor to open a USV file, edit it, save it, print it, and so on.
* Because USV enables line spacing wherever you want it, you can edit anything from simple unit-oriented data (such as for logs and metrics) all the way up to complex file-oriented data (such as for blog posts and content management).
* Because USV can display marks using your choice of visible symbol characters or invisible control characters, you can edit using your preferred editors and preferred settings for displaying Unicode symbols and Unicode controls.
## Share data by using an international standard for markup
USV has a formal specification on-track to become an international standard.
* Because USV is for worldwide sharing, there is a specification that sets the same marks (such as delimiters) for everyone.
* Because USV provides a formal IETF Internet-Draft, anyone may implement USV in any language, and know that it will work.
* Because USV has a reference implementation that is free libre open source software, everyone can share the tooling as well.
## Manage data in ways that work well with spreadsheets and databases
USV can manage data collections such as spreadsheet sheets and folios, and database tables and schemas.
* Because USV has units, records, groups, files, and end of transmission, it has more dimensions than CSV, and can even allow for attachments.
* Because USV has more dimensions, it can replace ad hoc binders, such as ZIP files comprising CSV sheets, or XML files comprising Excel workbooks.
* Because USV has jagged array capabilities, it can help save and restore system disk paths, spreadsheet folio tabs, database table names, and more.
================================================
FILE: doc/rfc/draft-unicode-separated-values-01.txt
================================================
Internet Engineering Task Force J. Henderson, Ed.
Internet-Draft 16 March 2024
Intended status: Experimental
Expires: 17 September 2024
Unicode Separated Values (USV)
draft-unicode-separated-values-01
Abstract
Unicode Separated Values (USV) is a data format that uses Unicode
characters to mark parts. USV builds on ASCII separated values
(ASV), and provides pragmatic ways to edit data in text editors by
using visual symbols and layouts.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 17 September 2024.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Henderson Expires 17 September 2024 [Page 1]
Internet-Draft Unicode Separated Values (USV) March 2024
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
1.2. Media Type Language . . . . . . . . . . . . . . . . . . . 3
1.3. ABNF Language . . . . . . . . . . . . . . . . . . . . . . 3
2. USV characters . . . . . . . . . . . . . . . . . . . . . . . 3
3. Definition of the USV Format . . . . . . . . . . . . . . . . 4
3.1. Data . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.2. Unit . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.3. Record . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.4. Group . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.5. File . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.6. Header . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.7. Escape (ESC) . . . . . . . . . . . . . . . . . . . . . . 5
3.8. End of Transmission (EOT) . . . . . . . . . . . . . . . . 5
4. ABNF grammar . . . . . . . . . . . . . . . . . . . . . . . . 6
4.1. Semantics . . . . . . . . . . . . . . . . . . . . . . . . 6
4.2. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.3. Runs . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.4. Character classes . . . . . . . . . . . . . . . . . . . . 6
4.5. Unicode symbols . . . . . . . . . . . . . . . . . . . . . 6
5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 7
5.1. Hello World . . . . . . . . . . . . . . . . . . . . . . . 7
5.2. Hello World Goodnight Moon . . . . . . . . . . . . . . . 7
5.3. Units, Records, Groups, Files . . . . . . . . . . . . . . 8
5.4. Articles . . . . . . . . . . . . . . . . . . . . . . . . 9
6. Source Code Examples . . . . . . . . . . . . . . . . . . . . 10
7. MIME media type registration for text/usv . . . . . . . . . . 11
7.1. Optional parameters: charset, header . . . . . . . . . . 11
7.2. Encoding considerations . . . . . . . . . . . . . . . . . 11
7.3. Security considerations . . . . . . . . . . . . . . . . . 12
7.4. Interoperability considerations . . . . . . . . . . . . . 12
7.5. Published specification . . . . . . . . . . . . . . . . . 12
7.6. Applications that use this media type . . . . . . . . . . 12
7.7. Additional information . . . . . . . . . . . . . . . . . 12
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
9. Security Considerations . . . . . . . . . . . . . . . . . . . 13
10. Converters . . . . . . . . . . . . . . . . . . . . . . . . . 13
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 13
11.1. Normative References . . . . . . . . . . . . . . . . . . 13
11.2. Informative References . . . . . . . . . . . . . . . . . 14
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 15
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 15
Henderson Expires 17 September 2024 [Page 2]
Internet-Draft Unicode Separated Values (USV) March 2024
1. Introduction
Unicode Separated Values (USV) is a data format useful for exchanging
and converting data between various spreadsheet programs, databases,
and streaming data services. This RFC explains USV.
Additionally, we propose a new media type "text/usv", to be
registered with IANA.
We provide information references for a USV git repository
[usv-git-repository], a programming implementation as a USV Rust
crate [usv-rust-crate], and converter tools.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
1.2. Media Type Language
The media type normative references are RFC 6838 [RFC6838], RFC 2046
[RFC2046], and RFC 4289 [RFC4289].
1.3. ABNF Language
The ABNF normative reference is RFC 5234 [RFC5234].
2. USV characters
Separators:
* File Separator (FS) is U+001C or U+241C
* Group Separator (GS) is U+001D or U+241D
* Record Separator (RS) is U+001E or U+241E
* Unit Separator (US) is U+001F or U+241F
Modifiers:
* Escape (ESC) is U+001B or U+241B
* End of Transmission (EOT) is U+0004 or U+2404
Henderson Expires 17 September 2024 [Page 3]
Internet-Draft Unicode Separated Values (USV) March 2024
Spacers:
* Carriage Return (CR) is U+000D
* Line Feed (LF) is U+000A
3. Definition of the USV Format
3.1. Data
Data comprises units, records, groups, and files.
3.2. Unit
A unit comprises content characters. It runs until a Unit Separator
(US):
Example unit and unit separator:
<CODE BEGINS> file "unit-and-unit-separator.usv"
aaa␟
<CODE ENDS>
3.3. Record
A record comprises units. It runs until a Record Separator (RS):
Example record and record separator:
<CODE BEGINS> file "record-and-record-separator.usv"
aaa␟bbb␟␞
<CODE ENDS>
3.4. Group
A group comprises records. It runs until a Group Separator (GS):
Example group and group separator:
<CODE BEGINS> file "group-and-group-separator.usv"
aaa␟bbb␟␞ccc␟ddd␟␞␝
<CODE ENDS>
3.5. File
A file comprises groups. It runs until a file separator.
Example file and file separator:
Henderson Expires 17 September 2024 [Page 4]
Internet-Draft Unicode Separated Values (USV) March 2024
<CODE BEGINS> file "file-and-file-separator.usv"
aaa␟bbb␟␞ccc␟ddd␟␞␝eee␟fff␟␞ggg␟hhh␟␞␝␜
<CODE ENDS>
3.6. Header
There may be an optional header appearing as the first item and with
the same format as normal items. This header will contain names
corresponding to the fields in the data, and should contain the same
number of fields as the rest of data. The presence or absence of the
header line should be indicated via the optional "header" parameter
of this media type.
For example:
<CODE BEGINS> file "header.usv"
name␟name␟␞aaa␟bbb␟␞
<CODE ENDS>
3.7. Escape (ESC)
Escape (ESC) makes the next character content.
Example: USV with a unit that contains an Escape + End of
Transmission; because of the Escape, the End of Transmission is
treated as content:
<CODE BEGINS> file "escape.usv"
a␛␄b␟
<CODE ENDS>
3.8. End of Transmission (EOT)
End of Transmission (EOT) tells any reader that it can stop reading.
This can be useful for streaming data, such as to end a
connection. This can also be useful for providing data files that
contain USV data, then EOT, then additional non-USV information such as
comments, images, attachments, etc.
* EOT tells the data reader that it can stop.
* EOT has no effect on the output content.
Example of a unit then an End of Transmission:
<CODE BEGINS> file "end-of-transmission.usv"
abc␞␄ignorable
<CODE ENDS>
Henderson Expires 17 September 2024 [Page 5]
Internet-Draft Unicode Separated Values (USV) March 2024
4. ABNF grammar
4.1. Semantics
usv = *files
file = *groups
group = *records
record = *units
unit = *content-characters
4.2. Syntax
usv = ( header-and-body / body ) '*' ; anything after the body is
chaff
header-and-body = 1*unit-run / 1*record-run / 1*group-run / 1*file-
run
body = *unit-run / *record-run / *group-run / *file-run
4.3. Runs
file-run = *( *spacer-character file *spacer-character FS )
group-run = *( *spacer-character group *spacer-character GS )
record-run = *( *spacer-character record *spacer-character RS )
unit-run = *( *spacer-character unit *spacer-character US )
4.4. Character classes
content-character = typical-character / ESC '*'
typical-character = '*' - special-character
special-character = US / RS / GS / FS / ESC / EOT
spacer-character = CR / LF
4.5. Unicode symbols
FS = U+001C File Separator / U+241C Symbol for File Separator
Henderson Expires 17 September 2024 [Page 6]
Internet-Draft Unicode Separated Values (USV) March 2024
GS = U+001D Group Separator / U+241D Symbol for Group Separator
RS = U+001E Record Separator / U+241E Symbol for Record Separator
US = U+001F Unit Separator / U+241F Symbol for Unit Separator
ESC = U+001B Escape / U+241B Symbol for Escape
EOT = U+0004 End of Transmission / U+2404 Symbol for End of
Transmission
CR = U+000D Carriage Return
LF = U+000A Line Feed
5. Examples
5.1. Hello World
This kind of data ...
<CODE BEGINS> file "hello-world.txt"
hello, world
<CODE ENDS>
... is represented in USV as two units:
<CODE BEGINS> file "hello-world.usv"
hello␟world␟
<CODE ENDS>
If you prefer to see one unit per line, then you can add carriage
returns and/or newlines:
<CODE BEGINS> file "hello-world-with-lines.usv"
hello␟
world␟
<CODE ENDS>
5.2. Hello World Goodnight Moon
This kind of data ...
<CODE BEGINS> file "hello-world-goodnight-moon.txt"
[ hello, world ], [ goodnight, moon ]
<CODE ENDS>
... is represented in USV as two records, each with two units:
Henderson Expires 17 September 2024 [Page 7]
Internet-Draft Unicode Separated Values (USV) March 2024
<CODE BEGINS> file "hello-world-goodnight-moon.usv"
hello␟world␟␞goodnight␟moon␟␞
<CODE ENDS>
If you prefer to see one record per line, then you can add carriage
returns and/or newlines:
<CODE BEGINS> file "hello-world-goodnight-moon-with-lines.usv"
hello␟world␟␞
goodnight␟moon␟␞
<CODE ENDS>
5.3. Units, Records, Groups, Files
USV with 2 units by 2 records by 2 groups by 2 files:
<CODE BEGINS> file "units-records-groups-files.usv"
a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜
<CODE ENDS>
If you prefer to see one record per line, then you can add carriage
returns and/or newlines:
<CODE BEGINS> file "units-records-groups-files-with-lines.usv"
a␟b␟␞
c␟d␟␞
␝
e␟f␟␞
g␟h␟␞
␝
␜
i␟j␟␞
k␟l␟␞
␝
m␟n␟␞
o␟p␟␞
␝
␜
<CODE ENDS>
If you prefer to see one unit per line, then you can add carriage
returns and/or newlines:
Henderson Expires 17 September 2024 [Page 8]
Internet-Draft Unicode Separated Values (USV) March 2024
<CODE BEGINS> file "units-records-groups-files-with-lines.usv"
a␟
b␟
␞
c␟
d␟
␞
␝
e␟
f␟
␞
g␟
h␟
␞
␝
␜
i␟
j␟
␞
k␟
l␟
␞
␝
m␟
n␟
␞
o␟
p␟
␞
␝
␜
<CODE ENDS>
5.4. Articles
USV can format paragraphs, such as in this example data stream of
articles; note the units contain leading spacers and trailing spacers.
Henderson Expires 17 September 2024 [Page 9]
Internet-Draft Unicode Separated Values (USV) March 2024
<CODE BEGINS> file "articles.usv"
Title One
␟
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip.
␟␞
Title Two
␟
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.
␟␞
Title Three
␟
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium
doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore
veritatis et quasi architecto beatae vitae dicta sunt explicabo.
␟␞
<CODE ENDS>
6. Source Code Examples
These source code examples demonstrate the Rust programming language
and the USV Rust crate.
Units:
<CODE BEGINS> file "usv-rust-crate-units.rs"
use usv::*;
let str = "a␟b␟";
let units: Units = str.units().collect();
<CODE ENDS>
Records:
<CODE BEGINS> file "usv-rust-crate-records.rs"
use usv::*;
let str = "a␟b␟␞c␟d␟␞";
let records: Records = str.records().collect();
<CODE ENDS>
Groups:
Henderson Expires 17 September 2024 [Page 10]
Internet-Draft Unicode Separated Values (USV) March 2024
<CODE BEGINS> file "usv-rust-crate-groups.rs"
use usv::*;
let str = "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝";
let groups: Groups = str.groups().collect();
<CODE ENDS>
Files:
<CODE BEGINS> file "usv-rust-crate-files.rs"
use usv::*;
let str = "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜";
let files: Files = str.files().collect();
<CODE ENDS>
7. MIME media type registration for text/usv
This section provides the MIME media type registration application
information.
To: ietf-types@iana.org
Subject: Registration of MIME media type text/usv
MIME media type name: text
MIME subtype name: usv
Required parameters: none
7.1. Optional parameters: charset, header
Common usage of USV is UTF-8, but other character sets defined by
IANA for the "text" tree may be used in conjunction with the
"charset" parameter.
The "header" parameter indicates the presence or absence of the
header line. Valid values are "present" or "absent". Implementors
choosing not to use this parameter must make their own decisions as
to whether the header line is present or absent.
7.2. Encoding considerations
This media type uses LF to denote line breaks. However, implementors
should be aware that some implementations may not conform i.e. may
incorrectly use other values.
Henderson Expires 17 September 2024 [Page 11]
Internet-Draft Unicode Separated Values (USV) March 2024
7.3. Security considerations
USV files contain passive text data that should not pose any risks.
However, it is possible in theory that malicious binary data may be
included in order to exploit potential buffer overruns in the program
processing USV data. Additionally, private data may be shared via
this format (which of course applies to any text data).
7.4. Interoperability considerations
Implementors should "be conservative in what you do, be liberal in
what you accept from others" (RFC 793 [8]) when processing USV data.
Implementations deciding not to use the optional "header" parameter
must make their own decision as to whether the header is absent or
present.
7.5. Published specification
https://github.com/sixarm/usv
7.6. Applications that use this media type
Spreadsheet programs, such as with import/export. Database programs,
such as with loading/saving text. Data conversion utilities.
7.7. Additional information
Magic number(s): none
File extension(s): usv
Apple macOS File Type Code(s): TEXT
Intended usage: COMMON
Author/Change controller: IESG
Contact: Joel Parker Henderson <joel@joelparkerhenderson.com>
8. IANA Considerations
We are requesting IANA to create a standard MIME media type "text/
usv".
We have filed an IANA request for this, with same contact
information.
Henderson Expires 17 September 2024 [Page 12]
Internet-Draft Unicode Separated Values (USV) March 2024
9. Security Considerations
This document should not affect the security of the Internet.
10. Converters
We implement converters to/from USV and various popular data formats,
including ASCII Separated Values (ASV), Comma Separated Values (CSV),
JavaScript Object Notation (JSON), Microsoft Excel XML (XLSX).
* asv-to-usv[asv-to-usv-rust-crate], usv-to-
asv[usv-to-asv-rust-crate]
* csv-to-usv[csv-to-usv-rust-crate], usv-to-
csv[usv-to-csv-rust-crate]
* json-to-usv[json-to-usv-rust-crate], usv-to-
json[usv-to-json-rust-crate]
* xlsx-to-usv[xlsx-to-usv-rust-crate], usv-to-
xlsx[usv-to-xlsx-rust-crate]
The converters are provided for informational purposes. The
converters are not part of the specification.
11. References
11.1. Normative References
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234,
DOI 10.17487/RFC5234, January 2008,
<https://www.rfc-editor.org/info/rfc5234>.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures", BCP 13,
RFC 6838, DOI 10.17487/RFC6838, January 2013,
<https://www.rfc-editor.org/info/rfc6838>.
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
DOI 10.17487/RFC2046, November 1996,
<https://www.rfc-editor.org/info/rfc2046>.
Henderson Expires 17 September 2024 [Page 13]
Internet-Draft Unicode Separated Values (USV) March 2024
[RFC4289] Freed, N. and J. Klensin, "Multipurpose Internet Mail
Extensions (MIME) Part Four: Registration Procedures",
BCP 13, RFC 4289, DOI 10.17487/RFC4289, December 2005,
<https://www.rfc-editor.org/info/rfc4289>.
11.2. Informative References
[usv-git-repository]
Henderson, J., "USV git repository at
https://github.com/sixarm/usv", 2022.
[usv-rust-crate]
Henderson, J., "USV rust crate at
https://crates.io/crates/usv", 2024.
[asv-to-usv-rust-crate]
Henderson, J., "ASV to USV rust crate at
https://crates.io/crates/asv-to-usv", 2024.
[usv-to-asv-rust-crate]
Henderson, J., "USV to ASV rust crate at
https://crates.io/crates/usv-to-asv", 2024.
[csv-to-usv-rust-crate]
Henderson, J., "CSV to USV rust crate at
https://crates.io/crates/csv-to-usv", 2024.
[usv-to-csv-rust-crate]
Henderson, J., "USV to CSV rust crate at
https://crates.io/crates/usv-to-csv", 2024.
[json-to-usv-rust-crate]
Henderson, J., "JSON to USV rust crate at
https://crates.io/crates/json-to-usv", 2024.
[usv-to-json-rust-crate]
Henderson, J., "USV to JSON rust crate at
https://crates.io/crates/usv-to-json", 2024.
[xlsx-to-usv-rust-crate]
Henderson, J., "XLSX to USV rust crate at
https://crates.io/crates/xlsx-to-usv", 2024.
[usv-to-xlsx-rust-crate]
Henderson, J., "USV to XLSX rust crate at
https://crates.io/crates/usv-to-xlsx", 2024.
Henderson Expires 17 September 2024 [Page 14]
Internet-Draft Unicode Separated Values (USV) March 2024
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
Acknowledgements
The author would like to thank Y. Shafranovich, author of the CSV
RFC, which provided guidance for this USV RFC.
A special thank you goes to P.X.V.
Contributors
Thanks to all of the contributors.
Joel Parker Henderson
Email: joel@joelparkerhenderson.com
Author's Address
Joel Parker Henderson (editor)
601 Van Ness Ave #E3-359
San Francisco, CA 94102
United States of America
Phone: 1-415-317-2700
Email: joel@joelparkerhenderson.com
URI: https://linkedin.com/in/joelparkerhenderson
Henderson Expires 17 September 2024 [Page 15]
================================================
FILE: doc/rfc/draft-unicode-separated-values-01.xml
================================================
<?xml version="1.0" encoding="utf-8"?>
<!--
draft-unicode-separated-values-01
Based on draft-rfcxml-general-template-annotated-00
This template includes examples of most of the features of RFCXML with comments explaining
how to customise them, and examples of how to achieve specific formatting.
Documentation:
https://authors.ietf.org/en/templates-and-schemas
To parse this XML, such as to create a PDF:
https://author-tools.ietf.org/
RFCXML vocabulary:
https://authors.ietf.org/rfcxml-vocabulary
Output:
* URL: https://www.ietf.org/archive/id/draft-unicode-separated-values-01.txt
* Status: https://datatracker.ietf.org/doc/draft-unicode-separated-values/
* HTML: https://www.ietf.org/archive/id/draft-unicode-separated-values-01.html
* HTMLized: https://datatracker.ietf.org/doc/html/draft-unicode-separated-values
-->
<?xml-model href="rfc7991bis.rnc"?> <!-- Required for schema validation and schema-aware editing -->
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->
<!DOCTYPE rfc [
<!ENTITY nbsp " ">
<!ENTITY zwsp "​">
<!ENTITY nbhy "‑">
<!ENTITY wj "⁠">
]>
<!-- If further character entities are required then they should be added to the DOCTYPE above.
Use of an external entity file is not recommended. -->
<rfc
xmlns:xi="http://www.w3.org/2001/XInclude"
category="exp"
docName="draft-unicode-separated-values-01"
ipr="trust200902"
obsoletes=""
updates=""
submissionType="IETF"
xml:lang="en"
version="3">
<!--
* docName should be the name of your draft
* category should be one of std, bcp, info, exp, historic
* ipr should be one of trust200902, noModificationTrust200902, noDerivativesTrust200902, pre5378Trust200902
* updates can be an RFC number as NNNN
* obsoletes can be an RFC number as NNNN
-->
<front>
<title abbrev="Unicode Separated Values (USV)">Unicode Separated Values (USV)</title> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#title-4 -->
<!-- The abbreviated title is required if the full title is longer than 39 characters -->
<seriesInfo name="Internet-Draft" value="unicode-separated-values"/> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#seriesinfo -->
<!-- Set value to the name of the draft -->
<author fullname="Joel Parker Henderson" initials="J" role="editor" surname="Henderson"> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#author -->
<!-- initials should not include an initial for the surname -->
<!-- role="editor" is optional -->
<!-- Can have more than one author -->
<!-- all of the following elements are optional -->
<address> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#address -->
<postal>
<!-- Reorder these if your country does things differently -->
<street>601 Van Ness Ave #E3-359</street>
<city>San Francisco</city>
<region>CA</region>
<code>94102</code>
<country>US</country>
<!-- Can use two letter country code -->
</postal>
<phone>1-415-317-2700</phone>
<email>joel@joelparkerhenderson.com</email>
<!-- Can have more than one <email> element -->
<uri>https://linkedin.com/in/joelparkerhenderson</uri>
</address>
</author>
<date year="2024" month="3" day="16"/> <!-- https://authors.ietf.org/en/rfcxml-vocabulary#date -->
<!-- On draft submission:
* If only the current year is specified, the current day and month will be used.
* If the month and year are both specified and are the current ones, the current day will
be used
* If the year is not the current one, it is necessary to specify at least a month and day="1" will be used.
-->
<area>General</area>
<workgroup>Internet Engineering Task Force</workgroup>
<!-- "Internet Engineering Task Force" is fine for individual submissions. If this element is
not present, the default is "Network Working Group", which is used by the RFC Editor as
a nod to the history of the RFC Series. -->
<keyword>usv</keyword>
<keyword>data</keyword>
<keyword>format</keyword>
<keyword>markup</keyword>
<!-- Multiple keywords are allowed. Keywords are incorporated into HTML output files for
use by search engines. -->
<abstract>
<t>
Unicode Separated Values (USV) is a data format that uses Unicode
characters to mark parts. USV builds on ASCII separated values (ASV),
and provides pragmatic ways to edit data in text editors by using visual
symbols and layouts.
</t>
</abstract>
</front>
<middle>
<section>
<!-- The default attributes for <section> are numbered="true" and toc="default" -->
<name>Introduction</name>
<t>
Unicode Separated Values (USV) is a data format useful for exchanging
and converting data between various spreadsheet programs, databases,
and streaming data services. This RFC explains USV.
</t>
<t>
Additionally, we propose a new media type "text/usv", to be registered
with IANA.
</t>
<t>
We provide information references for a USV git repository <xref
target="usv-git-repository"/>, a programming implementation as a USV
Rust crate <xref target="usv-rust-crate"/>, and converter tools.
</t>
<section anchor="requirements-language">
<name>Requirements Language</name>
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 <xref target="RFC2119"/>
<xref target="RFC8174"/> when, and only when, they appear in all
capitals, as shown here.
</t>
</section>
<section anchor="media-type-language">
<name>Media Type Language</name>
<t>
The media type normative references are RFC 6838 <xref
target="RFC6838"/>, RFC 2046 <xref target="RFC2046"/>, and RFC 4289
<xref target="RFC4289"/>.
</t>
</section>
<section anchor="abnf-language">
<name>ABNF Language</name>
<t>
The ABNF normative reference is RFC 5234 <xref target="RFC5234"/>.
</t>
</section>
</section>
<section>
<name>USV characters</name>
<t>Separators:</t>
<ul>
<li>File Separator (FS) is U+001C or U+241C</li>
<li>Group Separator (GS) is U+001D or U+241D</li>
<li>Record Separator (RS) is U+001E or U+241E</li>
<li>Unit Separator (US) is U+001F or U+241F</li>
</ul>
<t>Modifiers:</t>
<ul>
<li>Escape (ESC) is U+001B or U+241B</li>
<li>End of Transmission (EOT) is U+0004 or U+2404</li>
</ul>
</section>
<section>
<name>Definition of the USV Format</name>
<section>
<name>Data</name>
<t>
Data comprises units, records, groups, and files.
</t>
</section>
<section>
<name>Unit</name>
<t>
A unit comprises content characters.
It runs until a Unit Separator (US):
</t>
<t>
Example unit and unit separator:
</t>
<sourcecode name="unit-and-unit-separator.usv" type="usv" markers="true">
<![CDATA[
aaa␟
]]>
</sourcecode>
</section>
<section>
<name>Record</name>
<t>
A record comprises units.
It runs until a Record Separator (RS):
</t>
<t>
Example record and record separator:
</t>
<sourcecode name="record-and-record-separator.usv" type="usv" markers="true">
<![CDATA[
aaa␟bbb␟␞
]]>
</sourcecode>
</section>
<section>
<name>Group</name>
<t>
A group comprises records.
It runs until a Group Separator (GS):
</t>
<t>
Example group and group separator:
</t>
<sourcecode name="group-and-group-separator.usv" type="usv" markers="true">
<![CDATA[
aaa␟bbb␟␞ccc␟ddd␟␞␝
]]>
</sourcecode>
</section>
<section>
<name>File</name>
<t>
A file comprises groups.
It runs until a File Separator (FS).
</t>
<t>
Example file and file separator:
</t>
<sourcecode name="file-and-file-separator.usv" type="usv" markers="true">
<![CDATA[
aaa␟bbb␟␞ccc␟ddd␟␞␝eee␟fff␟␞ggg␟hhh␟␞␝␜
]]>
</sourcecode>
</section>
<section>
<name>Header</name>
<t>
There may be an optional header appearing as the first item and with
the same format as normal items. This header will contain names
corresponding to the fields in the data, and should contain the same
number of fields as the rest of data. The presence or absence of the
header line should be indicated via the optional "header" parameter
of this media type.
</t>
<t>
For example:
</t>
<sourcecode name="header.usv" type="usv" markers="true">
<![CDATA[
name␟name␟␞aaa␟bbb␟␞
]]>
</sourcecode>
</section>
<section>
<name>Escape (ESC)</name>
<t>
Escape (ESC) makes the next character content.
</t>
<t>
Example: USV with a unit that contains an Escape + End of
Transmission; because of the Escape, the End of Transmission is
treated as content:
</t>
<sourcecode name="escape.usv" type="usv" markers="true">
<![CDATA[
a␛␄b␟
]]>
</sourcecode>
</section>
<section>
<name>End of Transmission (EOT)</name>
<t>
End of Transmission (EOT) tells any reader that it can stop reading.
This can be useful for streaming data, such as to end a connection.
This can also be useful for providing data files that contain USV
data, then EOT, then additional non-USV information such as comments,
images, attachments, etc.
</t>
<ul>
<li>
EOT tells the data reader that it can stop.
</li>
<li>
EOT has no effect on the output content.
</li>
</ul>
<t>
Example of a unit then an End of Transmission:
</t>
<sourcecode name="end-of-transmission.usv" type="usv" markers="true">
<![CDATA[
abc␞␄ignorable
]]>
</sourcecode>
</section>
</section>
<section>
<name>ABNF grammar</name>
<section>
<name>Semantics</name>
<t>usv = *files</t>
<t>file = *groups</t>
<t>group = *records</t>
<t>record = *units</t>
<t>unit = *content-characters</t>
</section>
<section>
<name>Syntax</name>
<t>usv = ( header-and-body / body ) '*' ; anything after the body is chaff</t>
<t>header-and-body = 1*unit-run / 1*record-run / 1*group-run / 1*file-run</t>
<t>body = *unit-run / *record-run / *group-run / *file-run</t>
</section>
<section>
<name>Runs</name>
<t>file-run = *( *spacer-character file *spacer-character FS )</t>
<t>group-run = *( *spacer-character group *spacer-character GS )</t>
<t>record-run = *( *spacer-character record *spacer-character RS )</t>
<t>unit-run = *( *spacer-character unit *spacer-character US )</t>
</section>
<section>
<name>Character classes</name>
<t>content-character = typical-character / ESC '*'</t>
<t>typical-character = '*' - special-character</t>
<t>special-character = US / RS / GS / FS / ESC / EOT</t>
<t>spacer-character = Defined by Unicode Derived Core Property White_Space</t>
</section>
<section>
<name>Unicode symbols</name>
<t>FS = U+001C File Separator / U+241C Symbol for File Separator</t>
<t>GS = U+001D Group Separator / U+241D Symbol for Group Separator</t>
<t>RS = U+001E Record Separator / U+241E Symbol for Record Separator</t>
<t>US = U+001F Unit Separator / U+241F Symbol for Unit Separator</t>
<t>ESC = U+001B Escape / U+241B Symbol for Escape</t>
<t>EOT = U+0004 End of Transmission / U+2404 Symbol for End of Transmission</t>
</section>
</section>
<section>
<name>Examples</name>
<section>
<name>Hello World</name>
<t>
This kind of data ...
</t>
<sourcecode name="hello-world.txt" type="txt" markers="true">
<![CDATA[
hello, world
]]>
</sourcecode>
<t>
... is represented in USV as two units:
</t>
<sourcecode name="hello-world.usv" type="usv" markers="true">
<![CDATA[
hello␟world␟
]]>
</sourcecode>
<t>
If you prefer to see one unit per line, then you can add whitespace,
such as newlines:
</t>
<sourcecode name="hello-world-with-lines.usv" type="usv" markers="true">
<![CDATA[
hello␟
world␟
]]>
</sourcecode>
</section>
<section>
<name>Hello World Goodnight Moon</name>
<t>
This kind of data ...
</t>
<sourcecode name="hello-world-goodnight-moon.txt" type="txt" markers="true">
<![CDATA[
[ hello, world ], [ goodnight, moon ]
]]>
</sourcecode>
<t>
... is represented in USV as two records, each with two units:
</t>
<sourcecode name="hello-world-goodnight-moon.usv" type="usv" markers="true">
<![CDATA[
hello␟world␟␞goodnight␟moon␟␞
]]>
</sourcecode>
<t>
If you prefer to see one record per line, then you can add whitespace,
such as newlines:
</t>
<sourcecode name="hello-world-goodnight-moon-with-lines.usv" type="usv" markers="true">
<![CDATA[
hello␟world␟␞
goodnight␟moon␟␞
]]>
</sourcecode>
</section>
<section>
<name>Units, Records, Groups, Files</name>
<t>
USV with 2 units by 2 records by 2 groups by 2 files:
</t>
<sourcecode name="units-records-groups-files.usv" type="usv" markers="true">
<![CDATA[
a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜
]]>
</sourcecode>
<t>
If you prefer to see one record per line, then you can add whitespace,
such as newlines:
</t>
<sourcecode name="units-records-groups-files-with-lines.usv" type="usv" markers="true">
<![CDATA[
a␟b␟␞
c␟d␟␞
␝
e␟f␟␞
g␟h␟␞
␝
␜
i␟j␟␞
k␟l␟␞
␝
m␟n␟␞
o␟p␟␞
␝
␜
]]>
</sourcecode>
<t>
If you prefer to see one unit per line, then you can add whitespace,
such as newlines:
</t>
<sourcecode name="units-records-groups-files-with-lines.usv" type="usv" markers="true">
<![CDATA[
a␟
b␟
␞
c␟
d␟
␞
␝
e␟
f␟
␞
g␟
h␟
␞
␝
␜
i␟
j␟
␞
k␟
l␟
␞
␝
m␟
n␟
␞
o␟
p␟
␞
␝
␜
]]>
</sourcecode>
</section>
<section>
<name>Articles</name>
<t>
USV can format paragraphs, such as in this example data stream of
articles; note the units contain leading spacers and trailing spacers.
</t>
<sourcecode name="articles.usv" type="usv" markers="true">
<![CDATA[
Title One
␟
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip.
␟␞
Title Two
␟
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.
␟␞
Title Three
␟
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium
doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore
veritatis et quasi architecto beatae vitae dicta sunt explicabo.
␟␞
]]>
</sourcecode>
</section>
</section>
<section>
<name>Source Code Examples</name>
<t>These source code examples demonstrate the Rust programming language and the USV Rust crate.</t>
<t>Units:</t>
<sourcecode name="usv-rust-crate-units.rs" type="rust" markers="true">
<![CDATA[
use usv::*;
let str = "a␟b␟";
let units: Units = str.units().collect();
]]>
</sourcecode>
<t>Records:</t>
<sourcecode name="usv-rust-crate-records.rs" type="rust" markers="true">
<![CDATA[
use usv::*;
let str = "a␟b␟␞c␟d␟␞";
let records: Records = str.records().collect();
]]>
</sourcecode>
<t>Groups:</t>
<sourcecode name="usv-rust-crate-groups.rs" type="rust" markers="true">
<![CDATA[
use usv::*;
let str = "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝";
let groups: Groups = str.groups().collect();
]]>
</sourcecode>
<t>Files:</t>
<sourcecode name="usv-rust-crate-files.rs" type="rust" markers="true">
<![CDATA[
use usv::*;
let str = "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜";
let files: Files = str.files().collect();
]]>
</sourcecode>
</section>
<section>
<name>MIME media type registration for text/usv</name>
<t>
This section provides the MIME media type registration application information.
</t>
<t>
To: ietf-types@iana.org
</t>
<t>
Subject: Registration of MIME media type text/usv
</t>
<t>
MIME media type name: text
</t>
<t>
MIME subtype name: usv
</t>
<t>
Required parameters: none
</t>
<section>
<name>Optional parameters: charset, header</name>
<t>
Common usage of USV is UTF-8, but other character sets defined by IANA
for the "text" tree may be used in conjunction with the "charset"
parameter.
</t>
<t>
The "header" parameter indicates the presence or absence of the
header line. Valid values are "present" or "absent".
Implementors choosing not to use this parameter must make their
own decisions as to whether the header line is present or absent.
</t>
</section>
<section>
<name>Encoding considerations</name>
<t>
This media type uses LF to denote line breaks. However, implementors
should be aware that some implementations may not conform i.e. may
incorrectly use other values.
</t>
</section>
<section>
<name>Security considerations</name>
<t>
USV files contain passive text data that should not pose any
risks. However, it is possible in theory that malicious binary
data may be included in order to exploit potential buffer overruns
in the program processing USV data. Additionally, private data
may be shared via this format (which of course applies to any text
data).
</t>
</section>
<section>
<name>Interoperability considerations</name>
<t>
Implementors should "be conservative in what you do, be liberal in
what you accept from others" (RFC 793) when processing USV data.
</t>
<t>
Implementations deciding not to use the optional "header"
parameter must make their own decision as to whether the header is
absent or present.
</t>
</section>
<section>
<name>Published specification</name>
<t>
https://github.com/sixarm/usv
</t>
</section>
<section>
<name>Applications that use this media type</name>
<t>
Spreadsheet programs, such as with import/export.
Database programs, such as with loading/saving text.
Data conversion utilities.
</t>
</section>
<section>
<name>Additional information</name>
<t>
Magic number(s): none
</t>
<t>
File extension(s): usv
</t>
<t>
Apple macOS File Type Code(s): TEXT
</t>
<t>
Intended usage: COMMON
</t>
<t>
Author/Change controller: IESG
</t>
<t>Contact: Joel Parker Henderson &lt;joel@joelparkerhenderson.com&gt;
</t>
</section>
</section>
<section anchor="IANA">
<!-- All drafts are required to have an IANA considerations section. See RFC 8126 for a guide.-->
<name>IANA Considerations</name>
<t>We are requesting IANA to create a standard MIME media type "text/usv".</t>
<t>We have filed an IANA request for this, with the same contact information.</t>
</section>
<section anchor="Security">
<!-- All drafts are required to have a security considerations section. See RFC 3552 for a guide. -->
<name>Security Considerations</name>
<t>This document should not affect the security of the Internet.</t>
</section>
<!-- NOTE: The Acknowledgements and Contributors sections are at the end of this template -->
<section anchor="Converters">
<name>Converters</name>
<t>
We implement converters between USV and various popular data formats,
including ASCII Separated Values (ASV), Comma Separated Values (CSV),
JavaScript Object Notation (JSON), and Microsoft Excel XML (XLSX).
</t>
<ul>
<li>asv-to-usv<xref target="asv-to-usv-rust-crate"/>, usv-to-asv<xref target="usv-to-asv-rust-crate"/></li>
<li>csv-to-usv<xref target="csv-to-usv-rust-crate"/>, usv-to-csv<xref target="usv-to-csv-rust-crate"/></li>
<li>json-to-usv<xref target="json-to-usv-rust-crate"/>, usv-to-json<xref target="usv-to-json-rust-crate"/></li>
<li>xlsx-to-usv<xref target="xlsx-to-usv-rust-crate"/>, usv-to-xlsx<xref target="usv-to-xlsx-rust-crate"/></li>
</ul>
<t>
The converters are provided for informational purposes. The converters
are not part of the specification.
</t>
</section>
</middle>
<back>
<references>
<name>References</name>
<references>
<name>Normative References</name>
<!-- "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words" -->
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<!-- "Augmented BNF for Syntax Specifications: ABNF"-->
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5234.xml"/>
<!-- "Media Type Specifications and Registration Procedures" -->
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6838.xml"/>
<!-- "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types" -->
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2046.xml"/>
<!-- "Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures" -->
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4289.xml"/>
</references>
<references>
<name>Informative References</name>
<reference anchor="usv-git-repository">
<!-- Example minimum reference -->
<front>
<title>USV git repository at https://github.com/sixarm/usv</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2022"/>
</front>
</reference>
<reference anchor="usv-rust-crate">
<!-- Example minimum reference -->
<front>
<title>USV rust crate at https://crates.io/crates/usv</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="asv-to-usv-rust-crate">
<!-- Example minimum reference -->
<front>
<title>ASV to USV rust crate at https://crates.io/crates/asv-to-usv</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="usv-to-asv-rust-crate">
<!-- Example minimum reference -->
<front>
<title>USV to ASV rust crate at https://crates.io/crates/usv-to-asv</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="csv-to-usv-rust-crate">
<!-- Example minimum reference -->
<front>
<title>CSV to USV rust crate at https://crates.io/crates/csv-to-usv</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="usv-to-csv-rust-crate">
<!-- Example minimum reference -->
<front>
<title>USV to CSV rust crate at https://crates.io/crates/usv-to-csv</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="json-to-usv-rust-crate">
<!-- Example minimum reference -->
<front>
<title>JSON to USV rust crate at https://crates.io/crates/json-to-usv</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="usv-to-json-rust-crate">
<!-- Example minimum reference -->
<front>
<title>USV to JSON rust crate at https://crates.io/crates/usv-to-json</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="xlsx-to-usv-rust-crate">
<!-- Example minimum reference -->
<front>
<title>XLSX to USV rust crate at https://crates.io/crates/xlsx-to-usv</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="usv-to-xlsx-rust-crate">
<!-- Example minimum reference -->
<front>
<title>USV to XLSX rust crate at https://crates.io/crates/usv-to-xlsx</title>
<author initials="J" surname="Henderson">
<organization/>
</author>
<date year="2024"/>
</front>
</reference>
<reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
<!-- Manually added reference -->
<front>
<title>Key words for use in RFCs to Indicate Requirement Levels</title>
<author initials="S." surname="Bradner" fullname="S. Bradner">
<organization/>
</author>
<date year="1997" month="March"/>
<abstract>
<t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
</t>
</abstract>
</front>
<seriesInfo name="BCP" value="14"/>
<seriesInfo name="RFC" value="2119"/>
<seriesInfo name="DOI" value="10.17487/RFC2119"/>
</reference>
</references>
</references>
<section anchor="Acknowledgements" numbered="false">
<!-- an Acknowledgements section is optional -->
<name>Acknowledgements</name>
<t>
The author would like to thank Y. Shafranovich, author of the CSV RFC,
which provided guidance for this USV RFC.
</t>
<t>
A special thank you goes to P.X.V.
</t>
</section>
<section anchor="Contributors" numbered="false">
<!-- a Contributors section is optional -->
<name>Contributors</name>
<t>Thanks to all of the contributors.</t>
<contact fullname="Joel Parker Henderson" initials="J" surname="Henderson"><!-- https://authors.ietf.org/en/rfcxml-vocabulary#contact-->
<!-- including contact information for contributors is optional -->
<address>
<email>joel@joelparkerhenderson.com</email>
</address>
</contact>
</section>
</back>
</rfc>
================================================
FILE: doc/rfc/index.md
================================================
# Request For Comments (RFC)
USV is aiming to be an international standard with the IETF and IANA.
Work in progress:
* [https://datatracker.ietf.org/doc/draft-unicode-separated-values/01/](https://datatracker.ietf.org/doc/draft-unicode-separated-values/01/)
Files:
* [draft-unicode-separated-values-01.xml](draft-unicode-separated-values-01.xml) - this is the official IETF RFCXML.
* [draft-unicode-separated-values-01.pdf](draft-unicode-separated-values-01.pdf) - autogenerated from IETF RFCXML.
* [draft-unicode-separated-values-01.txt](draft-unicode-separated-values-01.txt) - autogenerated from IETF RFCXML.
================================================
FILE: doc/spacers/index.md
================================================
# Spacers
Spacers are characters that have the Unicode Derived Core Property White_Space.
Examples:
* U+0020 Space (SP)
* U+0009 Tab (TAB) aka Horizontal Tab (HT)
* U+000A Line Feed (LF) aka New Line (NL) aka End Of Line (EOL)
* U+000D Carriage Return (CR)
USV supports spacers around content and markers, because this greatly helps typical display uses.
## Line Feed character
USV with no spacers looks like this:
```usv
a␟b␟␞c␟d␟␞
```
If you want to see each record on its own line, then you can use newline characters:
```usv
a␟b␟␞
c␟d␟␞
```
If you want to see each unit on its own line, then you can use newline characters:
```usv
a␟
b␟
␞
c␟
d␟
␞
```
If you want to see each token on its own line, then you can use newline characters:
```usv
a
␟
b
␟
␞
c
␟
d
␟
␞
```
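The layouts above all carry the same data. Here is a minimal Python sketch of that idea, assuming only the symbol style (␟ and ␞) and no Escape or End of Transmission handling. The names `parse_records` and `SPACERS` are hypothetical and are not part of any USV library:

```python
# Hypothetical sketch: parse symbol-style USV records and trim spacers.
# Assumes only U+241F and U+241E as markers, with no ESC or EOT handling.

US = "\u241F"  # ␟ Symbol for Unit Separator
RS = "\u241E"  # ␞ Symbol for Record Separator

# Characters with the Unicode White_Space property, listed explicitly because
# Python's str.strip() with no argument also strips U+001C through U+001F.
SPACERS = (
    "\u0009\u000A\u000B\u000C\u000D\u0020\u0085\u00A0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A"
    "\u2028\u2029\u202F\u205F\u3000"
)


def parse_records(text: str) -> list[list[str]]:
    """Return a list of records, each a list of unit strings."""
    records = []
    # Each record runs until RS; anything after the last RS is dropped.
    for record_text in text.split(RS)[:-1]:
        # Each unit runs until US; anything after the last US is dropped.
        units = [unit.strip(SPACERS) for unit in record_text.split(US)[:-1]]
        records.append(units)
    return records
```

With this sketch, the one-line layout and the one-token-per-line layout both parse to `[["a", "b"], ["c", "d"]]`.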
## Space character
USV with no spacers looks like this:
```usv
a␟bbb␟ccccc␟
```
If you want to see a column with left alignment, then you can use newline characters and space characters:
```usv
a    ␟
bbb  ␟
ccccc␟
```
If you want to see a column with right alignment, then you can use newline characters and space characters:
```usv
    a␟
  bbb␟
ccccc␟
```
If you want to see a column with center alignment, then you can use newline characters and space characters:
```usv
  a  ␟
 bbb ␟
ccccc␟
```
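The three alignments above can be produced with Python's built-in string padding. This is a rough sketch, assuming the symbol ␟ as the unit separator and one character per display column. The helper name `align_units` is hypothetical and is not part of any USV tool:

```python
# Hypothetical sketch: write one unit per line, padded with space characters.
# Assumes each character occupies one display column.

US = "\u241F"  # ␟ Symbol for Unit Separator


def align_units(units: list[str], how: str = "left") -> str:
    """Return the units as lines padded to a common width, each ending in US."""
    width = max((len(unit) for unit in units), default=0)
    pad = {"left": str.ljust, "right": str.rjust, "center": str.center}[how]
    return "".join(pad(unit, width) + US + "\n" for unit in units)


print(align_units(["a", "bbb", "ccccc"], "right"), end="")
```

Because spacers around content are allowed, a reader that trims them recovers the same three units from any of the alignments.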
================================================
FILE: doc/styles/index.md
================================================
# Styles
USV styles customize output so that it looks the way you prefer.
* Symbols: characters are visible symbols, such as "␟" for Unit Separator.
* Controls: characters are invisible controls, such as "\u001F" for Unit Separator.
* Braces: instead of characters, use pretty-print braces, such as "{US}" for Unit Separator.
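
As an illustration of the three styles, here is a minimal Python sketch that rewrites symbol-style text into the controls style or the braces style. Only `{US}` appears in this document; the other brace names (`{RS}`, `{GS}`, `{FS}`, `{ESC}`, `{EOT}`) and the function name `restyle` are assumptions:

```python
# Hypothetical sketch: convert symbol-style USV markers to another style.
# The brace names other than {US} are assumed for illustration.

# name: (symbol character, control character)
MARKS = {
    "US": ("\u241F", "\u001F"),
    "RS": ("\u241E", "\u001E"),
    "GS": ("\u241D", "\u001D"),
    "FS": ("\u241C", "\u001C"),
    "ESC": ("\u241B", "\u001B"),
    "EOT": ("\u2404", "\u0004"),
}


def restyle(text: str, style: str) -> str:
    """Replace each symbol marker with its 'controls' or 'braces' form."""
    if style not in ("controls", "braces"):
        raise ValueError("style must be 'controls' or 'braces'")
    for name, (symbol, control) in MARKS.items():
        replacement = control if style == "controls" else "{" + name + "}"
        text = text.replace(symbol, replacement)
    return text


print(restyle("a\u241Fb\u241F\u241E", "braces"))
```

The braces form is meant for display only; this sketch does not convert braces back into markers.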
================================================
FILE: doc/todo/index.md
================================================
# TODO list
We welcome help with this todo list.
## Add formats
Add USV formats to productivity applications:
* [ ] LibreOffice Calc
* [ ] Microsoft Excel
* [ ] Google Sheets
* Etc.
## Create libraries
Create USV libraries for programming languages:
* [x] Rust crate
* [ ] Python pip package
* [ ] Node npm package
* [ ] Ruby gem
* Etc.
## Add handling
Add USV handling to statistics systems:
* [ ] R
* [ ] Julia
* [ ] MATLAB
* [ ] Mathematica
* [ ] Python fsspec
* [ ] Python Pandas
* [ ] Python Polars
* [ ] Python Dask
* Etc.
## Extend CLI tools
Extend USV capabilities for command line interface tools:
* [ ] Miller <https://github.com/johnkerl/miller/issues/245>
* [ ] TextQL <https://github.com/dinedal/textql/issues/115>
* [ ] Q <https://github.com/harelba/q/issues/201>
* [ ] jq
* [ ] xsv by BurntSushi
* Etc.
## Add comparisons
Add comparisons to other data formats:
* [ ] [Why isn’t there a decent file format for tabular data?](https://news.ycombinator.com/item?id=31220841)
* [ ] [Whitespace Separated Values (WSV)](https://dev.stenway.com/WSV/)
* [ ] [SimpleML](https://dev.stenway.com/SML/SimpleML.html)
* [ ] [KYLI](https://shkspr.mobi/blog/2017/03/kyli-because-it-is-superior-to-json/)
* [ ] [Rows of String Values (RSV)](https://github.com/Stenway/RSV-Specification)
## Improve converters
Improve converters: csv-to-usv and usv-to-csv
* [ ] Add support for CSV delimiters, especially semicolon instead of comma.
* [ ] Add CLAP option for USV output with RS+ESC+LF.
================================================
FILE: examples/blog-posts.csv
================================================
"Title One","Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
"Title Two","Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
"Title Three","Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo."
================================================
FILE: examples/blog-posts.usv
================================================
Title One
␟
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
␟␞
Title Two
␟
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
␟␞
Title Three
␟
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium
doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore
veritatis et quasi architecto beatae vitae dicta sunt explicabo.
␟␞
================================================
FILE: examples/end-of-transmission.usv
================================================
a␟b␟c␟␄
The End of Transmission (EOT) stops parsing.
For example, this text comes after the EOT character.
================================================
FILE: examples/hello-goodnight.csv
================================================
"I say ""hello, world"""
"You say ""goodnight, moon"""
================================================
FILE: examples/hello-goodnight.usv
================================================
I say "hello, world"␟␞
You say "goodnight, moon"␟␞
================================================
FILE: examples/stream.usv
================================================
a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜
================================================
FILE: examples/zen-koans.csv
================================================
"Truth Koan","A monk asked, ""Without words or silence, will you tell me the truth?"""
"Lotus Koan","A child asked, ""Before the lotus blossom emerges, what is it?"""
"World Koan","A student asked, ""How does an enlightened one return to the world?"""
================================================
FILE: examples/zen-koans.usv
================================================
Truth Koan␟A monk asked, "Without words or silence, will you tell me the truth?"␟␞
Lotus Koan␟A child asked, "Before the lotus blossom emerges, what is it?"␟␞
World Koan␟A student asked, "How does an enlightened one return to the world?"␟␞
================================================
FILE: tests/1-dimensional-as-line/expect.json
================================================
["a","b"]
================================================
FILE: tests/1-dimensional-as-line/input.usv
================================================
a␟b␟
================================================
FILE: tests/1-dimensional-as-lines/expect.json
================================================
["a","b"]
================================================
FILE: tests/1-dimensional-as-lines/input.usv
================================================
a
␟
b
␟
================================================
FILE: tests/2-dimensional-as-line/expect.json
================================================
[["a","b"],["c","d"]]
================================================
FILE: tests/2-dimensional-as-line/input.usv
================================================
a␟b␟␞c␟d␟␞
================================================
FILE: tests/2-dimensional-as-lines/expect.json
================================================
[["a","b"],["c","d"]]
================================================
FILE: tests/2-dimensional-as-lines/input.usv
================================================
a
␟
b
␟
␞
c
␟
d
␟
␞
================================================
FILE: tests/3-dimensional-as-line/expect.json
================================================
[[["a","b"],["c","d"]],[["e","f"],["g","h"]]]
================================================
FILE: tests/3-dimensional-as-line/input.usv
================================================
a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝
================================================
FILE: tests/3-dimensional-as-lines/expect.json
================================================
[[["a","b"],["c","d"]],[["e","f"],["g","h"]]]
================================================
FILE: tests/3-dimensional-as-lines/input.usv
================================================
a
␟
b
␟
␞
c
␟
d
␟
␞
␝
e
␟
f
␟
␞
g
␟
h
␟
␞
␝
================================================
FILE: tests/4-dimensional-as-line/expect.json
================================================
[[[["a","b"],["c","d"]],[["e","f"],["g","h"]]],[[["i","j"],["k","l"]],[["m","n"],["o","p"]]]]
================================================
FILE: tests/4-dimensional-as-line/input.usv
================================================
a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜
================================================
FILE: tests/4-dimensional-as-lines/expect.json
================================================
[[[["a","b"],["c","d"]],[["e","f"],["g","h"]]],[[["i","j"],["k","l"]],[["m","n"],["o","p"]]]]
================================================
FILE: tests/4-dimensional-as-lines/input.usv
================================================
a
␟
b
␟
␞
c
␟
d
␟
␞
␝
e
␟
f
␟
␞
g
␟
h
␟
␞
␝
␜
i
␟
j
␟
␞
k
␟
l
␟
␞
␝
m
␟
n
␟
␞
o
␟
p
␟
␞
␝
␜
================================================
FILE: tests/blog-posts/output-actual.txt
================================================
Title One
unit separator
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
record separator
Title Two
unit separator
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
record separator
Title Three
unit separator
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium
doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore
veritatis et quasi architecto beatae vitae dicta sunt explicabo.
================================================
FILE: tests/blog-posts/output-expect.txt
================================================
Title One
unit separator
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
record separator
Title Two
unit separator
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
record separator
Title Three
unit separator
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium
doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore
veritatis et quasi architecto beatae vitae dicta sunt explicabo.
================================================
FILE: tests/blog-posts/test.sh
================================================
#!/bin/sh
set -euf
top="$(git rev-parse --show-toplevel)"
cat "$top/examples/blog-posts.usv" | "$top/bin/bash/usv-to-debug.bash" > output-actual.txt
diff output-actual.txt output-expect.txt
================================================
FILE: tests/end-of-transmission-block/output-actual.txt
================================================
a
unit separator
b
unit separator
c
End of Transmission
================================================
FILE: tests/end-of-transmission-block/output-expect.txt
================================================
a
unit separator
b
unit separator
c
End of Transmission
================================================
FILE: tests/end-of-transmission-block/test.sh
================================================
#!/bin/sh
set -euf
top="$(git rev-parse --show-toplevel)"
cat "$top/examples/end-of-transmission.usv" | "$top/bin/bash/usv-to-debug.bash" > output-actual.txt
diff output-actual.txt output-expect.txt
================================================
FILE: tests/microsoft-excel/example1.xls
================================================
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?><Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office"><Colors><Color><Index>3</Index><RGB>#000000</RGB></Color><Color><Index>4</Index><RGB>#0000ee</RGB></Color><Color><Index>5</Index><RGB>#006600</RGB></Color><Color><Index>6</Index><RGB>#333333</RGB></Color><Color><Index>7</Index><RGB>#808080</RGB></Color><Color><Index>8</Index><RGB>#996600</RGB></Color><Color><Index>9</Index><RGB>#c0c0c0</RGB></Color><Color><Index>10</Index><RGB>#cc0000</RGB></Color><Color><Index>11</Index><RGB>#ccffcc</RGB></Color><Color><Index>12</Index><RGB>#dddddd</RGB></Color><Color><Index>13</Index><RGB>#ffcccc</RGB></Color><Color><Index>14</Index><RGB>#ffffcc</RGB></Color><Color><Index>15</Index><RGB>#ffffff</RGB></Color></Colors></OfficeDocumentSettings><ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel"><WindowHeight>9000</WindowHeight><WindowWidth>13860</WindowWidth><WindowTopX>240</WindowTopX><WindowTopY>75</WindowTopY><ProtectStructure>False</ProtectStructure><ProtectWindows>False</ProtectWindows></ExcelWorkbook><Styles><Style ss:ID="Default" ss:Name="Default"/><Style ss:ID="Note" ss:Name="Note"><Font ss:FontName="Liberation Sans" ss:Size="10"/></Style><Style ss:ID="Default" ss:Name="Default"/><Style ss:ID="Heading" ss:Name="Heading"><Alignment/><Font ss:Bold="1" ss:Color="#000000" ss:Size="24"/></Style><Style ss:ID="Heading_20_1" ss:Name="Heading 1"><Alignment/><Font ss:Bold="1" ss:Color="#000000" ss:Size="18"/></Style><Style ss:ID="Heading_20_2" ss:Name="Heading 
2"><Alignment/><Font ss:Bold="1" ss:Color="#000000" ss:Size="12"/></Style><Style ss:ID="Text" ss:Name="Text"><Alignment/></Style><Style ss:ID="Note" ss:Name="Note"><Alignment/><Borders><Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1" ss:Color="#808080"/><Border ss:Position="Left" ss:LineStyle="Continuous" ss:Weight="1" ss:Color="#808080"/><Border ss:Position="Right" ss:LineStyle="Continuous" ss:Weight="1" ss:Color="#808080"/><Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1" ss:Color="#808080"/></Borders><Interior ss:Color="#ffffcc" ss:Pattern="Solid"/></Style><Style ss:ID="Footnote" ss:Name="Footnote"><Alignment/></Style><Style ss:ID="Hyperlink" ss:Name="Hyperlink"><Alignment/></Style><Style ss:ID="Status" ss:Name="Status"><Alignment/></Style><Style ss:ID="Good" ss:Name="Good"><Alignment/><Interior ss:Color="#ccffcc" ss:Pattern="Solid"/></Style><Style ss:ID="Neutral" ss:Name="Neutral"><Alignment/><Interior ss:Color="#ffffcc" ss:Pattern="Solid"/></Style><Style ss:ID="Bad" ss:Name="Bad"><Alignment/><Interior ss:Color="#ffcccc" ss:Pattern="Solid"/></Style><Style ss:ID="Warning" ss:Name="Warning"><Alignment/></Style><Style ss:ID="Error" ss:Name="Error"><Alignment/><Interior ss:Color="#cc0000" ss:Pattern="Solid"/></Style><Style ss:ID="Accent" ss:Name="Accent"><Alignment/></Style><Style ss:ID="Accent_20_1" ss:Name="Accent 1"><Alignment/><Font ss:Bold="1" ss:Color="#ffffff"/><Interior ss:Color="#000000" ss:Pattern="Solid"/></Style><Style ss:ID="Accent_20_2" ss:Name="Accent 2"><Alignment/><Font ss:Bold="1" ss:Color="#ffffff"/><Interior ss:Color="#808080" ss:Pattern="Solid"/></Style><Style ss:ID="Accent_20_3" ss:Name="Accent 3"><Alignment/><Interior ss:Color="#dddddd" ss:Pattern="Solid"/></Style><Style ss:ID="Result" ss:Name="Result"><Alignment/><Font ss:Bold="1" ss:Italic="1" ss:Underline="Single"/></Style><Style ss:ID="co1"/><Style ss:ID="ta1"/></Styles><ss:Worksheet ss:Name="Sheet1"><Table ss:StyleID="ta1"><Column ss:Span="1" 
ss:Width="64.008"/><Row ss:Height="12.816"><Cell><Data ss:Type="String">a</Data></Cell><Cell><Data ss:Type="String">b</Data></Cell></Row><Row ss:Height="12.816"><Cell><Data ss:Type="String">c</Data></Cell><Cell><Data ss:Type="String">d</Data></Cell></Row></Table><x:WorksheetOptions/></ss:Worksheet><ss:Worksheet ss:Name="Sheet2"><Table ss:StyleID="ta1"><Column ss:Span="1" ss:Width="64.008"/><Row ss:Height="12.816"><Cell><Data ss:Type="String">e</Data></Cell><Cell><Data ss:Type="String">f</Data></Cell></Row><Row ss:Height="12.816"><Cell><Data ss:Type="String">g</Data></Cell><Cell><Data ss:Type="String">h</Data></Cell></Row></Table><x:WorksheetOptions/></ss:Worksheet></Workbook>
================================================
FILE: tests/microsoft-excel/example2.xls
================================================
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?><Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office"><Colors><Color><Index>3</Index><RGB>#000000</RGB></Color><Color><Index>4</Index><RGB>#0000ee</RGB></Color><Color><Index>5</Index><RGB>#006600</RGB></Color><Color><Index>6</Index><RGB>#333333</RGB></Color><Color><Index>7</Index><RGB>#808080</RGB></Color><Color><Index>8</Index><RGB>#996600</RGB></Color><Color><Index>9</Index><RGB>#c0c0c0</RGB></Color><Color><Index>10</Index><RGB>#cc0000</RGB></Color><Color><Index>11</Index><RGB>#ccffcc</RGB></Color><Color><Index>12</Index><RGB>#dddddd</RGB></Color><Color><Index>13</Index><RGB>#ffcccc</RGB></Color><Color><Index>14</Index><RGB>#ffffcc</RGB></Color><Color><Index>15</Index><RGB>#ffffff</RGB></Color></Colors></OfficeDocumentSettings><ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel"><WindowHeight>9000</WindowHeight><WindowWidth>13860</WindowWidth><WindowTopX>240</WindowTopX><WindowTopY>75</WindowTopY><ProtectStructure>False</ProtectStructure><ProtectWindows>False</ProtectWindows></ExcelWorkbook><Styles><Style ss:ID="Default" ss:Name="Default"/><Style ss:ID="Note" ss:Name="Note"><Font ss:FontName="Liberation Sans" ss:Size="10"/></Style><Style ss:ID="Default" ss:Name="Default"/><Style ss:ID="Heading" ss:Name="Heading"><Alignment/><Font ss:Bold="1" ss:Color="#000000" ss:Size="24"/></Style><Style ss:ID="Heading_20_1" ss:Name="Heading 1"><Alignment/><Font ss:Bold="1" ss:Color="#000000" ss:Size="18"/></Style><Style ss:ID="Heading_20_2" ss:Name="Heading 
2"><Alignment/><Font ss:Bold="1" ss:Color="#000000" ss:Size="12"/></Style><Style ss:ID="Text" ss:Name="Text"><Alignment/></Style><Style ss:ID="Note" ss:Name="Note"><Alignment/><Borders><Border ss:Position="Bottom" ss:LineStyle="Continuous" ss:Weight="1" ss:Color="#808080"/><Border ss:Position="Left" ss:LineStyle="Continuous" ss:Weight="1" ss:Color="#808080"/><Border ss:Position="Right" ss:LineStyle="Continuous" ss:Weight="1" ss:Color="#808080"/><Border ss:Position="Top" ss:LineStyle="Continuous" ss:Weight="1" ss:Color="#808080"/></Borders><Interior ss:Color="#ffffcc" ss:Pattern="Solid"/></Style><Style ss:ID="Footnote" ss:Name="Footnote"><Alignment/></Style><Style ss:ID="Hyperlink" ss:Name="Hyperlink"><Alignment/></Style><Style ss:ID="Status" ss:Name="Status"><Alignment/></Style><Style ss:ID="Good" ss:Name="Good"><Alignment/><Interior ss:Color="#ccffcc" ss:Pattern="Solid"/></Style><Style ss:ID="Neutral" ss:Name="Neutral"><Alignment/><Interior ss:Color="#ffffcc" ss:Pattern="Solid"/></Style><Style ss:ID="Bad" ss:Name="Bad"><Alignment/><Interior ss:Color="#ffcccc" ss:Pattern="Solid"/></Style><Style ss:ID="Warning" ss:Name="Warning"><Alignment/></Style><Style ss:ID="Error" ss:Name="Error"><Alignment/><Interior ss:Color="#cc0000" ss:Pattern="Solid"/></Style><Style ss:ID="Accent" ss:Name="Accent"><Alignment/></Style><Style ss:ID="Accent_20_1" ss:Name="Accent 1"><Alignment/><Font ss:Bold="1" ss:Color="#ffffff"/><Interior ss:Color="#000000" ss:Pattern="Solid"/></Style><Style ss:ID="Accent_20_2" ss:Name="Accent 2"><Alignment/><Font ss:Bold="1" ss:Color="#ffffff"/><Interior ss:Color="#808080" ss:Pattern="Solid"/></Style><Style ss:ID="Accent_20_3" ss:Name="Accent 3"><Alignment/><Interior ss:Color="#dddddd" ss:Pattern="Solid"/></Style><Style ss:ID="Result" ss:Name="Result"><Alignment/><Font ss:Bold="1" ss:Italic="1" ss:Underline="Single"/></Style><Style ss:ID="co1"/><Style ss:ID="ta1"/></Styles><ss:Worksheet ss:Name="Sheet1"><Table ss:StyleID="ta1"><Column ss:Span="1" 
ss:Width="64.008"/><Row ss:Height="12.816"><Cell><Data ss:Type="String">I</Data></Cell><Cell><Data ss:Type="String">j</Data></Cell></Row><Row ss:Height="12.816"><Cell><Data ss:Type="String">k</Data></Cell><Cell><Data ss:Type="String">l</Data></Cell></Row></Table><x:WorksheetOptions/></ss:Worksheet><ss:Worksheet ss:Name="Sheet2"><Table ss:StyleID="ta1"><Column ss:Span="1" ss:Width="64.008"/><Row ss:Height="12.816"><Cell><Data ss:Type="String">m</Data></Cell><Cell><Data ss:Type="String">n</Data></Cell></Row><Row ss:Height="12.816"><Cell><Data ss:Type="String">o</Data></Cell><Cell><Data ss:Type="String">p</Data></Cell></Row></Table><x:WorksheetOptions/></ss:Worksheet></Workbook>
================================================
FILE: tests/stream/output-actual.txt
================================================
a
unit separator
b
record separator
c
unit separator
d
group separator
e
unit separator
f
record separator
g
unit separator
h
file separator
i
unit separator
j
record separator
k
unit separator
l
group separator
m
unit separator
n
record separator
o
unit separator
p
================================================
FILE: tests/stream/output-expect.txt
================================================
a
unit separator
b
record separator
c
unit separator
d
group separator
e
unit separator
f
record separator
g
unit separator
h
file separator
i
unit separator
j
record separator
k
unit separator
l
group separator
m
unit separator
n
record separator
o
unit separator
p
================================================
FILE: tests/stream/test.sh
================================================
#!/bin/sh
set -euf
top="$(git rev-parse --show-toplevel)"
cat "$top/examples/stream.usv" | "$top/bin/bash/usv-to-debug.bash" > output-actual.txt
diff output-actual.txt output-expect.txt
================================================
FILE: todo.md
================================================
# TODO
## Shift
For Hierarchy Levels:
* ␏ U+240F Symbol for Shift In (SI).<br>
Use it to shift inward a level, for nesting, blocks, outlines, etc.
* ␎ U+240E Symbol for Shift Out (SO).<br>
Use it to shift outward a level, for nesting, blocks, outlines, etc.
## What is a hierarchy?
Some data projects need more flexibility. For example, some data does not fit neatly into units, records, groups, and files, because it contains more kinds of clusters or has nested clusters.
For these needs, USV enables you to create your own hierarchy. If you know data representations such as JSON, YAML, or TOML, then you already understand how hierarchy works.
Example JSON hierarchy:
```
{
    "colors": [
        "red",
        "green",
        "blue"
    ]
}
```
USV uses two hierarchy characters:
* "shift-in" goes inward: it begins a deeper hierarchy level.
* "shift-out" goes outward: it ends the current hierarchy level and returns to the enclosing one.
USV with a shift-in and a shift-out:
```usv
color␏red␎
```
Pretty print renders shift-in as a left brace and shift-out as a right brace, with indentation:
```txt
color
{
    red
}
```
USV with two levels of nesting, where each inner shift-in is closed by its own shift-out:
```usv
colors␏red␏scarlet␎green␏emerald␎blue␏cerulean␎␎
```
Pretty print renders with even more indentation:
```txt
colors
{
    red
    {
        scarlet
    }
    green
    {
        emerald
    }
    blue
    {
        cerulean
    }
}
```
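The shift-in and shift-out pairing above maps naturally onto a stack. The following is a minimal sketch in Python, not part of the USV tooling: the function name `parse_hierarchy` and the choice of nested lists as the result type are assumptions made only for illustration. It treats every other character as plain text and ignores the unit, record, group, and file separators.

```python
SHIFT_IN = "\u240F"   # ␏ Symbol for Shift In
SHIFT_OUT = "\u240E"  # ␎ Symbol for Shift Out


def parse_hierarchy(text):
    """Parse text with shift-in/shift-out marks into a nested list.

    Hypothetical helper: each run of plain text becomes a string, and each
    shift-in ... shift-out span becomes a child list in the enclosing list.
    """
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    buffer = []     # characters of the current text run

    def flush():
        # Move any pending text run into the current list.
        if buffer:
            stack[-1].append("".join(buffer))
            buffer.clear()

    for ch in text:
        if ch == SHIFT_IN:
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == SHIFT_OUT:
            flush()
            if len(stack) == 1:
                raise ValueError("shift-out without a matching shift-in")
            stack.pop()
        else:
            buffer.append(ch)

    flush()
    if len(stack) != 1:
        raise ValueError("shift-in without a matching shift-out")
    return root
```

With this sketch, `parse_hierarchy("color␏red␎")` returns `["color", ["red"]]`. Unbalanced input raises `ValueError`; that is a design choice of this sketch, not a rule stated by USV.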
```sh
#!/usr/bin/env bash
set -euf -o pipefail

# USV example shell script that demonstrates the use of USV characters.
# This script reads STDIN one character at a time, and prints text.

escape=false
indent=""
while IFS= read -n1 -r c; do
    if [ "$escape" = true ]; then
        printf %s "$c"
        escape=false
        continue
    fi
    case "$c" in
    "␛")
        escape=true
        ;;
    "␟")
        printf ","
        ;;
    "␞")
        printf "\n%s" "$indent"
        ;;
    "␝")
        printf "\n%s-\n%s" "$indent" "$indent"
        ;;
    "␜")
        printf "\n%s=\n%s" "$indent" "$indent"
        ;;
    "␏")
        printf "\n%s{" "$indent"
        indent="$indent    "
        printf "\n%s" "$indent"
        ;;
    "␎")
        indent=${indent%????}
        printf "\n%s}\n%s" "$indent" "$indent"
        ;;
    "␗")
        break
        ;;
    *)
        printf %s "$c"
        ;;
    esac
done
printf "\n"
```
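The same character-at-a-time approach could be ported to Python, alongside the repository's other Python example scripts. This is a rough sketch under stated assumptions: the function name `render`, the four-space indent, and returning a string instead of writing to STDOUT are choices made for illustration. It also passes every non-marker character through unchanged, including newlines, so its treatment of newlines in the input may differ from the shell loop.

```python
import io

# Marker characters (Unicode control pictures).
ESC = "\u241B"  # ␛ escape: the next character is content
US = "\u241F"   # ␟ unit separator
RS = "\u241E"   # ␞ record separator
GS = "\u241D"   # ␝ group separator
FS = "\u241C"   # ␜ file separator
SI = "\u240F"   # ␏ shift in: go one level deeper
SO = "\u240E"   # ␎ shift out: return one level
ETB = "\u2417"  # ␗ stop reading

INDENT = "    "  # assumed indent width of four spaces


def render(text):
    """Render marked-up text as indented, brace-delimited plain text."""
    out = io.StringIO()
    indent = ""
    escape = False
    for c in text:
        if escape:
            # The previous character was ESC, so emit this one verbatim.
            out.write(c)
            escape = False
        elif c == ESC:
            escape = True
        elif c == US:
            out.write(",")
        elif c == RS:
            out.write("\n" + indent)
        elif c == GS:
            out.write(f"\n{indent}-\n{indent}")
        elif c == FS:
            out.write(f"\n{indent}=\n{indent}")
        elif c == SI:
            out.write(f"\n{indent}{{")
            indent += INDENT
            out.write("\n" + indent)
        elif c == SO:
            # Drop one indent level; slicing an empty string stays empty.
            indent = indent[:-len(INDENT)]
            out.write(f"\n{indent}}}\n{indent}")
        elif c == ETB:
            break
        else:
            out.write(c)
    out.write("\n")
    return out.getvalue()
```

Returning a string rather than printing keeps the sketch easy to test; a command-line wrapper could call `render(sys.stdin.read())` and write the result to STDOUT.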