Repository: uuid6/uuid6-ietf-draft
Branch: master
Commit: 35ec1548920d
Files: 26
Total size: 1.4 MB

Directory structure:
gitextract_2x_pp6_7/
├── .github/
│   └── ISSUE_TEMPLATE/
│       └── proposed-draft-change.md
├── .gitignore
├── README.md
├── draft-peabody-dispatch-new-uuid-format-04.html
├── draft-peabody-dispatch-new-uuid-format-04.txt
├── draft-peabody-dispatch-new-uuid-format-04.xml
├── editor-notes/
│   ├── LATEST-OUTLINE.md
│   └── LATEST.md
├── index.html
├── old drafts/
│   ├── .gitignore
│   ├── README.md
│   ├── draft-peabody-dispatch-new-uuid-format-00.txt
│   ├── draft-peabody-dispatch-new-uuid-format-00.xml
│   ├── draft-peabody-dispatch-new-uuid-format-01.html
│   ├── draft-peabody-dispatch-new-uuid-format-01.txt
│   ├── draft-peabody-dispatch-new-uuid-format-01.xml
│   ├── draft-peabody-dispatch-new-uuid-format-02.html
│   ├── draft-peabody-dispatch-new-uuid-format-02.txt
│   ├── draft-peabody-dispatch-new-uuid-format-02.xml
│   ├── draft-peabody-dispatch-new-uuid-format-03.html
│   ├── draft-peabody-dispatch-new-uuid-format-03.txt
│   ├── draft-peabody-dispatch-new-uuid-format-03.xml
│   ├── draft-peabody-dispatch-new-uuid-format.txt
│   └── draft-peabody-dispatch-new-uuid-format.xml
└── research/
    ├── sortable-id-analysis.md
    └── sortable-id-comparisons.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/proposed-draft-change.md
================================================
---
name: Proposed Draft Change
about: XML Draft Changes
title: ''
labels: Change Proposal
assignees: kyzer-davis
---

# Change Proposal Template

#### Source (Select one.)
- [ ] IETF Published Draft
- [ ] Work in Progress Draft

#### Change Reason (Select all that apply.)
- [ ] Typos and grammatical issues
- [ ] Bad Reference
- [ ] IETF Verbiage modification (MAY, MUST, SHOULD, SHOULD NOT, etc)
- [ ] New Text for additional context
- [ ] Underlying XML Format Update
- [ ] ASCII diagram updates (artwork, code samples, etc.)

#### Draft Number, Full Section, Name
```
Draft 02, Section 5.1. Timestamp Granularity
```

#### Current Text:
```
UUIDs MAY be universally unique.
```

#### Proposed Text:
```
UUIDs SHOULD be universally unique.
```

---

### Other Supporting information below:
Insert data here

================================================
FILE: .gitignore
================================================
misc-notes.txt

================================================
FILE: README.md
================================================
# Final
```
RFC 9562
Title: Universally Unique IDentifiers (UUIDs)
Author: K. Davis, B. Peabody, P. Leach
Status: Standards Track
Stream: IETF
Date: May 2024
Mailbox: kydavis@cisco.com, brad@peabody.io, pjl7@uw.edu
Pages: 46
Obsoletes: RFC 4122
I-D Tag: draft-ietf-uuidrev-rfc4122bis-14.txt
URL: https://www.rfc-editor.org/info/rfc9562
DOI: 10.17487/RFC9562
```

# Updates
- This work has been officially adopted by the IETF!
- Work on the base document has been reset to Draft 00 and moved to https://github.com/ietf-wg-uuidrev/rfc4122bis
- Full Details and further updates will be posted in [Post-IETF 114 and the future of this Draft](https://github.com/uuid6/uuid6-ietf-draft/issues/122)

```
Name: draft-ietf-uuidrev-rfc4122bis
Revision: 14
Title: Universally Unique IDentifiers (UUID)
Date: 2023-11-06
Group: uuidrev
Pages: 58
URL: https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-14.txt
Status: https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/
HTML: https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-14.html
HTMLized: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
Diff: https://author-tools.ietf.org/iddiff?url2=draft-ietf-uuidrev-rfc4122bis-14
```

---

# New UUID Formats
This is the GitHub repo for the IETF draft surrounding the topic of new UUID formats. Various discussion will need to occur to arrive at a standard and this repo will be used to collect and organize that information.

## High Level Overview
1. **UUID version 6**: A re-ordering of UUID version 1 so it is sortable as an opaque sequence of bytes. Easy to implement given an existing UUIDv1 implementation.
    `time_high|time_mid|time_low_and_version|clk_seq_hi_res|clk_seq_low|node`
2. **UUID version 7**: An entirely new time-based UUID bit layout sourced from the widely implemented and well known Unix Epoch timestamp source.
    `unix_ts_ms|ver|rand_a|var|rand_b`
3. **UUID version 8**: A free-form UUID format which has no explicit requirements except maintaining backward compatibility.
    `custom_a|ver|custom_b|var|custom_c`
4. **Max UUID**: A specialized UUID which is the inverse of the Nil UUID from RFC4122.
    `FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF`

---

# RFC Scope
In order to keep things on track the following topics have been decided as in-scope or out of scope for this particular RFC.
For more information on any of these items refer to the XML, TXT, HTML draft, research and the issue tracker for a particular discussion (follow hyperlinks below.)

### In Scope Topics
- UUID Generation
    - [Timestamp Granularity](https://github.com/uuid6/uuid6-ietf-draft/issues/23)
        - Sub-Topics: Timestamp epoch source, format, length, accuracy and bit layout
    - [Monotonicity and Counters for same Timestamp-tick collision avoidance during batch UUID creation](https://github.com/uuid6/uuid6-ietf-draft/issues/60)
        - Sub-Topics: Counter position, length, rollover handling and seeding.
    - [Pseudo-random formatting, length and generation methods](https://github.com/uuid6/uuid6-ietf-draft/issues/55)
    - Distributed UUID Generation best practices
        - Sub-Topics: [Shared Knowledge Schemes and embedded nondescript node identifiers](https://github.com/uuid6/uuid6-ietf-draft/issues/36)
    - [Max UUID Usage](https://github.com/uuid6/uuid6-ietf-draft/issues/62)

### In Scope Topics
- UUID Best Practices as it relates to the previous topics
    - Global and Local Uniqueness (collision resistance mechanisms)
    - Unguessability
    - Sorting/Ordering techniques
    - Storage and Opacity best practices
    - Big Endian vs Little Endian bit layout
    - Any and all UUID security concerns!
        - Sub-Topics: [MAC address usage in next-generation UUIDs](https://github.com/uuid6/uuid6-ietf-draft/issues/13)

---

### Out of Scope Topics (Rolled into a new Draft that can be found here: [New UUID Encoding Techniques](https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft))
- [URN Modifications](https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/issues/4)
- [Alternative text encoding techniques (Crockfords Base32, Base64, etc)](https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/issues/3)
- [Variable length UUIDs | UUID Long](https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/issues/2)

### Out of Scope Topics
- [Variant Bit E Usage](https://github.com/uuid6/uuid6-ietf-draft/issues/26)

---

### Out of Scope Topics (as the result of discussion threads)
- [Variable length subsecond precision encoding](https://github.com/uuid6/uuid6-ietf-draft/issues/24)

### Out of Scope Topics (for backwards compatibility)
- Changing the default 8-4-4-4-12 UUID text layout
- Changing anything about RFC4122's UUID versions 1 through 5
- [Changing too much about UUIDv6 that would otherwise inhibit porting v1 to v6](https://github.com/uuid6/uuid6-ietf-draft/issues/52)

---

# Contributing
- The XML draft in the root folder is the most recent working draft for re-submission to the IETF.
- An HTML and Textual (.txt) RFC representation will be provided in the root folder to ease reader input and discussion.
- [Older drafts](https://github.com/uuid6/uuid6-ietf-draft/tree/master/old%20drafts) are available for view here or on the [IETF Datatracker](https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/).
- The RFC Draft utilizes an XML formatted document that follows [RFC7749 markup](https://xml2rfc.tools.ietf.org/rfc7749.html).
All XML changes MUST follow this format and pass conversion to `.txt` and `.html` via https://xml2rfc.tools.ietf.org/
- Utilize the issue tracker to discuss topics, solutions, problems, typos and anything else.
- Where possible contribute to an existing [Discussion Thread](https://github.com/uuid6/uuid6-ietf-draft/issues?q=is%3Aissue+is%3Aopen+label%3ADiscussion) vs creating a new thread.
- Reviewing the pre-Draft 01 [Research efforts](https://github.com/uuid6/uuid6-ietf-draft/tree/master/research) is encouraged before diving into discussion threads.
- New threads that propose alternative text SHOULD utilize the `Proposed Draft Change` GitHub issue template to ensure proper information is captured for the draft authors.
- Be civil!
- Pull requests will be accepted *as long as the text is concise, clear and objective.*
- PRs will not be accepted for changes to the decision made for the draft without full discussion.
- PRs MUST include the updated `.xml` and xml2rfc generated `.txt` and `.html` documents.
- Draft versions are frozen until submission to the IETF; at which point new work constitutes a new draft version.

---

# Prototyping
Remember first and foremost that this specification is still a draft. Breaking changes are to be expected. Prototypes SHOULD only be implemented to verify or discredit topics of the draft text.
- [Prototype Implementations](https://github.com/uuid6/prototypes) are available via this repo.

================================================
FILE: draft-peabody-dispatch-new-uuid-format-04.html
================================================
New UUID Formats
Internet-Draft new-uuid-format June 2022
Peabody & Davis Expires 25 December 2022
Workgroup: dispatch
Internet-Draft: draft-peabody-dispatch-new-uuid-format-04
Updates: 4122 (if approved)
Published: June 2022
Intended Status: Standards Track
Expires: 25 December 2022
Authors: BGP. Peabody, K. Davis

New UUID Formats

Abstract

This document presents new Universally Unique Identifier (UUID) formats for use in modern applications and databases.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 25 December 2022.

1. Introduction

Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions.

One area in which UUIDs have gained popularity is database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto increment" schemes often used by databases do not work well, as the effort required to coordinate unique numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring synchronization makes them a good alternative, but UUID versions 1-5 lack certain other desirable characteristics:

  1. Non-time-ordered UUID versions such as UUIDv4 have poor database index locality, meaning that new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the structures commonly used for indexes (B-tree and its variants) can be dramatic.

  2. The 100-nanosecond, Gregorian epoch used in UUIDv1 timestamps is uncommon and difficult to represent accurately using a standard number format such as [IEEE754].

  3. Introspection/parsing is required to order by time sequence, as opposed to being able to perform a simple byte-by-byte comparison.

  4. Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Additionally, with the advent of virtual machines and containers, MAC address uniqueness is no longer guaranteed.

  5. Many of the implementation details specified in [RFC4122] involve trade-offs that are neither possible to specify for all applications nor necessary to produce interoperable implementations.

  6. [RFC4122] does not distinguish between the requirements for generation of a UUID versus an application which simply stores one, which are often different.

Due to the aforementioned issues, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways.

While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.

  1. [ULID] by A. Feerasta

  2. [LexicalUUID] by Twitter

  3. [Snowflake] by Twitter

  4. [Flake] by Boundary

  5. [ShardingID] by Instagram

  6. [KSUID] by Segment

  7. [Elasticflake] by P. Pearcy

  8. [FlakeID] by T. Pawlak

  9. [Sonyflake] by Sony

  10. [orderedUuid] by IT. Cabrera

  11. [COMBGUID] by R. Tallent

  12. [SID] by A. Chilton

  13. [pushID] by Google

  14. [XID] by O. Poitrey

  15. [ObjectID] by MongoDB

  16. [CUID] by E. Elliott

An inspection of these implementations and the issues described above has led to this document which attempts to adapt UUIDs to address these issues.

2. Terminology

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.2. Abbreviations

The following abbreviations are used in this document:

UUID
Universally Unique Identifier [RFC4122]
CSPRNG
Cryptographically Secure Pseudo-Random Number Generator
MAC
Media Access Control
MSB
Most Significant Bit
DBMS
Database Management System

3. Summary of Changes

The following UUIDs are hereby introduced:

UUID version 6 (UUIDv6)
A re-ordering of UUID version 1 so it is sortable as an opaque sequence of bytes. Easy to implement given an existing UUIDv1 implementation. See Section 5.1.
UUID version 7 (UUIDv7)
An entirely new time-based UUID bit layout sourced from the widely implemented and well known Unix Epoch timestamp source. See Section 5.2.
UUID version 8 (UUIDv8)
A free-form UUID format which has no explicit requirements except maintaining backward compatibility. See Section 5.3.
Max UUID
A specialized UUID which is the inverse of the Nil UUID defined in [RFC4122], Section 4.1.7. See Section 5.4.

3.1. Changelog

RFC EDITOR PLEASE DELETE THIS SECTION.

draft-04

  • - Fixed bad title in IEEE754 Normative Reference

  • - Fixed bad GMT offset in Test Vector Appendix

  • - Removed MAY in Counters section

  • - Condensed Counter Type into Counter Methods to reduce text

  • - Removed option for random increment along with fixed-length counter

  • - Described how to handle scenario where New UUID less than Old UUID

  • - Allow timestamp increment if counter overflows

  • - Replaced UUIDv8 C code snippet with full generation example

  • - Fixed RFC4086 Reference link

  • - Describe reseeding best practice for CSPRNG

  • - Changed MUST to SHOULD removing requirement for absolute monotonicity

draft-03

  • - Reworked the draft body to make the content more concise

  • - UUIDv6 section reworked to just the reorder of the timestamp

  • - UUIDv7 changed to simplify timestamp mechanism to just millisecond Unix timestamp

  • - UUIDv8 relaxed to be custom in all elements except version and variant

  • - Introduced Max UUID.

  • - Added C code samples in Appendix.

  • - Added test vectors in Appendix.

  • - Version and Variant section combined into one section.

  • - Changed from pseudo-random number generators to cryptographically secure pseudo-random number generator (CSPRNG).

  • - Combined redundant topics from all UUIDs into sections such as Timestamp granularity, Monotonicity and Counters, Collision Resistance, Sorting, and Unguessability, etc.

  • - Split Encoding and Storage into Opacity and DBMS and Database Considerations

  • - Reworked Global Uniqueness under new section Global and Local Uniqueness

  • - Node verbiage only used in UUIDv6 all others reference random/rand instead

  • - Clock sequence verbiage changed simply to counter in any section other than UUIDv6

  • - Added Abbreviations section

  • - Updated IETF Draft XML Layout

  • - Added information about little-endian UUIDs

draft-02

  • - Added Changelog

  • - Fixed misc. grammatical errors

  • - Fixed section numbering issue

  • - Fixed some UUIDvX reference issues

  • - Changed all instances of "motonic" to "monotonic"

  • - Changed all instances of "#-bit" to "# bit"

  • - Changed "proceeding" verbiage to "after" in section 7

  • - Added details on how to pad 32 bit Unix timestamp to 36 bits in UUIDv7

  • - Added details on how to truncate 64 bit Unix timestamp to 36 bits in UUIDv7

  • - Added forward reference and bullet to UUIDv8 if truncating 64 bit Unix Epoch is not an option.

  • - Fixed bad reference to non-existent "time_or_node" in section 4.5.4

draft-01

  • - Complete rewrite of entire document.

  • - The format, flow and verbiage used in the specification has been reworked to mirror the original RFC 4122 and current IETF standards.

  • - Removed the topics of UUID length modification, alternate UUID text formats, and alternate UUID encoding techniques.

  • - Research into 16 different historical and current implementations of time-based universal identifiers was completed at the end of 2020 in an attempt to identify trends which have directly influenced design decisions in this draft document (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)

  • - Prototype implementations have been completed for UUIDv6, UUIDv7, and UUIDv8 in various languages by many GitHub community members. (https://github.com/uuid6/prototypes)

4. Variant and Version Fields

The variant bits utilized by UUIDs in this specification remain in the same octet as originally defined by [RFC4122], Section 4.1.1.

The next table details Variant 10xx (8/9/A/B) and the new versions defined by this specification. A complete guide to all versions within this variant has been included in Appendix C.1.

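Table 1: New UUID variant 10xx (8/9/A/B) versions defined by this specification

+------+------+------+------+---------+-----------------------------------------------------------------+
| Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                                                     |
+------+------+------+------+---------+-----------------------------------------------------------------+
| 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-based UUID specified in this document. |
| 0    | 1    | 1    | 1    | 7       | Unix Epoch time-based UUID specified in this document.          |
| 1    | 0    | 0    | 0    | 8       | Reserved for custom UUID formats specified in this document     |
+------+------+------+------+---------+-----------------------------------------------------------------+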

For UUID versions 6, 7 and 8 the variant field placement from [RFC4122] is unchanged. An example version/variant layout for UUIDv6 follows, where M is the version and N is the variant.

00000000-0000-6000-8000-000000000000
00000000-0000-6000-9000-000000000000
00000000-0000-6000-A000-000000000000
00000000-0000-6000-B000-000000000000
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
Figure 1: UUIDv6 Variant Examples
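
The M and N positions shown above can be read directly from the 128 bit value. The following is a minimal sketch, assuming Python's standard `uuid` module is used only as a container for the 128 bits; the helper name `version_and_variant` is hypothetical and not part of this specification.

```python
import uuid


def version_and_variant(u: uuid.UUID) -> tuple:
    """Return (version, variant_bits) read straight from the 128 bit value.

    Hypothetical helper for illustration only.
    """
    value = u.int
    version = (value >> 76) & 0xF  # the M nibble: bits 48 through 51
    variant = (value >> 62) & 0x3  # two most significant bits of the N nibble
    return version, variant


for text in (
    "00000000-0000-6000-8000-000000000000",
    "00000000-0000-6000-9000-000000000000",
    "00000000-0000-6000-a000-000000000000",
    "00000000-0000-6000-b000-000000000000",
):
    print(text, version_and_variant(uuid.UUID(text)))
```

All four N values (8, 9, A, B) share the same two leading bits (10), which is why each of them yields the same variant in this sketch.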

5. New Formats

The UUID format is 16 octets; the variant bits, in conjunction with the version bits described in Section 4, determine the finer structure.

5.1. UUID Version 6

UUID version 6 is a field-compatible version of UUIDv1, reordered for improved DB locality. It is expected that UUIDv6 will primarily be used in contexts where there are existing v1 UUIDs. Systems that do not involve legacy UUIDv1 SHOULD consider using UUIDv7 instead.

Instead of splitting the timestamp into the low, mid and high sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes are stored from most to least significant. That is, given a 60 bit timestamp value as specified for UUIDv1 in [RFC4122], Section 4.1.4, for UUIDv6, the first 48 most significant bits are stored first, followed by the 4 bit version (same position), followed by the remaining 12 bits of the original 60 bit timestamp.

The clock sequence bits remain unchanged from their usage and position in [RFC4122], Section 4.1.5.

The 48 bit node SHOULD be set to a pseudo-random value; however, implementations MAY choose to retain the old MAC address behavior from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5. For more information on MAC address usage within UUIDs see Section 8.

The format for the 16-byte, 128 bit UUIDv6 is shown in Figure 2.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           time_high                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           time_mid            |      time_low_and_version     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         node (2-5)                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: UUIDv6 Field and Bit Layout
time_high:
The most significant 32 bits of the 60 bit starting timestamp. Occupies bits 0 through 31 (octets 0-3).
time_mid:
The middle 16 bits of the 60 bit starting timestamp. Occupies bits 32 through 47 (octets 4-5).
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60 bit starting timestamp. Occupies bits 48 through 63 (octets 6-7).
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8).
clk_seq_low:
The 8 bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9).
node:
48 bit spatially unique identifier. Occupies bits 80 through 127 (octets 10-15).

With UUIDv6 the steps for splitting the timestamp into time_high and time_mid are OPTIONAL since the 48 bits of time_high and time_mid will remain in the same order. An extra step of splitting the first 48 bits of the timestamp into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation.
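
The reordering described above can be sketched as a conversion from an existing UUIDv1. This is an illustrative example only, assuming Python's standard `uuid` module; the function name `uuid1_to_uuid6` and the sample input value are assumptions made for this sketch. It takes the 60 bit timestamp, writes the most significant 48 bits first, then the version, then the remaining 12 bits, and leaves the clock sequence and node untouched.

```python
import uuid


def uuid1_to_uuid6(u1: uuid.UUID) -> uuid.UUID:
    """Reorder the timestamp of a UUIDv1 into the UUIDv6 layout (sketch)."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u1.time                      # 60 bit Gregorian timestamp
    time_high_and_mid = ts >> 12      # most significant 48 bits
    time_low = ts & 0xFFF             # least significant 12 bits
    low_64 = u1.int & ((1 << 64) - 1) # variant, clock sequence and node
    value = (
        (time_high_and_mid << 80)
        | (0x6 << 76)                 # version 6 in bits 48 through 51
        | (time_low << 64)
        | low_64
    )
    return uuid.UUID(int=value)


# Sample UUIDv1 value chosen for illustration.
v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9e6bdeced846")
print(v1, "->", uuid1_to_uuid6(v1))
```

Because time_high and time_mid are contiguous in this layout, the sketch treats them as a single 48 bit quantity rather than splitting them, which matches the OPTIONAL nature of that step.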

5.2. UUID Version 7

UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded. It also offers improved entropy characteristics over versions 1 or 6.

Implementations SHOULD utilize UUID version 7 over UUID version 1 and 6 if possible.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: UUIDv7 Field and Bit Layout
unix_ts_ms:
48 bit big-endian unsigned number of Unix epoch timestamp as per Section 6.1.
ver:
4 bit UUIDv7 version set as per Section 4.
rand_a:
12 bits of pseudo-random data to provide uniqueness as per Section 6.2 and Section 6.6.
var:
The 2 bit variant defined by Section 4.
rand_b:
The final 62 bits of pseudo-random data to provide uniqueness as per Section 6.2 and Section 6.6.
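
The field layout above can be assembled with a few shifts. The following is a minimal sketch rather than a reference implementation, assuming `os.urandom` as the random source and `time.time_ns` as the clock; the function name `uuid7` and its optional parameters are hypothetical and exist only so the example can be exercised with fixed inputs.

```python
import os
import time
import uuid


def uuid7(unix_ts_ms=None, random_bytes=None) -> uuid.UUID:
    """Build a UUIDv7 from a millisecond timestamp and 10 random bytes (sketch)."""
    if unix_ts_ms is None:
        unix_ts_ms = time.time_ns() // 1_000_000
    if random_bytes is None:
        random_bytes = os.urandom(10)
    if len(random_bytes) != 10:
        raise ValueError("expected exactly 10 random bytes")
    rand = int.from_bytes(random_bytes, "big")  # 80 bits, of which 74 are used
    rand_a = (rand >> 62) & 0xFFF               # 12 bits
    rand_b = rand & ((1 << 62) - 1)             # 62 bits
    value = (
        ((unix_ts_ms & ((1 << 48) - 1)) << 80)  # unix_ts_ms, 48 bits
        | (0x7 << 76)                           # ver
        | (rand_a << 64)
        | (0b10 << 62)                          # var
        | rand_b
    )
    return uuid.UUID(int=value)


print(uuid7())
```

Since the timestamp occupies the most significant bits, values produced for different milliseconds compare in time order as plain integers or byte strings.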

5.3. UUID Version 8

UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. The only requirement is that the variant and version bits MUST be set as defined in Section 4. UUIDv8's uniqueness will be implementation-specific and SHOULD NOT be assumed.

The only explicitly defined bits are the Version and Variant leaving 122 bits for implementation specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data.

Some example situations in which UUIDv8 usage could occur:

  • An implementation would like to embed extra information within the UUID other than what is defined in this document.

  • An implementation has other application/language restrictions which inhibit the use of one of the current UUIDs.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           custom_a                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          custom_a             |  ver  |       custom_b        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                       custom_c                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           custom_c                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: UUIDv8 Field and Bit Layout
custom_a:
The first 48 bits of the layout that can be filled as an implementation sees fit.
ver:
The 4 bit version field as defined by Section 4.
custom_b:
12 more bits of the layout that can be filled as an implementation sees fit.
var:
The 2 bit variant field as defined by Section 4.
custom_c:
The final 62 bits of the layout immediately following the var field to be filled as an implementation sees fit.
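
As an illustration of the layout above, the sketch below packs three caller-supplied integers around the fixed version and variant bits. The function name `uuid8` and the sample field values are assumptions made for this example; what goes into the custom fields is entirely up to the implementation.

```python
import uuid


def uuid8(custom_a: int, custom_b: int, custom_c: int) -> uuid.UUID:
    """Pack custom_a (48 bits), custom_b (12 bits) and custom_c (62 bits) (sketch)."""
    if not 0 <= custom_a < (1 << 48):
        raise ValueError("custom_a must fit in 48 bits")
    if not 0 <= custom_b < (1 << 12):
        raise ValueError("custom_b must fit in 12 bits")
    if not 0 <= custom_c < (1 << 62):
        raise ValueError("custom_c must fit in 62 bits")
    value = (
        (custom_a << 80)
        | (0x8 << 76)   # ver: the only bits fixed besides the variant
        | (custom_b << 64)
        | (0b10 << 62)  # var
        | custom_c
    )
    return uuid.UUID(int=value)


# Arbitrary sample field values chosen for illustration.
print(uuid8(0x2489E9AD2EE2, 0xE00, 0x0EC932D5F69181C0))
```

Rejecting out-of-range inputs, rather than silently masking them, is a design choice of this sketch so that custom data never overwrites the version or variant bits.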

5.4. Max UUID

The Max UUID is a special form of UUID that is specified to have all 128 bits set to 1. This UUID can be thought of as the inverse of the Nil UUID defined in [RFC4122], Section 4.1.7.

FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF
Figure 5: Max UUID Format
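
The "inverse of Nil" relationship can be shown as a bitwise NOT over 128 bits. This is a small sketch assuming Python's standard `uuid` module; the names `NIL_UUID`, `MAX_UUID` and `invert` are hypothetical and used only for this example.

```python
import uuid

ALL_BITS = (1 << 128) - 1

NIL_UUID = uuid.UUID(int=0)         # all 128 bits set to 0
MAX_UUID = uuid.UUID(int=ALL_BITS)  # all 128 bits set to 1


def invert(u: uuid.UUID) -> uuid.UUID:
    """Flip every one of the 128 bits (illustrative helper)."""
    return uuid.UUID(int=~u.int & ALL_BITS)


print(MAX_UUID)
print(invert(NIL_UUID) == MAX_UUID)
```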

6. UUID Best Practices

The minimum requirements for generating UUIDs are described in this document for each version. Everything else is an implementation detail and up to the implementer to decide what is appropriate for a given implementation. That being said, various relevant factors are covered below to help guide an implementer through the different trade-offs among differing UUID implementations.

6.1. Timestamp Granularity

UUID timestamp source, precision and length were the topic of great debate while creating this specification. As such, choosing the right timestamp for your application is very important. This section details some of the most common points on this topic.

Reliability:
Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)
Source:
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or a custom timestamp epoch are required UUIDv8 SHOULD be leveraged.
Sub-second Precision and Accuracy:
Many levels of precision exist for timestamps: milliseconds, microseconds, nanoseconds, and beyond. Additionally, fractional representations of sub-second precision may be desired to mix various levels of precision in a time-ordered manner. Furthermore, system clocks themselves have an underlying granularity, and it is frequently less than the precision offered by the operating system. UUID versions 1 and 6 provide 100-nanosecond precision, while UUIDv7 features a fixed millisecond level of precision within the Unix epoch that does not exceed the granularity available in most modern systems. For other levels of precision UUIDv8 SHOULD be utilized.
Length:
The length of a given timestamp directly impacts how long a given UUID will be valid. That is, how many timestamp ticks can be contained in a UUID before the maximum value for the timestamp field is reached. Care should be given to ensure that the proper length is selected for a given timestamp. UUID version 1 and 6 utilize a 60 bit timestamp and UUIDv7 features a 48 bit timestamp.
Altering, Fuzzing, or Smearing:
Implementations MAY alter the actual timestamp. Some examples include security considerations around providing a real clock value within a UUID, correcting inaccurate clocks, or handling leap seconds. This specification makes no requirement or guarantee about how close the clock value needs to be to actual time.
Padding:
When timestamp padding is required, implementations MUST pad the most significant (left-most) bits with zeros. An example is padding the most significant, left-most bits of a 32 bit Unix timestamp with zeros to fill out the 48 bit timestamp in UUIDv7.
Truncating:
Similarly, when timestamps need to be truncated: the lower, least significant bits MUST be used. An example would be truncating a 64 bit Unix timestamp to the least significant, right-most 48 bits for UUIDv7.
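
The padding and truncating rules above can be expressed with a single mask and a fixed-width big-endian encoding. The following is a minimal sketch; the helper name `to_48_bit_field` and the sample values are assumptions made for this example.

```python
MASK_48 = (1 << 48) - 1


def to_48_bit_field(timestamp: int) -> bytes:
    """Fit a non-negative timestamp into the 48 bit UUIDv7 field (sketch).

    Narrower values are left-padded with zeros; wider values keep only
    their least significant 48 bits.
    """
    if timestamp < 0:
        raise ValueError("timestamp must be non-negative")
    return (timestamp & MASK_48).to_bytes(6, "big")


# A 32 bit value gains two leading zero octets.
print(to_48_bit_field(0x61F5B0C0).hex())
# A 64 bit value loses its two most significant octets.
print(to_48_bit_field(0x0123456789ABCDEF).hex())
```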

6.2. Monotonicity and Counters

Monotonicity is the backbone of time-based sortable UUIDs. Time-based UUIDs from this document will naturally be monotonic due to the embedded timestamp; however, implementations can guarantee additional monotonicity via the concepts covered in this section.

Additionally, care SHOULD be taken to ensure UUIDs generated in batches are also monotonic. That is, if one thousand UUIDs are generated for the same timestamp, there is sufficient logic for organizing the creation order of those one thousand UUIDs. For batch UUID creation, implementations MAY utilize a monotonic counter which SHOULD increment for each UUID created during a given timestamp.

For single-node UUID implementations that do not need to create batches of UUIDs, the embedded timestamp within UUID version 1, 6, and 7 can provide sufficient monotonicity guarantees by simply ensuring that the timestamp increments before creating a new UUID. For the topic of Distributed Nodes please refer to Section 6.3.

Implementations SHOULD choose one method for single-node UUID implementations that require batch UUID creation.

Fixed-Length Dedicated Counter Bits (Method 1):
This references the practice of allocating a specific number of bits in the UUID layout to the sole purpose of tallying the total number of UUIDs created during a given UUID timestamp tick. Positioning of a fixed bit-length counter SHOULD be immediatly after the embedded timestamp. This promotes sortability and allows random data generation for each counter increment. With this method rand_a section of UUIDv7 SHOULD be utilized as fixed-length dedicated counter bits that are incremented by one for every UUID generation. The trailing random bits generated for each new UUID in rand_b can help produce unguessable UUIDs. In the event more counter bits are required the most significant, left-most, bits of rand_b MAY be leveraged as additional counter bits.
Monotonic Random (Method 2):
With this method the random data is extended to also double as a counter. This monotonic random can be thought of as a "randomly seeded counter" which MUST be incremented in the least significant position for each UUID created on a given timestamp tick. UUIDv7's rand_b section SHOULD be utilized with this method to handle batch UUID generation during a single timestamp tick. The increment value for every UUID generation SHOULD be a random integer of any desired length larger than zero. This ensures the UUIDs retain the required level of unguessability characteristics provided by the underlying entropy. The increment value MAY be one when the number of UUIDs generated in a particular period of time is important and guessability is not an issue. However, an increment of one SHOULD NOT be used by implementations that favor unguessability, as the resulting values are easily guessable.
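
As a rough sketch of Method 2, the function below treats the 62 bit rand_b value as a randomly seeded counter and advances it by a random step that is always larger than zero. The name monotonic_random_next is hypothetical, and the caller is assumed to supply random_step from a CSPRNG (see Section 6.6).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAND_B_MAX 0x3FFFFFFFFFFFFFFFULL /* largest 62 bit value */

/* Hypothetical helper: advance the 62 bit monotonic random value by
 * random_step + 1, so the increment is never zero. Returns 0 on
 * success, or -1 (leaving *rand_b unchanged) when the increment
 * would roll over. */
int monotonic_random_next(uint64_t *rand_b, uint32_t random_step) {
  assert(rand_b != NULL);
  uint64_t step = (uint64_t)random_step + 1;
  if (*rand_b > RAND_B_MAX - step)
    return -1; /* rollover: see Counter Rollover Handling */
  *rand_b += step;
  return 0;
}
```

Because the step is at most 32 bits wide while the value is 62 bits wide, the most significant bits act as the rollover guard described under Counter Rollover Guards.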

The following sub-topics relate solely to creating reliable fixed-length dedicated counters:

Fixed-Length Dedicated Counter Seeding:
Implementations utilizing the fixed-length counter method SHOULD randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented, the counter SHOULD be frozen and incremented via the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random data MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing the fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in the least significant, right-most, position randomly initialized. The remaining most significant, left-most, counter bits are initialized as zero for the sole purpose of guarding against counter rollovers.
Fixed-Length Dedicated Counter Length:
Care MUST be taken to select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision SHOULD require a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch.
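
The seeding and length guidance above might look like the following in C. This is a sketch under stated assumptions: counter_seed and counter_next are hypothetical names, the counter is at most 32 bits wide, and the random input comes from the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest value a counter of the given width (2..32 bits) can hold. */
static uint32_t counter_max(unsigned bits) {
  return (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

/* Hypothetical helper: seed a counter from caller-supplied random
 * data while leaving its most significant bit zero as a rollover
 * guard. */
uint32_t counter_seed(uint32_t random, unsigned bits) {
  assert(bits >= 2 && bits <= 32);
  return random & (counter_max(bits) >> 1);
}

/* Hypothetical helper: increment by one. Returns 0 on success, or -1
 * when the counter is already at its maximum. */
int counter_next(uint32_t *counter, unsigned bits) {
  assert(counter != NULL && bits >= 2 && bits <= 32);
  if (*counter >= counter_max(bits))
    return -1;
  (*counter)++;
  return 0;
}
```

With one guard bit, even the largest possible seed leaves room for 2^(bits-1) increments within a timestamp tick, which is 2048 for a 12 bit counter.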

The following sub-topics cover rollover handling with either type of counter method:

Counter Rollover Guards:
The technique from Fixed-Length Dedicated Counter Seeding which describes allocating a segment of the fixed-length counter as a rollover guard is also helpful to mitigate counter rollover issues. This same technique can be leveraged with the Monotonic Random counter method by ensuring the total length of a possible increment in the least significant, right-most, position is less than the total length of the random data being incremented. As such, the most significant, left-most, bits can be incremented as rollover guarding.
Counter Rollover Handling:
Counter rollovers SHOULD be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken. Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter.

Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:

  1. Compare the current timestamp against the previously stored timestamp.

  2. If the current timestamp is equal to the previous timestamp; increment the counter according to the desired method.

  3. If the current timestamp is greater than the previous timestamp; re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).

Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID. If this is not the case, then any number of things could have occurred, such as, but not limited to, clock rollbacks, leap second handling, or counter rollovers. Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem, ensuring the next UUID generated is greater than the previous. To handle this scenario, the general guidance is that applications MAY reuse the previous timestamp and increment the previous counter method.
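
The three steps and the clock rollback guidance above can be sketched as a small state update. Everything named here (struct gen_state, next_tick, the 12 bit counter width) is an assumption made for illustration. On counter rollover the sketch takes the optional route of incrementing the timestamp ahead of the actual time; freezing the counter and waiting for the clock is the other option described under Counter Rollover Handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNTER_BITS 12u
#define COUNTER_MAX  ((1u << COUNTER_BITS) - 1u)

/* Hypothetical generator state: values used by the previous UUID. */
struct gen_state {
  uint64_t last_ms;
  uint32_t counter;
};

/* Chooses the timestamp and counter for the next UUID.
 * now_ms is the current clock reading; seed is fresh random data
 * supplied by the caller. */
void next_tick(struct gen_state *st, uint64_t now_ms, uint32_t seed) {
  assert(st != NULL);
  if (now_ms > st->last_ms) {
    /* Step 3: the clock advanced, so reseed (guard bit kept zero). */
    st->last_ms = now_ms;
    st->counter = seed & (COUNTER_MAX >> 1);
  } else if (st->counter < COUNTER_MAX) {
    /* Step 2, and clock rollback: keep the stored timestamp and
     * increment the counter. */
    st->counter++;
  } else {
    /* Counter rollover: move the timestamp ahead of the real clock
     * and reseed the counter. */
    st->last_ms++;
    st->counter = seed & (COUNTER_MAX >> 1);
  }
}
```

Each call leaves the pair (last_ms, counter) strictly greater than before, so a UUID that places the timestamp first and the counter immediately after it sorts after its predecessor.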

6.3. Distributed UUID Generation

Some implementations MAY desire to utilize multi-node, clustered, applications which involve two or more nodes independently generating UUIDs that will be stored in a common location. While UUIDs already feature sufficient entropy to ensure that the chances of collision are low, the likelihood of a collision increases as the total number of nodes increases. This section details the approaches that MAY be utilized by multi-node UUID implementations in distributed environments.

Centralized Registry:
With this method all nodes tasked with creating UUIDs consult a central registry and confirm the generated value is unique. As applications scale the communication with the central registry could become a bottleneck and impact UUID generation in a negative way. Utilization of shared knowledge schemes with central/global registries is outside the scope of this specification.
Node IDs:
With this method, a pseudo-random Node ID value is placed within the UUID layout. This identifier helps ensure the bit-space for a given node is unique, resulting in UUIDs that do not conflict with any other UUID created by another node with a different node id. Implementations that choose to leverage an embedded node id SHOULD utilize UUIDv8. The node id SHOULD NOT be an IEEE 802 MAC address as per Section 8. The location and bit length are left to implementations and are outside the scope of this specification. Furthermore, the creation and negotiation of unique node ids among nodes is also out of scope for this specification.

Utilization of either a Centralized Registry or Node ID is not required for implementing UUIDs in this specification. However, implementations SHOULD utilize one of the two aforementioned methods if distributed UUID generation is a requirement.

6.4. Collision Resistance

Implementations SHOULD weigh the consequences of UUID collisions within their application when deciding between UUID versions that use entropy (random) versus the other components such as Section 6.1 and Section 6.2. This is especially true for distributed node collision resistance as defined by Section 6.3.

There are two example scenarios below which help illustrate the varying seriousness of a collision within an application.

Low Impact:
A UUID collision generated a duplicate log entry which results in incorrect statistics derived from the data. Implementations that are not negatively affected by collisions may continue with the entropy and uniqueness provided by the traditional UUID format.
High Impact:
A duplicate key causes an airplane to receive the wrong course which puts people's lives at risk. In this scenario there is no margin for error. Collisions MUST be avoided and failure is unacceptable. Applications dealing with this type of scenario MUST employ as much collision resistance as possible within the given application context.

6.5. Global and Local Uniqueness

UUIDs created by this specification MAY be used to provide local uniqueness guarantees. For example, ensuring UUIDs created within a local application context are unique within a database MAY be sufficient for some implementations where global uniqueness outside of the application context, in other applications, or around the world is not required.

Although true global uniqueness is impossible to guarantee without a shared knowledge scheme, a shared knowledge scheme is not required by UUID to provide uniqueness guarantees. Implementations MAY implement a shared knowledge scheme introduced in Section 6.3 as they see fit to extend the uniqueness guaranteed by this specification and [RFC4122].

6.6. Unguessability

Implementations SHOULD utilize a cryptographically secure pseudo-random number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique"). Care SHOULD be taken to ensure the CSPRNG state is properly reseeded upon state changes, such as process forks, to ensure proper CSPRNG operation. CSPRNG ensures the best of Section 6.4 and Section 8 are present in modern UUIDs.

Advice on generating cryptographic-quality random numbers can be found in [RFC4086].
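
One way to follow this advice, sketched below, is to read from the operating system's random source for every request, as the example in Appendix A.2 does with /dev/urandom. The helper names os_random and uuidv4 are hypothetical, and the sketch assumes a system that provides /dev/urandom. Because the file is opened and closed on each call, the process keeps no generator state of its own that would need reseeding after a fork.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: fill buf with n octets from /dev/urandom.
 * Returns 0 on success and -1 on failure. */
int os_random(uint8_t *buf, size_t n) {
  assert(buf != NULL);
  FILE *f = fopen("/dev/urandom", "rb");
  if (f == NULL)
    return -1;
  size_t got = fread(buf, 1, n, f);
  fclose(f);
  return (got == n) ? 0 : -1;
}

/* Hypothetical helper: build an [RFC4122] UUIDv4 from 16 random
 * octets, then set the version and variant fields. */
int uuidv4(uint8_t u[16]) {
  if (os_random(u, 16) != 0)
    return -1;
  u[6] = 0x40 | (u[6] & 0x0F); /* version 4 */
  u[8] = 0x80 | (u[8] & 0x3F); /* variant 1,0 */
  return 0;
}
```

Opening the device for every UUID is simple rather than fast; an implementation that generates large batches would more likely keep the source open or use a platform-specific interface.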

6.7. Sorting

UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g. database indexes) SHOULD sort as opaque raw bytes, without need for parsing or introspection.

Time ordered monotonic UUIDs benefit from greater database index locality because the new values are near each other in the index. As a result objects are more easily clustered together for better performance. The real-world differences in this approach of index locality vs random data inserts can be quite large.

UUID formats created by this specification SHOULD be lexicographically sortable while in the textual representation.

UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind. If little-endian style is required, a custom UUID format SHOULD be created using UUIDv8.
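
Sorting as opaque raw bytes needs nothing beyond a byte-wise comparison. The sketch below uses memcmp and qsort from the C standard library; the names uuid_cmp and uuid_sort are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: orders 16-octet UUIDs as opaque byte strings. */
int uuid_cmp(const void *a, const void *b) {
  return memcmp(a, b, 16);
}

/* Hypothetical helper: sort an array of n UUIDs in place. */
void uuid_sort(uint8_t (*uuids)[16], size_t n) {
  assert(uuids != NULL || n == 0);
  qsort(uuids, n, 16, uuid_cmp);
}
```

For UUIDv6 and UUIDv7 values stored in network byte order, this ordering is also the order of the embedded timestamps, with no parsing required.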

6.8. Opacity

UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.
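
Where inspection is unavoidable, reading the two fields takes one shift or mask each. The sketch below assumes the UUID is held as 16 octets in network byte order, which is the layout the code in Appendix A writes; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version: the most significant 4 bits of octet 6. */
unsigned uuid_version(const uint8_t u[16]) {
  assert(u != NULL);
  return u[6] >> 4;
}

/* Returns 1 when the two most significant bits of octet 8 are 1,0,
 * which is the variant used by the formats in this document. */
int uuid_is_variant_10xx(const uint8_t u[16]) {
  assert(u != NULL);
  return (u[8] & 0xC0) == 0x80;
}
```

For the UUIDv7 test vector in Appendix B.2, uuid_version returns 7 and uuid_is_variant_10xx returns 1.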

6.9. DBMS and Database Considerations

For many applications, such as databases, storing UUIDs as text is unnecessarily verbose, requiring 288 bits to represent 128 bit UUID values. Thus, where feasible, UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.

For other systems, UUIDs MAY be stored in binary form or as text, as appropriate. The trade-offs to both approaches are as such:

  • Storing as binary requires less space and may result in faster data access.

  • Storing as text requires more space but may require less translation if the resulting text form is to be used after retrieval and thus may be simpler to implement.
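
The translation cost mentioned above is small. The following sketch converts between the 36 character text form (288 bits) and the 16 octet binary form (128 bits). The function names are hypothetical. The formatter emits lowercase hexadecimal, as [RFC4122] specifies for output, while the parser accepts either case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hexval(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Parses the 8-4-4-4-12 text form into 16 octets.
 * Returns 0 on success and -1 on malformed input. */
int uuid_from_text(const char *s, uint8_t u[16]) {
  assert(s != NULL && u != NULL);
  if (strlen(s) != 36)
    return -1;
  int n = 0;
  for (int i = 0; i < 36; ) {
    if (i == 8 || i == 13 || i == 18 || i == 23) {
      if (s[i] != '-')
        return -1;
      i++;
      continue;
    }
    int hi = hexval(s[i]);
    int lo = hexval(s[i + 1]);
    if (hi < 0 || lo < 0)
      return -1;
    u[n++] = (uint8_t)((hi << 4) | lo);
    i += 2;
  }
  return 0;
}

/* Formats 16 octets as 36 characters plus a terminating NUL. */
void uuid_to_text(const uint8_t u[16], char out[37]) {
  assert(u != NULL && out != NULL);
  snprintf(out, 37,
           "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
           "%02x%02x-%02x%02x%02x%02x%02x%02x",
           u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
           u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

A database column holding the 16 octet form can be compared and indexed directly, and the text form only needs to be produced at the point where a value is displayed or exchanged.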

DBMS vendors are encouraged to provide functionality to generate and store UUID formats defined by this specification for use as identifiers or left parts of identifiers such as, but not limited to, primary keys, surrogate keys for temporal databases, foreign keys included in polymorphic relationships, and keys for key-value pairs in JSON columns and key-value databases. Applications using a monolithic database may find that using database-generated UUIDs (as opposed to client-generated UUIDs) provides the best UUID monotonicity. In addition to UUIDs, additional identifiers MAY be used to ensure integrity and feedback.

7. IANA Considerations

This document has no IANA actions.

8. Security Considerations

MAC addresses pose inherent security risks and SHOULD NOT be used within a UUID. Instead, CSPRNG data SHOULD be selected from a source with sufficient entropy to ensure guaranteed uniqueness among UUID generation. See Section 6.6 for more information.

Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with an embedded counter does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form, then [RFC4122] UUIDv4 SHOULD be utilized.

9. Acknowledgements

The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, Theodore Y. Ts'o, Robert Kieffer, sergeyprokhorenko, and LiosK, as well as all of those in the IETF community and on GitHub who contributed to the discussions which resulted in this document.

10. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC4122]
Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, <https://www.rfc-editor.org/info/rfc4122>.
[RFC4086]
Eastlake 3rd, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", RFC 4086, DOI 10.17487/RFC4086, <https://www.rfc-editor.org/info/rfc4086>.

11. Informative References

[LexicalUUID]
Twitter, "A Scala client for Cassandra", Commit f6da4e0, <https://github.com/twitter-archive/cassie>.
[Snowflake]
Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, <https://github.com/twitter-archive/snowflake/releases/tag/snowflake-2010>.
[Flake]
Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, <https://github.com/boundary/flake>.
[ShardingID]
Instagram Engineering, "Sharding & IDs at Instagram", <https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c>.
[KSUID]
Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, <https://github.com/segmentio/ksuid>.
[Elasticflake]
Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, <https://github.com/ppearcy/elasticflake>.
[FlakeID]
Pawlak, T., "Flake ID Generator", Commit fcd6a2f, <https://github.com/T-PWK/flake-idgen>.
[Sonyflake]
Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, <https://github.com/sony/sonyflake>.
[orderedUuid]
Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", <https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8>.
[COMBGUID]
Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, <https://github.com/richardtallent/RT.Comb>.
[ULID]
Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, <https://github.com/ulid/spec>.
[SID]
Chilton, A., "sid : generate sortable identifiers", Commit 660e947, <https://github.com/chilts/sid>.
[pushID]
Google, "The 2^120 Ways to Ensure Unique Identifiers", <https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html>.
[XID]
Poitrey, O., "Globally Unique ID Generator", Commit efa678f, <https://github.com/rs/xid>.
[ObjectID]
MongoDB, "ObjectId - MongoDB Manual", <https://docs.mongodb.com/manual/reference/method/ObjectId/>.
[CUID]
Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, <https://github.com/ericelliott/cuid>.
[IEEE754]
IEEE, "IEEE Standard for Floating-Point Arithmetic.", Series 754-2019, <https://standards.ieee.org/ieee/754/6210/>.

Appendix A. Example Code

A.1. Creating a UUIDv6 Value

This section details a function in C which converts from a UUID version 1 to version 6:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>
#include <uuid/uuid.h>

/* Converts UUID version 1 to version 6 in place. */
void uuidv1tov6(uuid_t u) {

  uint64_t ut;
  unsigned char *up = (unsigned char *)u;

  // load ut with the first 64 bits of the UUID
  ut = ((uint64_t)ntohl(*((uint32_t*)up))) << 32;
  ut |= ((uint64_t)ntohl(*((uint32_t*)&up[4])));

  // dance the bit-shift...
  ut =
    ((ut >> 32) & 0x0FFF) | // 12 least significant bits
    (0x6000) | // version number
    ((ut >> 28) & 0x0000000FFFFF0000) | // next 20 bits
    ((ut << 20) & 0x000FFFF000000000) | // next 16 bits
    (ut << 52); // 12 most significant bits

  // store back in UUID
  *((uint32_t*)up) = htonl((uint32_t)(ut >> 32));
  *((uint32_t*)&up[4]) = htonl((uint32_t)(ut));

}
Figure 6: UUIDv6 Function in C

A.2. Creating a UUIDv7 Value

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

// ...

// csprng data source
FILE *rndf;
rndf = fopen("/dev/urandom", "r");
if (rndf == 0) {
    printf("fopen /dev/urandom error\n");
    return 1;
}

// ...

// generate one UUIDv7
uint8_t u[16];
struct timespec ts;
int ret;

ret = clock_gettime(CLOCK_REALTIME, &ts);
if (ret != 0) {
    printf("clock_gettime error: %d\n", ret);
    return 1;
}

uint64_t tms;

tms = ((uint64_t)ts.tv_sec) * 1000;
tms += ((uint64_t)ts.tv_nsec) / 1000000;

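memset(u, 0, 16);

// fill everything after the timestamp with random bytes
if (fread(&u[6], 10, 1, rndf) != 1) {
    printf("fread /dev/urandom error\n");
    return 1;
}

// store the 48 bit millisecond timestamp in the first 6 bytes,
// most significant byte first (avoids the non-standard htonll
// and an unaligned 64 bit store)
u[0] = (uint8_t)(tms >> 40);
u[1] = (uint8_t)(tms >> 32);
u[2] = (uint8_t)(tms >> 24);
u[3] = (uint8_t)(tms >> 16);
u[4] = (uint8_t)(tms >> 8);
u[5] = (uint8_t)tms;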

u[8] = 0x80 | (u[8] & 0x3F); // set variant field, top two bits are 1, 0
u[6] = 0x70 | (u[6] & 0x0F); // set version field, top four bits are 0, 1, 1, 1
Figure 7: UUIDv7 Function in C

A.3. Creating a UUIDv8 Value

UUIDv8 will vary greatly from implementation to implementation.

The following example utilizes:

  • 32 bit custom-epoch timestamp (seconds elapsed since 2020-01-01 00:00:00 UTC)

  • 16 bit exotic resolution (~15 microsecond) subsecond timestamp encoded using the fractional representation

  • 58 bit random number

  • 8 bit application-specific unique node ID

  • 8 bit rolling sequence number

#include <stdint.h>
#include <time.h>

int get_random_bytes(uint8_t *buffer, int count) {
  // ...
}

int generate_uuidv8(uint8_t *uuid, uint8_t node_id) {
  struct timespec tp;
  if (clock_gettime(CLOCK_REALTIME, &tp) != 0)
    return -1; // real-time clock error

  // 32 bit biased timestamp (seconds elapsed since 2020-01-01 00:00:00 UTC)
  uint32_t timestamp_sec = tp.tv_sec - 1577836800;
  uuid[0] = timestamp_sec >> 24;
  uuid[1] = timestamp_sec >> 16;
  uuid[2] = timestamp_sec >> 8;
  uuid[3] = timestamp_sec;

  // 16 bit subsecond fraction (~15 microsecond resolution)
  uint16_t timestamp_subsec = ((uint64_t)tp.tv_nsec << 16) / 1000000000;
  uuid[4] = timestamp_subsec >> 8;
  uuid[5] = timestamp_subsec;

  // 58 bit random number and required ver and var fields
  if (get_random_bytes(&uuid[6], 8) != 0)
    return -1; // random number generator error
  uuid[6] = 0x80 | (uuid[6] & 0x0f);
  uuid[8] = 0x80 | (uuid[8] & 0x3f);

  // 8 bit application-specific node ID to guarantee application-wide uniqueness
  uuid[14] = node_id;

  // 8 bit rolling sequence number to help ensure process-wide uniqueness
  static uint8_t sequence = 0;
  uuid[15] = sequence++; // NOTE: unprotected from race conditions

  return 0;
}
Figure 8: UUIDv8 Function in C

Appendix B. Test Vectors

Both UUIDv1 and UUIDv6 test vectors utilize the same 60 bit timestamp: 0x1EC9414C232AB00 (138648505420000000) Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00

Both UUIDv1 and UUIDv6 utilize the same values in clk_seq_hi_res, clock_seq_low, and node. All of which have been generated with random data.

# Unix Nanosecond precision to Gregorian 100-nanosecond intervals
gregorian_100_ns = (Unix_64_bit_nanoseconds / 100) + gregorian_Unix_offset

# Gregorian to Unix Offset:
# The number of 100-ns intervals between the
# UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
# gregorian_Unix_offset = 0x01b21dd213814000 or 122192928000000000

# Unix 64 bit Nanosecond Timestamp:
# Unix NS: Tuesday, February 22, 2022 2:22:22 PM GMT-05:00
# Unix_64_bit_nanoseconds = 0x16D6320C3D4DCC00 or 1645557742000000000

# Work:
# gregorian_100_ns = (1645557742000000000 / 100) + 122192928000000000
# (138648505420000000 - 122192928000000000) * 100 = Unix_64_bit_nanoseconds

# Final:
# gregorian_100_ns = 0x1EC9414C232AB00 or 138648505420000000

# Original: 000111101100100101000001010011000010001100101010101100000000
# UUIDv1:   11000010001100101010101100000000|1001010000010100|0001|000111101100
# UUIDv6:   00011110110010010100000101001100|0010001100101010|0110|101100000000
Figure 9: Test Vector Timestamp Pseudo-code
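
The pseudo-code above translates directly into C. In this sketch the function names are hypothetical; the offset constant is the gregorian_Unix_offset value given above.

```c
#include <assert.h>
#include <stdint.h>

/* 100-ns intervals between 1582-10-15 and 1970-01-01. */
#define GREGORIAN_UNIX_OFFSET 0x01B21DD213814000ULL

/* Converts Unix nanoseconds to Gregorian 100-ns intervals, kept to
 * the 60 bits used by UUIDv1 and UUIDv6. */
uint64_t unix_ns_to_gregorian_100ns(uint64_t unix_ns) {
  return (unix_ns / 100 + GREGORIAN_UNIX_OFFSET) & 0x0FFFFFFFFFFFFFFFULL;
}

/* Lays a 60 bit timestamp out as the first 64 bits of a UUIDv6:
 * the upper 48 bits, the version nibble, then the lower 12 bits. */
uint64_t v6_time_fields(uint64_t ts60) {
  assert((ts60 >> 60) == 0);
  return ((ts60 >> 12) << 16) | 0x6000 | (ts60 & 0x0FFF);
}
```

For the Unix timestamp 1645557742000000000, the first function returns 0x1EC9414C232AB00 and the second turns that into 0x1EC9414C232A6B00, matching the UUIDv6 bit pattern shown above.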

B.1. Example of a UUIDv6 Value

----------------------------------------------
field                 bits    value_hex
----------------------------------------------
time_low              32      0xC232AB00
time_mid              16      0x9414
time_hi_and_version   16      0x11EC
clk_seq_hi_res         8      0xB3
clock_seq_low          8      0xC8
node                  48      0x9E6BDECED846
----------------------------------------------
total                128
----------------------------------------------
final_hex: C232AB00-9414-11EC-B3C8-9E6BDECED846
Figure 10: UUIDv1 Example Test Vector
-----------------------------------------------
field                 bits    value_hex
-----------------------------------------------
time_high              32      0x1EC9414C
time_mid               16      0x232A
time_low_and_version   16      0x6B00
clk_seq_hi_res          8      0xB3
clock_seq_low           8      0xC8
node                   48      0x9E6BDECED846
-----------------------------------------------
total                 128
-----------------------------------------------
final_hex: 1EC9414C-232A-6B00-B3C8-9E6BDECED846
Figure 11: UUIDv6 Example Test Vector
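
The two vectors above can be checked against each other with a byte-wise conversion. The sketch below is a portable alternative to the function in Appendix A.1, written without pointer casts; the name v1_to_v6_bytes is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rewrite a UUIDv1 (16 octets, network byte
 * order) as the equivalent UUIDv6. v1 and v6 may be the same
 * buffer. */
void v1_to_v6_bytes(const uint8_t v1[16], uint8_t v6[16]) {
  assert(v1 != NULL && v6 != NULL);
  /* Rebuild the 60 bit timestamp from time_hi, time_mid, time_low. */
  uint64_t ts = ((uint64_t)(v1[6] & 0x0F) << 56) |
                ((uint64_t)v1[7] << 48) |
                ((uint64_t)v1[4] << 40) |
                ((uint64_t)v1[5] << 32) |
                ((uint64_t)v1[0] << 24) |
                ((uint64_t)v1[1] << 16) |
                ((uint64_t)v1[2] << 8) |
                (uint64_t)v1[3];
  /* Upper 48 bits, then the version nibble, then the lower 12 bits. */
  uint64_t head = ((ts >> 12) << 16) | 0x6000 | (ts & 0x0FFF);
  memmove(v6 + 8, v1 + 8, 8); /* clock sequence and node unchanged */
  for (int i = 0; i < 8; i++)
    v6[i] = (uint8_t)(head >> (56 - 8 * i));
}
```

Feeding it the octets of C232AB00-9414-11EC-B3C8-9E6BDECED846 produces the octets of 1EC9414C-232A-6B00-B3C8-9E6BDECED846.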

B.2. Example of a UUIDv7 Value

This example UUIDv7 test vector utilizes a well-known 32 bit Unix epoch with additional millisecond precision to fill the first 48 bits.

rand_a and rand_b are filled with random data.

The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00 represented as 0x17F22E279B0 or 1645557742000.

-------------------------------
field      bits    value
-------------------------------
unix_ts_ms   48    0x17F22E279B0
ver           4    0x7
rand_a       12    0xCC3
var           2    b10
rand_b       62    0x18C4DC0C0C07398F
-------------------------------
total       128
-------------------------------
final: 017F22E2-79B0-7CC3-98C4-DC0C0C07398F
Figure 12: UUIDv7 Example Test Vector
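
The field table above can be packed into 16 octets as follows. The function name uuidv7_pack is hypothetical; the bit widths are those listed in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack unix_ts_ms (48 bits), rand_a (12 bits)
 * and rand_b (62 bits) into a UUIDv7, setting ver and var. */
void uuidv7_pack(uint8_t u[16], uint64_t unix_ts_ms,
                 uint16_t rand_a, uint64_t rand_b) {
  assert(u != NULL);
  /* 48 bit timestamp, most significant octet first */
  for (int i = 0; i < 6; i++)
    u[i] = (uint8_t)(unix_ts_ms >> (40 - 8 * i));
  /* version nibble 0111 followed by the 12 bits of rand_a */
  u[6] = (uint8_t)(0x70 | ((rand_a >> 8) & 0x0F));
  u[7] = (uint8_t)(rand_a & 0xFF);
  /* variant bits 1,0 followed by the 62 bits of rand_b */
  uint64_t tail = 0x8000000000000000ULL |
                  (rand_b & 0x3FFFFFFFFFFFFFFFULL);
  for (int i = 0; i < 8; i++)
    u[8 + i] = (uint8_t)(tail >> (56 - 8 * i));
}
```

Called with 0x17F22E279B0, 0xCC3 and 0x18C4DC0C0C07398F, it produces the octets of 017F22E2-79B0-7CC3-98C4-DC0C0C07398F.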

B.3. Example of a UUIDv8 Value

This example UUIDv8 test vector utilizes a well-known 64 bit Unix epoch with nanosecond precision, truncated to the least-significant, right-most, bits to fill the first 48 bits through version.

The next two segments, custom_b and custom_c, are filled with random data.

Timestamp is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00 represented as 0x16D6320C3D4DCC00 or 1645557742000000000

It should be noted that this example is just to illustrate one scenario for UUIDv8. Test vectors will likely be implementation specific and vary greatly from this simple example.

-------------------------------
field      bits    value
-------------------------------
custom_a     48    0x320C3D4DCC00
ver           4    0x8
custom_b     12    0x75B
var           2    b10
custom_c     62    0xEC932D5F69181C0
-------------------------------
total       128
-------------------------------
final: 320C3D4D-CC00-875B-8EC9-32D5F69181C0
Figure 13: UUIDv8 Example Test Vector

Appendix C. Version and Variant Tables

C.1. Variant 10xx Versions

Table 2: All UUID variant 10xx (8/9/A/B) version definitions.
----------------------------------------------------------------------
Msb0  Msb1  Msb2  Msb3  Version  Description
----------------------------------------------------------------------
0     0     0     0     0        Unused
0     0     0     1     1        The Gregorian time-based UUID from [RFC4122], Section 4.1.3
0     0     1     0     2        DCE Security version, with embedded POSIX UIDs from [RFC4122], Section 4.1.3
0     0     1     1     3        The name-based version specified in [RFC4122], Section 4.1.3 that uses MD5 hashing.
0     1     0     0     4        The randomly or pseudo-randomly generated version specified in [RFC4122], Section 4.1.3.
0     1     0     1     5        The name-based version specified in [RFC4122], Section 4.1.3 that uses SHA-1 hashing.
0     1     1     0     6        Reordered Gregorian time-based UUID specified in this document.
0     1     1     1     7        Unix Epoch time-based UUID specified in this document.
1     0     0     0     8        Reserved for custom UUID formats specified in this document.
1     0     0     1     9        Reserved for future definition.
1     0     1     0     10       Reserved for future definition.
1     0     1     1     11       Reserved for future definition.
1     1     0     0     12       Reserved for future definition.
1     1     0     1     13       Reserved for future definition.
1     1     1     0     14       Reserved for future definition.
1     1     1     1     15       Reserved for future definition.
----------------------------------------------------------------------

Authors' Addresses

Brad G. Peabody
Kyzer R. Davis
================================================ FILE: draft-peabody-dispatch-new-uuid-format-04.txt ================================================ dispatch BGP. Peabody Internet-Draft Updates: 4122 (if approved) K. Davis Intended status: Standards Track 23 June 2022 Expires: 25 December 2022 New UUID Formats draft-peabody-dispatch-new-uuid-format-04 Abstract This document presents new Universally Unique Identifier (UUID) formats for use in modern applications and databases. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 25 December 2022. Copyright Notice Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Peabody & Davis Expires 25 December 2022 [Page 1] Internet-Draft new-uuid-format June 2022 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. 
Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 2.2. Abbreviations . . . . . . . . . . . . . . . . . . . . . . 5 3. Summary of Changes . . . . . . . . . . . . . . . . . . . . . 5 3.1. changelog . . . . . . . . . . . . . . . . . . . . . . . . 5 4. Variant and Version Fields . . . . . . . . . . . . . . . . . 7 5. New Formats . . . . . . . . . . . . . . . . . . . . . . . . . 8 5.1. UUID Version 6 . . . . . . . . . . . . . . . . . . . . . 8 5.2. UUID Version 7 . . . . . . . . . . . . . . . . . . . . . 10 5.3. UUID Version 8 . . . . . . . . . . . . . . . . . . . . . 11 5.4. Max UUID . . . . . . . . . . . . . . . . . . . . . . . . 12 6. UUID Best Practices . . . . . . . . . . . . . . . . . . . . . 12 6.1. Timestamp Granularity . . . . . . . . . . . . . . . . . . 12 6.2. Monotonicity and Counters . . . . . . . . . . . . . . . . 14 6.3. Distributed UUID Generation . . . . . . . . . . . . . . . 17 6.4. Collision Resistance . . . . . . . . . . . . . . . . . . 18 6.5. Global and Local Uniqueness . . . . . . . . . . . . . . . 18 6.6. Unguessability . . . . . . . . . . . . . . . . . . . . . 19 6.7. Sorting . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.8. Opacity . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.9. DBMS and Database Considerations . . . . . . . . . . . . 19 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 8. Security Considerations . . . . . . . . . . . . . . . . . . . 20 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 20 10. Normative References . . . . . . . . . . . . . . . . . . . . 20 11. Informative References . . . . . . . . . . . . . . . . . . . 21 Appendix A. Example Code . . . . . . . . . . . . . . . . . . . . 23 A.1. Creating a UUIDv6 Value . . . . . . . . . . . . . . . . . 23 A.2. Creating a UUIDv7 Value . . . . . . . . . . . . . . . . . 23 A.3. Creating a UUIDv8 Value . . . . . . . . . . . . . . . . . 25 Appendix B. 
Test Vectors . . . . . . . . . . . . . . . . . . . . 26 B.1. Example of a UUIDv6 Value . . . . . . . . . . . . . . . . 27 B.2. Example of a UUIDv7 Value . . . . . . . . . . . . . . . . 28 B.3. Example of a UUIDv8 Value . . . . . . . . . . . . . . . . 28 Appendix C. Version and Variant Tables . . . . . . . . . . . . . 29 C.1. Variant 10xx Versions . . . . . . . . . . . . . . . . . . 29 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 30 Peabody & Davis Expires 25 December 2022 [Page 2] Internet-Draft new-uuid-format June 2022 1. Introduction Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions. One area UUIDs have gained popularity is as database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto increment" schemes often used by databases do not work well, as the effort required to coordinate unique numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring synchronization makes them a good alternative, but UUID versions 1-5 lack certain other desirable characteristics: 1. Non-time-ordered UUID versions such as UUIDv4 have poor database index locality. Meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of which on common structures used for this (B-tree and its variants) can be dramatic. 2. The 100-nanosecond, Gregorian epoch used in UUIDv1 timestamps is uncommon and difficult to represent accurately using a standard number format such as [IEEE754]. 3. 
Introspection/parsing is required to order by time sequence; as opposed to being able to perform a simple byte-by-byte comparison. 4. Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Additionally, with the advent of virtual machines and containers, MAC address uniqueness is no longer guaranteed. 5. Many of the implementation details specified in [RFC4122] involve trade offs that are neither possible to specify for all applications nor necessary to produce interoperable implementations. Peabody & Davis Expires 25 December 2022 [Page 3] Internet-Draft new-uuid-format June 2022 6. [RFC4122] does not distinguish between the requirements for generation of a UUID versus an application which simply stores one, which are often different. Due to the aforementioned issue, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has lead to numerous implementations over the past 10+ years solving the same problem in slightly different ways. While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit Layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing. 1. [ULID] by A. Feerasta 2. [LexicalUUID] by Twitter 3. [Snowflake] by Twitter 4. [Flake] by Boundary 5. [ShardingID] by Instagram 6. [KSUID] by Segment 7. [Elasticflake] by P. Pearcy 8. [FlakeID] by T. Pawlak 9. [Sonyflake] by Sony 10. [orderedUuid] by IT. Cabrera 11. [COMBGUID] by R. Tallent 12. [SID] by A. Chilton 13. [pushID] by Google 14. [XID] by O. 
        Poitrey
   15.  [ObjectID] by MongoDB
   16.  [CUID] by E. Elliott

   An inspection of these implementations and the issues described above
   has led to this document, which attempts to adapt UUIDs to address
   these issues.

2.  Terminology

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.2.  Abbreviations

   The following abbreviations are used in this document:

   UUID    Universally Unique Identifier [RFC4122]
   CSPRNG  Cryptographically Secure Pseudo-Random Number Generator
   MAC     Media Access Control
   MSB     Most Significant Bit
   DBMS    Database Management System

3.  Summary of Changes

   The following UUIDs are hereby introduced:

   UUID version 6 (UUIDv6)
      A re-ordering of UUID version 1 so it is sortable as an opaque
      sequence of bytes.  Easy to implement given an existing UUIDv1
      implementation.  See Section 5.1.

   UUID version 7 (UUIDv7)
      An entirely new time-based UUID bit layout sourced from the widely
      implemented and well known Unix Epoch timestamp source.  See
      Section 5.2.

   UUID version 8 (UUIDv8)
      A free-form UUID format which has no explicit requirements except
      maintaining backward compatibility.  See Section 5.3.

   Max UUID
      A specialized UUID which is the inverse of the Nil UUID defined in
      [RFC4122], Section 4.1.7.  See Section 5.4.

3.1.  changelog

   RFC EDITOR PLEASE DELETE THIS SECTION.
   draft-04

   -  Fixed bad title in IEEE754 Normative Reference
   -  Fixed bad GMT offset in Test Vector Appendix
   -  Removed MAY in Counters section
   -  Condensed Counter Type into Counter Methods to reduce text
   -  Removed option for random increment along with fixed-length
      counter
   -  Described how to handle scenario where New UUID less than Old UUID
   -  Allow timestamp increment if counter overflows
   -  Replaced UUIDv8 C code snippet with full generation example
   -  Fixed RFC4086 Reference link
   -  Describe reseeding best practice for CSPRNG
   -  Changed MUST to SHOULD removing requirement for absolute
      monotonicity

   draft-03

   -  Reworked the draft body to make the content more concise
   -  UUIDv6 section reworked to just the reorder of the timestamp
   -  UUIDv7 changed to simplify timestamp mechanism to just millisecond
      Unix timestamp
   -  UUIDv8 relaxed to be custom in all elements except version and
      variant
   -  Introduced Max UUID.
   -  Added C code samples in Appendix.
   -  Added test vectors in Appendix.
   -  Version and Variant section combined into one section.
   -  Changed from pseudo-random number generators to cryptographically
      secure pseudo-random number generator (CSPRNG).
   -  Combined redundant topics from all UUIDs into sections such as
      Timestamp granularity, Monotonicity and Counters, Collision
      Resistance, Sorting, and Unguessability, etc.
   -  Split Encoding and Storage into Opacity and DBMS and Database
      Considerations
   -  Reworked Global Uniqueness under new section Global and Local
      Uniqueness
   -  Node verbiage only used in UUIDv6 all others reference random/rand
      instead
   -  Clock sequence verbiage changed simply to counter in any section
      other than UUIDv6
   -  Added Abbreviations section
   -  Updated IETF Draft XML Layout
   -  Added information about little-endian UUIDs

   draft-02

   -  Added Changelog
   -  Fixed misc.
      grammatical errors
   -  Fixed section numbering issue
   -  Fixed some UUIDvX reference issues
   -  Changed all instances of "motonic" to "monotonic"
   -  Changed all instances of "#-bit" to "# bit"
   -  Changed "proceeding" verbiage to "after" in section 7
   -  Added details on how to pad 32 bit Unix timestamp to 36 bits in
      UUIDv7
   -  Added details on how to truncate 64 bit Unix timestamp to 36 bits
      in UUIDv7
   -  Added forward reference and bullet to UUIDv8 if truncating 64 bit
      Unix Epoch is not an option.
   -  Fixed bad reference to non-existent "time_or_node" in section
      4.5.4

   draft-01

   -  Complete rewrite of entire document.
   -  The format, flow and verbiage used in the specification have been
      reworked to mirror the original RFC 4122 and current IETF
      standards.
   -  Removed the topics of UUID length modification, alternate UUID
      text formats, and alternate UUID encoding techniques.
   -  Research into 16 different historical and current implementations
      of time-based universal identifiers was completed at the end of
      2020 in an attempt to identify trends which have directly
      influenced design decisions in this draft document
      (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)
   -  Prototype implementations have been completed for UUIDv6, UUIDv7,
      and UUIDv8 in various languages by many GitHub community members.
      (https://github.com/uuid6/prototypes)

4.  Variant and Version Fields

   The variant bits utilized by UUIDs in this specification remain in
   the same octet as originally defined by [RFC4122], Section 4.1.1.

   The next table details Variant 10xx (8/9/A/B) and the new versions
   defined by this specification.  A complete guide to all versions
   within this variant has been included in Appendix C.1.
   +------+------+------+------+---------+---------------------------+
   | Msb0 | Msb1 | Msb2 | Msb3 | Version | Description               |
   +------+------+------+------+---------+---------------------------+
   | 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time- |
   |      |      |      |      |         | based UUID specified in   |
   |      |      |      |      |         | this document.            |
   +------+------+------+------+---------+---------------------------+
   | 0    | 1    | 1    | 1    | 7       | Unix Epoch time-based     |
   |      |      |      |      |         | UUID specified in this    |
   |      |      |      |      |         | document.                 |
   +------+------+------+------+---------+---------------------------+
   | 1    | 0    | 0    | 0    | 8       | Reserved for custom UUID  |
   |      |      |      |      |         | formats specified in this |
   |      |      |      |      |         | document.                 |
   +------+------+------+------+---------+---------------------------+

     Table 1: New UUID variant 10xx (8/9/A/B) versions defined by this
                               specification

   For UUID versions 6, 7 and 8 the variant field placement from
   [RFC4122] is unchanged.  An example version/variant layout for UUIDv6
   follows, where M is the version and N is the variant.

   00000000-0000-6000-8000-000000000000
   00000000-0000-6000-9000-000000000000
   00000000-0000-6000-A000-000000000000
   00000000-0000-6000-B000-000000000000
   xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

                     Figure 1: UUIDv6 Variant Examples

5.  New Formats

   The UUID format is 16 octets; the variant bits, in conjunction with
   the version bits described in Section 4, determine the finer
   structure.

5.1.  UUID Version 6

   UUID version 6 is a field-compatible version of UUIDv1, reordered for
   improved DB locality.  It is expected that UUIDv6 will primarily be
   used in contexts where there are existing v1 UUIDs.  Systems that do
   not involve legacy UUIDv1 SHOULD consider using UUIDv7 instead.

   Instead of splitting the timestamp into the low, mid and high
   sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes
   are stored from most to least significant.
   That is, given a 60 bit timestamp value as specified for UUIDv1 in
   [RFC4122], Section 4.1.4, for UUIDv6, the first 48 most significant
   bits are stored first, followed by the 4 bit version (same position),
   followed by the remaining 12 bits of the original 60 bit timestamp.

   The clock sequence bits remain unchanged from their usage and
   position in [RFC4122], Section 4.1.5.

   The 48 bit node SHOULD be set to a pseudo-random value; however,
   implementations MAY choose to retain the old MAC address behavior
   from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5.  For more
   information on MAC address usage within UUIDs see Section 8.

   The format for the 16-byte, 128 bit UUIDv6 is shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_high                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |      time_low_and_version     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         node (2-5)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: UUIDv6 Field and Bit Layout

   time_high:
      The most significant 32 bits of the 60 bit starting timestamp.
      Occupies bits 0 through 31 (octets 0-3)

   time_mid:
      The middle 16 bits of the 60 bit starting timestamp.  Occupies
      bits 32 through 47 (octets 4-5)

   time_low_and_version:
      The first four most significant bits MUST contain the UUIDv6
      version (0110) while the remaining 12 bits will contain the least
      significant 12 bits from the 60 bit starting timestamp.  Occupies
      bits 48 through 63 (octets 6-7)

   clk_seq_hi_res:
      The first two bits MUST be set to the UUID variant (10).  The
      remaining 6 bits contain the high portion of the clock sequence.
      Occupies bits 64 through 71 (octet 8)

   clk_seq_low:
      The 8 bit low portion of the clock sequence.  Occupies bits 72
      through 79 (octet 9)

   node:
      48 bit spatially unique identifier.  Occupies bits 80 through 127
      (octets 10-15)

   With UUIDv6 the steps for splitting the timestamp into time_high and
   time_mid are OPTIONAL since the 48 bits of time_high and time_mid
   will remain in the same order.  An extra step of splitting the first
   48 bits of the timestamp into the most significant 32 bits and least
   significant 16 bits proves useful when reusing an existing UUIDv1
   implementation.

5.2.  UUID Version 7

   UUID version 7 features a time-ordered value field derived from the
   widely implemented and well known Unix Epoch timestamp source, the
   number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds
   excluded.  It also offers improved entropy characteristics over
   versions 1 or 6.

   Implementations SHOULD utilize UUID version 7 over UUID version 1 and
   6 if possible.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           unix_ts_ms                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          unix_ts_ms           |  ver  |       rand_a          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                        rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3: UUIDv7 Field and Bit Layout

   unix_ts_ms:
      48 bit big-endian unsigned number holding the Unix epoch timestamp
      in milliseconds, as per Section 6.1.

   ver:
      4 bit UUIDv7 version set as per Section 4.

   rand_a:
      12 bits of pseudo-random data to provide uniqueness as per
      Section 6.2 and Section 6.6.

   var:
      The 2 bit variant defined by Section 4.
   rand_b:
      The final 62 bits of pseudo-random data to provide uniqueness as
      per Section 6.2 and Section 6.6.

5.3.  UUID Version 8

   UUID version 8 provides an RFC-compatible format for experimental or
   vendor-specific use cases.  The only requirement is that the variant
   and version bits MUST be set as defined in Section 4.  UUIDv8's
   uniqueness will be implementation-specific and SHOULD NOT be assumed.

   The only explicitly defined bits are the Version and Variant, leaving
   122 bits for implementation-specific time-based UUIDs.  To be clear:
   UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are
   filled with random data.

   Some example situations in which UUIDv8 usage could occur:

   *  An implementation would like to embed extra information within the
      UUID other than what is defined in this document.

   *  An implementation has other application/language restrictions
      which inhibit the use of one of the current UUIDs.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           custom_a                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           custom_a            |  ver  |       custom_b        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                        custom_c                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            custom_c                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 4: UUIDv8 Field and Bit Layout

   custom_a:
      The first 48 bits of the layout that can be filled as an
      implementation sees fit.

   ver:
      The 4 bit version field as defined by Section 4.

   custom_b:
      12 more bits of the layout that can be filled as an implementation
      sees fit.

   var:
      The 2 bit variant field as defined by Section 4.

   custom_c:
      The final 62 bits of the layout immediately following the var
      field to be filled as an implementation sees fit.

5.4.
Max UUID

   The Max UUID is a special form of UUID that is specified to have all
   128 bits set to 1.  This UUID can be thought of as the inverse of the
   Nil UUID defined in [RFC4122], Section 4.1.7.

   FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF

                        Figure 5: Max UUID Format

6.  UUID Best Practices

   The minimum requirements for generating UUIDs are described in this
   document for each version.  Everything else is an implementation
   detail and up to the implementer to decide what is appropriate for a
   given implementation.  That being said, various relevant factors are
   covered below to help guide an implementer through the different
   trade-offs among differing UUID implementations.

6.1.  Timestamp Granularity

   UUID timestamp source, precision and length were the topic of great
   debate while creating this specification.  As such, choosing the
   right timestamp for your application is a very important topic.  This
   section will detail some of the most common points on this topic.

   Reliability:
      Implementations SHOULD use the current timestamp from a reliable
      source to provide values that are time-ordered and continually
      increasing.  Care SHOULD be taken to ensure that timestamp changes
      from the environment or operating system are handled in a way that
      is consistent with implementation requirements.  For example, if
      it is possible for the system clock to move backward due to either
      manual adjustment or corrections from a time synchronization
      protocol, implementations must decide how to handle such cases.
      (See Altering, Fuzzing, or Smearing bullet below.)

   Source:
      UUID version 1 and 6 both utilize a Gregorian epoch timestamp
      while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
      sources or a custom timestamp epoch are required, UUIDv8 SHOULD be
      leveraged.

   Sub-second Precision and Accuracy:
      Many levels of precision exist for timestamps: milliseconds,
      microseconds, nanoseconds, and beyond.
      Additionally, fractional representations of sub-second precision
      may be desired to mix various levels of precision in a
      time-ordered manner.  Furthermore, system clocks themselves have
      an underlying granularity and it is frequently less than the
      precision offered by the operating system.  With UUID version 1
      and 6, 100-nanoseconds of precision are present while UUIDv7
      features a fixed millisecond level of precision within the Unix
      epoch that does not exceed the granularity capable in most modern
      systems.  For other levels of precision UUIDv8 SHOULD be utilized.

   Length:
      The length of a given timestamp directly impacts how long a given
      UUID will be valid.  That is, how many timestamp ticks can be
      contained in a UUID before the maximum value for the timestamp
      field is reached.  Care should be given to ensure that the proper
      length is selected for a given timestamp.  UUID version 1 and 6
      utilize a 60 bit timestamp and UUIDv7 features a 48 bit timestamp.

   Altering, Fuzzing, or Smearing:
      Implementations MAY alter the actual timestamp.  Some examples
      include security considerations around providing a real clock
      value within a UUID, correcting inaccurate clocks, or handling
      leap seconds.  This specification makes no requirement or
      guarantee about how close the clock value needs to be to actual
      time.

   Padding:
      When timestamp padding is required, implementations MUST pad the
      most significant (left-most) bits with zeros.  An example is
      padding the most significant, left-most bits of a 32 bit Unix
      timestamp with zeros to fill out the 48 bit timestamp in UUIDv7.

   Truncating:
      Similarly, when timestamps need to be truncated, the lower, least
      significant bits MUST be used.  An example would be truncating a
      64 bit Unix timestamp to the least significant, right-most 48 bits
      for UUIDv7.

6.2.  Monotonicity and Counters

   Monotonicity is the backbone of time-based sortable UUIDs.
   Naturally, time-based UUIDs from this document will be monotonic due
   to an embedded timestamp; however, implementations can guarantee
   additional monotonicity via the concepts covered in this section.

   Additionally, care SHOULD be taken to ensure UUIDs generated in
   batches are also monotonic.  That is, if one thousand UUIDs are
   generated for the same timestamp, there is sufficient logic for
   organizing the creation order of those one thousand UUIDs.  For batch
   UUID creation implementations MAY utilize a monotonic counter which
   SHOULD increment for each UUID created during a given timestamp.

   For single-node UUID implementations that do not need to create
   batches of UUIDs, the embedded timestamp within UUID version 1, 6,
   and 7 can provide sufficient monotonicity guarantees by simply
   ensuring that the timestamp increments before creating a new UUID.
   For the topic of Distributed Nodes please refer to Section 6.3.

   Implementations SHOULD choose one method for single-node UUID
   implementations that require batch UUID creation.

   Fixed-Length Dedicated Counter Bits (Method 1):
      This references the practice of allocating a specific number of
      bits in the UUID layout to the sole purpose of tallying the total
      number of UUIDs created during a given UUID timestamp tick.
      Positioning of a fixed bit-length counter SHOULD be immediately
      after the embedded timestamp.  This promotes sortability and
      allows random data generation for each counter increment.  With
      this method, the rand_a section of UUIDv7 SHOULD be utilized as
      fixed-length dedicated counter bits that are incremented by one
      for every UUID generation.  The trailing random bits generated for
      each new UUID in rand_b can help produce unguessable UUIDs.  In
      the event more counter bits are required, the most significant,
      left-most, bits of rand_b MAY be leveraged as additional counter
      bits.
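   As a non-normative illustration of Method 1, the following C sketch
   places a 12 bit counter in the rand_a section of a UUIDv7.  The names
   v7_counter_state and uuidv7_method1, the choice to pass the
   timestamp and random inputs in as parameters, and the use of the most
   significant counter bit as a rollover guard are assumptions made for
   this example only; they are not defined by this specification.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical generator state; not defined by this specification. */
typedef struct {
    uint64_t last_ms;  /* timestamp used by the previous UUID */
    uint16_t counter;  /* 12 bit counter carried in rand_a */
} v7_counter_state;

/*
 * Builds one UUIDv7 with rand_a used as a fixed-length counter.
 * unix_ms: current Unix time in milliseconds (assumed to be non-zero)
 * seed:    random value used to seed the counter on a new tick
 * rand_b:  8 random bytes, ideally taken from a CSPRNG
 * Returns 0 on success, or -1 if the counter is exhausted for the
 * current tick and the caller has to decide how to proceed.
 */
int uuidv7_method1(v7_counter_state *st, uint64_t unix_ms, uint16_t seed,
                   const uint8_t rand_b[8], uint8_t out[16])
{
    if (unix_ms > st->last_ms) {
        /* New tick: reseed, leaving the top counter bit clear as a
         * rollover guard (an assumption of this sketch). */
        st->last_ms = unix_ms;
        st->counter = seed & 0x07FF;
    } else {
        /* Same tick, or the clock moved backward: keep the stored
         * timestamp and advance the counter by one. */
        if (st->counter == 0x0FFF)
            return -1;
        st->counter++;
    }

    /* unix_ts_ms: 48 bits, most significant octet first */
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(st->last_ms >> (8 * (5 - i)));

    /* ver (0111) followed by the 12 bit counter in rand_a */
    out[6] = (uint8_t)(0x70 | (st->counter >> 8));
    out[7] = (uint8_t)(st->counter & 0xFF);

    /* var (10) followed by 62 bits of rand_b */
    memcpy(&out[8], rand_b, 8);
    out[8] = (uint8_t)(0x80 | (out[8] & 0x3F));
    return 0;
}
```

   A caller would typically start from a zero-initialized state, obtain
   unix_ms from the system clock, and draw seed and rand_b from a
   CSPRNG for every call.  Because the counter sits directly after the
   timestamp, two UUIDs produced within the same millisecond still
   compare in creation order when sorted as raw bytes.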
   Monotonic Random (Method 2):
      With this method the random data is extended to also double as a
      counter.  This monotonic random can be thought of as a "randomly
      seeded counter" which MUST be incremented in the least significant
      position for each UUID created on a given timestamp tick.
      UUIDv7's rand_b section SHOULD be utilized with this method to
      handle batch UUID generation during a single timestamp tick.  The
      increment value for every UUID generation SHOULD be a random
      integer of any desired length larger than zero.  This ensures the
      UUIDs retain the required level of unguessability characteristics
      provided by the underlying entropy.  The increment value MAY be
      one when the amount of UUIDs generated in a particular period of
      time is important and guessability is not an issue.  However, an
      increment of one SHOULD NOT be used by implementations that favor
      unguessability, as the resulting values are easily guessable.

   The following sub-topics cover issues related solely to creating
   reliable fixed-length dedicated counters:

   Fixed-Length Dedicated Counter Seeding:
      Implementations utilizing the fixed-length counter method SHOULD
      randomly initialize the counter with each new timestamp tick.
      However, when the timestamp has not incremented, the counter
      SHOULD be frozen and incremented via the desired increment logic.
      When utilizing a randomly seeded counter alongside Method 1, the
      random data MAY be regenerated with each counter increment without
      impacting sortability.  The downside is that Method 1 is prone to
      overflows if a counter of adequate length is not selected or the
      random data generated leaves little room for the required number
      of increments.  Implementations utilizing the fixed-length counter
      method MAY also choose to randomly initialize a portion of the
      counter rather than the entire counter.  For example, a 24 bit
      counter could have the 23 bits in least-significant, right-most,
      position randomly initialized.
      The remaining most significant, left-most counter bits are
      initialized as zero for the sole purpose of guarding against
      counter rollovers.

   Fixed-Length Dedicated Counter Length:
      Care MUST be taken to select a counter bit-length that can
      properly handle the level of timestamp precision in use.  For
      example, a timestamp with millisecond precision SHOULD require a
      larger counter than a timestamp with nanosecond precision.
      General guidance is that the counter SHOULD be at least 12 bits
      but no longer than 42 bits.  Care SHOULD also be given to ensure
      that the counter length selected leaves room for sufficient
      entropy in the random portion of the UUID after the counter.  This
      entropy helps improve the unguessability characteristics of UUIDs
      created within the batch.

   The following sub-topics cover rollover handling with either type of
   counter method:

   Counter Rollover Guards:
      The technique from Fixed-Length Dedicated Counter Seeding which
      describes allocating a segment of the fixed-length counter as a
      rollover guard is also helpful to mitigate counter rollover
      issues.  This same technique can be leveraged with Monotonic
      Random counter methods by ensuring the total length of a possible
      increment in the least significant, right-most position is less
      than the total length of the random being incremented.  As such,
      the most significant, left-most, bits can be incremented as
      rollover guarding.

   Counter Rollover Handling:
      Counter rollovers SHOULD be handled by the application to avoid
      sorting issues.  The general guidance is that applications that
      care about absolute monotonicity and sortability SHOULD freeze the
      counter and wait for the timestamp to advance, which ensures
      monotonicity is not broken.  Alternatively, implementations MAY
      increment the timestamp ahead of the actual time and reinitialize
      the counter.

   Implementations MAY use the following logic to ensure UUIDs featuring
   embedded counters are monotonic in nature:

   1.  Compare the current timestamp against the previously stored
       timestamp.

   2.
       If the current timestamp is equal to the previous timestamp,
       increment the counter according to the desired method.

   3.  If the current timestamp is greater than the previous timestamp,
       re-initialize the desired counter method to the new timestamp and
       generate new random bytes (if the bytes were frozen or being used
       as the seed for a monotonic counter).

   Implementations SHOULD check if the currently generated UUID is
   greater than the previously generated UUID.  If this is not the case,
   then any number of things could have occurred, such as, but not
   limited to, clock rollbacks, leap second handling, or counter
   rollovers.  Applications SHOULD embed sufficient logic to catch these
   scenarios and correct the problem, ensuring the next UUID generated
   is greater than the previous.  To handle this scenario, the general
   guidance is that applications MAY reuse the previous timestamp and
   increment the previous counter method.

6.3.  Distributed UUID Generation

   Some implementations MAY desire to utilize multi-node, clustered,
   applications which involve two or more nodes independently generating
   UUIDs that will be stored in a common location.  While UUIDs already
   feature sufficient entropy to ensure that the chances of collision
   are low, as the total number of nodes increases, so does the
   likelihood of a collision.  This section will detail the approaches
   that MAY be utilized by multi-node UUID implementations in
   distributed environments.

   Centralized Registry:
      With this method all nodes tasked with creating UUIDs consult a
      central registry and confirm the generated value is unique.  As
      applications scale, the communication with the central registry
      could become a bottleneck and impact UUID generation in a negative
      way.  Utilization of shared knowledge schemes with central/global
      registries is outside the scope of this specification.
   Node IDs:
      With this method, a pseudo-random Node ID value is placed within
      the UUID layout.  This identifier helps ensure the bit-space for a
      given node is unique, resulting in UUIDs that do not conflict with
      any other UUID created by another node with a different node id.
      Implementations that choose to leverage an embedded node id SHOULD
      utilize UUIDv8.  The node id SHOULD NOT be an IEEE 802 MAC address
      as per Section 8.  The location and bit length are left to
      implementations and are outside the scope of this specification.
      Furthermore, the creation and negotiation of unique node ids among
      nodes is also out of scope for this specification.

   Utilization of either a Centralized Registry or Node ID is not
   required for implementing UUIDs in this specification.  However,
   implementations SHOULD utilize one of the two aforementioned methods
   if distributed UUID generation is a requirement.

6.4.  Collision Resistance

   Implementations SHOULD weigh the consequences of UUID collisions
   within their application when deciding between UUID versions that use
   entropy (random) versus the other components such as those in
   Section 6.1 and Section 6.2.  This is especially true for distributed
   node collision resistance as defined by Section 6.3.

   There are two example scenarios below which help illustrate the
   varying seriousness of a collision within an application.

   Low Impact:
      A UUID collision generates a duplicate log entry which results in
      incorrect statistics derived from the data.  Implementations that
      are not negatively affected by collisions may continue with the
      entropy and uniqueness provided by the traditional UUID format.

   High Impact:
      A duplicate key causes an airplane to receive the wrong course
      which puts people's lives at risk.  In this scenario there is no
      margin for error.  Collisions MUST be avoided and failure is
      unacceptable.
      Applications dealing with this type of scenario MUST employ as
      much collision resistance as possible within the given application
      context.

6.5.  Global and Local Uniqueness

   UUIDs created by this specification MAY be used to provide local
   uniqueness guarantees.  For example, ensuring UUIDs created within a
   local application context are unique within a database MAY be
   sufficient for some implementations where global uniqueness outside
   of the application context, in other applications, or around the
   world is not required.

   Although true global uniqueness is impossible to guarantee without a
   shared knowledge scheme, a shared knowledge scheme is not required by
   UUID to provide uniqueness guarantees.  Implementations MAY implement
   a shared knowledge scheme introduced in Section 6.3 as they see fit
   to extend the uniqueness guaranteed by this specification and
   [RFC4122].

6.6.  Unguessability

   Implementations SHOULD utilize a cryptographically secure pseudo-
   random number generator (CSPRNG) to provide values that are both
   difficult to predict ("unguessable") and have a low likelihood of
   collision ("unique").  Care SHOULD be taken to ensure the CSPRNG
   state is properly reseeded upon state changes, such as process forks,
   to ensure proper CSPRNG operation.  A CSPRNG ensures the best of
   Section 6.4 and Section 8 are present in modern UUIDs.

   Advice on generating cryptographic-quality random numbers can be
   found in [RFC4086].

6.7.  Sorting

   UUIDv6 and UUIDv7 are designed so that implementations that require
   sorting (e.g. database indexes) SHOULD sort as opaque raw bytes,
   without need for parsing or introspection.

   Time ordered monotonic UUIDs benefit from greater database index
   locality because the new values are near each other in the index.  As
   a result objects are more easily clustered together for better
   performance.
   The real-world differences in this approach of index locality vs
   random data inserts can be quite large.

   UUID formats created by this specification SHOULD be
   lexicographically sortable while in the textual representation.

   UUIDs created by this specification are crafted with big-endian byte
   order (network byte order) in mind.  If little-endian style is
   required, a custom UUID format SHOULD be created using UUIDv8.

6.8.  Opacity

   UUIDs SHOULD be treated as opaque values and implementations SHOULD
   NOT examine the bits in a UUID to whatever extent is possible.
   However, where necessary, inspectors should refer to Section 4 for
   more information on determining UUID version and variant.

6.9.  DBMS and Database Considerations

   For many applications, such as databases, storing UUIDs as text is
   unnecessarily verbose, requiring 288 bits to represent 128 bit UUID
   values.  Thus, where feasible, UUIDs SHOULD be stored within database
   applications as the underlying 128 bit binary value.

   For other systems, UUIDs MAY be stored in binary form or as text, as
   appropriate.  The trade-offs to both approaches are as such:

   *  Storing as binary requires less space and may result in faster
      data access.

   *  Storing as text requires more space but may require less
      translation if the resulting text form is to be used after
      retrieval and thus may be simpler to implement.

   DBMS vendors are encouraged to provide functionality to generate and
   store UUID formats defined by this specification for use as
   identifiers or left parts of identifiers such as, but not limited to,
   primary keys, surrogate keys for temporal databases, foreign keys
   included in polymorphic relationships, and keys for key-value pairs
   in JSON columns and key-value databases.  Applications using a
   monolithic database may find using database-generated UUIDs (as
   opposed to client-generated UUIDs) provides the best UUID
   monotonicity.
   In addition to UUIDs, additional identifiers MAY be used to ensure
   integrity and feedback.

7.  IANA Considerations

   This document has no IANA actions.

8.  Security Considerations

   MAC addresses pose inherent security risks and SHOULD NOT be used
   within a UUID.  Instead, CSPRNG data SHOULD be selected from a source
   with sufficient entropy to ensure uniqueness among generated UUIDs.
   See Section 6.6 for more information.

   Timestamps embedded in the UUID do pose a very small attack surface.
   The timestamp in conjunction with an embedded counter does signal the
   order of creation for a given UUID and its corresponding data but
   does not define anything about the data itself or the application as
   a whole.  If UUIDs are required for use with any security operation
   within an application context in any shape or form then [RFC4122]
   UUIDv4 SHOULD be utilized.

9.  Acknowledgements

   The authors gratefully acknowledge the contributions of Ben Campbell,
   Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S.
   Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, Theodore Y. Ts'o,
   Robert Kieffer, sergeyprokhorenko, and LiosK, as well as all of those
   in the IETF community and on GitHub who contributed to the
   discussions which resulted in this document.

10.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC4122]  Leach, P., Mealling, M., and R. Salz, "A Universally
              Unique IDentifier (UUID) URN Namespace", RFC 4122,
              DOI 10.17487/RFC4122, July 2005.

   [RFC4086]  Eastlake 3rd, D., Schiller, J., and S. Crocker,
              "Randomness Requirements for Security", RFC 4086,
              DOI 10.17487/RFC4086, June 2005.

11.
Informative References [LexicalUUID] Twitter, "A Scala client for Cassandra", commit f6da4e0, November 2012, . [Snowflake] Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, May 2014, . [Flake] Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, February 2017, . [ShardingID] Instagram Engineering, "Sharding & IDs at Instagram", December 2012, . [KSUID] Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, July 2020, . Peabody & Davis Expires 25 December 2022 [Page 21] Internet-Draft new-uuid-format June 2022 [Elasticflake] Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, January 2015, . [FlakeID] Pawlak, T., "Flake ID Generator", Commit fcd6a2f, April 2020, . [Sonyflake] Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, August 2020, . [orderedUuid] Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", January 2020, . [COMBGUID] Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, December 2020, . [ULID] Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, May 2019, . [SID] Chilton, A., "sid : generate sortable identifiers", Commit 660e947, June 2019, . [pushID] Google, "The 2^120 Ways to Ensure Unique Identifiers", February 2015, . [XID] Poitrey, O., "Globally Unique ID Generator", Commit efa678f, October 2020, . [ObjectID] MongoDB, "ObjectId - MongoDB Manual", . [CUID] Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, October 2020, . [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic.", Series 754-2019, July 2019, . Peabody & Davis Expires 25 December 2022 [Page 22] Internet-Draft new-uuid-format June 2022 Appendix A. Example Code A.1. 
Creating a UUIDv6 Value This section details a function in C which converts from a UUID version 1 to version 6: #include #include #include #include #include /* Converts UUID version 1 to version 6 in place. */ void uuidv1tov6(uuid_t u) { uint64_t ut; unsigned char *up = (unsigned char *)u; // load ut with the first 64 bits of the UUID ut = ((uint64_t)ntohl(*((uint32_t*)up))) << 32; ut |= ((uint64_t)ntohl(*((uint32_t*)&up[4]))); // dance the bit-shift... ut = ((ut >> 32) & 0x0FFF) | // 12 least significant bits (0x6000) | // version number ((ut >> 28) & 0x0000000FFFFF0000) | // next 20 bits ((ut << 20) & 0x000FFFF000000000) | // next 16 bits (ut << 52); // 12 most significant bits // store back in UUID *((uint32_t*)up) = htonl((uint32_t)(ut >> 32)); *((uint32_t*)&up[4]) = htonl((uint32_t)(ut)); } Figure 6: UUIDv6 Function in C A.2. Creating a UUIDv7 Value Peabody & Davis Expires 25 December 2022 [Page 23] Internet-Draft new-uuid-format June 2022 #include #include #include #include #include // ... // csprng data source FILE *rndf; rndf = fopen("/dev/urandom", "r"); if (rndf == 0) { printf("fopen /dev/urandom error\n"); return 1; } // ... // generate one UUIDv7E uint8_t u[16]; struct timespec ts; int ret; ret = clock_gettime(CLOCK_REALTIME, &ts); if (ret != 0) { printf("clock_gettime error: %d\n", ret); return 1; } uint64_t tms; tms = ((uint64_t)ts.tv_sec) * 1000; tms += ((uint64_t)ts.tv_nsec) / 1000000; memset(u, 0, 16); fread(&u[6], 10, 1, rndf); // fill everything after the timestamp with random bytes *((uint64_t*)(u)) |= htonll(tms << 16); // shift time into first 48 bits and OR into place u[8] = 0x80 | (u[8] & 0x3F); // set variant field, top two bits are 1, 0 u[6] = 0x70 | (u[6] & 0x0F); // set version field, top four bits are 0, 1, 1, 1 Figure 7: UUIDv7 Function in C Peabody & Davis Expires 25 December 2022 [Page 24] Internet-Draft new-uuid-format June 2022 A.3. Creating a UUIDv8 Value UUIDv8 will vary greatly from implementation to implementation. 
The following example utilizes:

*  32 bit custom-epoch timestamp (seconds elapsed since 2020-01-01 00:00:00 UTC)

*  16 bit exotic resolution (~15 microsecond) subsecond timestamp encoded using the fractional representation

*  58 bit random number

*  8 bit application-specific unique node ID

*  8 bit rolling sequence number

    #include <stdint.h>
    #include <time.h>

    int get_random_bytes(uint8_t *buffer, int count) {
      // ...
    }

    int generate_uuidv8(uint8_t *uuid, uint8_t node_id) {
      struct timespec tp;
      if (clock_gettime(CLOCK_REALTIME, &tp) != 0)
        return -1; // real-time clock error

      // 32 bit biased timestamp (seconds elapsed since 2020-01-01 00:00:00 UTC)
      uint32_t timestamp_sec = tp.tv_sec - 1577836800;
      uuid[0] = timestamp_sec >> 24;
      uuid[1] = timestamp_sec >> 16;
      uuid[2] = timestamp_sec >> 8;
      uuid[3] = timestamp_sec;

      // 16 bit subsecond fraction (~15 microsecond resolution)
      uint16_t timestamp_subsec = ((uint64_t)tp.tv_nsec << 16) / 1000000000;
      uuid[4] = timestamp_subsec >> 8;
      uuid[5] = timestamp_subsec;

      // 58 bit random number and required ver and var fields
      if (get_random_bytes(&uuid[6], 8) != 0)
        return -1; // random number generator error
      uuid[6] = 0x80 | (uuid[6] & 0x0f);
      uuid[8] = 0x80 | (uuid[8] & 0x3f);

      // 8 bit application-specific node ID to guarantee application-wide uniqueness
      uuid[14] = node_id;

      // 8 bit rolling sequence number to help ensure process-wide uniqueness
      static uint8_t sequence = 0;
      uuid[15] = sequence++; // NOTE: unprotected from race conditions

      return 0;
    }

Figure 8: UUIDv8 Function in C

Appendix B. Test Vectors

Both UUIDv1 and UUIDv6 test vectors utilize the same 60 bit timestamp: 0x1EC9414C232AB00 (138648505420000000) Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00

Both UUIDv1 and UUIDv6 utilize the same values in clk_seq_hi_res, clock_seq_low, and node.
All of which have been generated with random data.

    # Unix Nanosecond precision to Gregorian 100-nanosecond intervals
    gregorian_100_ns = (Unix_64_bit_nanoseconds / 100) + gregorian_Unix_offset

    # Gregorian to Unix Offset:
    # The number of 100-ns intervals between the
    # UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
    # gregorian_Unix_offset = 0x01b21dd213814000 or 122192928000000000

    # Unix 64 bit Nanosecond Timestamp:
    # Unix NS: Tuesday, February 22, 2022 2:22:22 PM GMT-05:00
    # Unix_64_bit_nanoseconds = 0x16D6320C3D4DCC00 or 1645557742000000000

    # Work:
    # gregorian_100_ns = (1645557742000000000 / 100) + 122192928000000000
    # (138648505420000000 - 122192928000000000) * 100 = Unix_64_bit_nanoseconds

    # Final:
    # gregorian_100_ns = 0x1EC9414C232AB00 or 138648505420000000

    # Original: 000111101100100101000001010011000010001100101010101100000000
    # UUIDv1: 11000010001100101010101100000000|1001010000010100|0001|000111101100
    # UUIDv6: 00011110110010010100000101001100|0010001100101010|0110|101100000000

Figure 9: Test Vector Timestamp Pseudo-code

B.1. Example of a UUIDv6 Value

    ----------------------------------------------
    field                 bits    value_hex
    ----------------------------------------------
    time_low              32      0xC232AB00
    time_mid              16      0x9414
    time_hi_and_version   16      0x11EC
    clk_seq_hi_res        8       0xB3
    clock_seq_low         8       0xC8
    node                  48      0x9E6BDECED846
    ----------------------------------------------
    total                 128
    ----------------------------------------------
    final_hex: C232AB00-9414-11EC-B3C8-9E6BDECED846

Figure 10: UUIDv1 Example Test Vector

    -----------------------------------------------
    field                 bits    value_hex
    -----------------------------------------------
    time_high             32      0x1EC9414C
    time_mid              16      0x232A
    time_low_and_version  16      0x6B00
    clk_seq_hi_res        8       0xB3
    clock_seq_low         8       0xC8
    node                  48      0x9E6BDECED846
    -----------------------------------------------
    total                 128
    -----------------------------------------------
    final_hex: 1EC9414C-232A-6B00-B3C8-9E6BDECED846

Figure 11: UUIDv6 Example Test Vector

B.2. Example of a UUIDv7 Value

This example UUIDv7 test vector utilizes a well-known 32 bit Unix epoch with additional millisecond precision to fill the first 48 bits. rand_a and rand_b are filled with random data.

The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00 represented as 0x17F22E279B0 or 1645557742000

    -------------------------------
    field       bits    value
    -------------------------------
    unix_ts_ms  48      0x17F22E279B0
    ver         4       0x7
    rand_a      12      0xCC3
    var         2       b10
    rand_b      62      0x18C4DC0C0C07398F
    -------------------------------
    total       128
    -------------------------------
    final: 017F22E2-79B0-7CC3-98C4-DC0C0C07398F

Figure 12: UUIDv7 Example Test Vector

B.3. Example of a UUIDv8 Value

This example UUIDv8 test vector utilizes a well-known 64 bit Unix epoch with nanosecond precision, truncated to the least-significant, right-most, bits to fill the first 48 bits through version.
The next two segments of custom_b and custom_c are filled with random data.

Timestamp is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00 represented as 0x16D6320C3D4DCC00 or 1645557742000000000

It should be noted that this example is just to illustrate one scenario for UUIDv8. Test vectors will likely be implementation specific and vary greatly from this simple example.

    -------------------------------
    field       bits    value
    -------------------------------
    custom_a    48      0x320C3D4DCC00
    ver         4       0x8
    custom_b    12      0x75B
    var         2       b10
    custom_c    62      0xEC932D5F69181C0
    -------------------------------
    total       128
    -------------------------------
    final: 320C3D4D-CC00-875B-8EC9-32D5F69181C0

Figure 13: UUIDv8 Example Test Vector

Appendix C. Version and Variant Tables

C.1. Variant 10xx Versions

+------+------+------+------+---------+----------------------------+
| Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                |
+------+------+------+------+---------+----------------------------+
| 0    | 0    | 0    | 0    | 0       | Unused                     |
+------+------+------+------+---------+----------------------------+
| 0    | 0    | 0    | 1    | 1       | The Gregorian time-based   |
|      |      |      |      |         | UUID from [RFC4122],       |
|      |      |      |      |         | Section 4.1.3              |
+------+------+------+------+---------+----------------------------+
| 0    | 0    | 1    | 0    | 2       | DCE Security version, with |
|      |      |      |      |         | embedded POSIX UIDs from   |
|      |      |      |      |         | [RFC4122], Section 4.1.3   |
+------+------+------+------+---------+----------------------------+
| 0    | 0    | 1    | 1    | 3       | The name-based version     |
|      |      |      |      |         | specified in [RFC4122],    |
|      |      |      |      |         | Section 4.1.3 that uses    |
|      |      |      |      |         | MD5 hashing.               |
+------+------+------+------+---------+----------------------------+
| 0    | 1    | 0    | 0    | 4       | The randomly or pseudo-    |
|      |      |      |      |         | randomly generated version |
|      |      |      |      |         | specified in [RFC4122],    |
|      |      |      |      |         | Section 4.1.3.             |
+------+------+------+------+---------+----------------------------+
| 0    | 1    | 0    | 1    | 5       | The name-based version     |
|      |      |      |      |         | specified in [RFC4122],    |
|      |      |      |      |         | Section 4.1.3 that uses    |
|      |      |      |      |         | SHA-1 hashing.             |
+------+------+------+------+---------+----------------------------+
| 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-  |
|      |      |      |      |         | based UUID specified in    |
|      |      |      |      |         | this document.             |
+------+------+------+------+---------+----------------------------+
| 0    | 1    | 1    | 1    | 7       | Unix Epoch time-based UUID |
|      |      |      |      |         | specified in this          |
|      |      |      |      |         | document.                  |
+------+------+------+------+---------+----------------------------+
| 1    | 0    | 0    | 0    | 8       | Reserved for custom UUID   |
|      |      |      |      |         | formats specified in this  |
|      |      |      |      |         | document.                  |
+------+------+------+------+---------+----------------------------+
| 1    | 0    | 0    | 1    | 9       | Reserved for future        |
|      |      |      |      |         | definition.                |
+------+------+------+------+---------+----------------------------+
| 1    | 0    | 1    | 0    | 10      | Reserved for future        |
|      |      |      |      |         | definition.                |
+------+------+------+------+---------+----------------------------+
| 1    | 0    | 1    | 1    | 11      | Reserved for future        |
|      |      |      |      |         | definition.                |
+------+------+------+------+---------+----------------------------+
| 1    | 1    | 0    | 0    | 12      | Reserved for future        |
|      |      |      |      |         | definition.                |
+------+------+------+------+---------+----------------------------+
| 1    | 1    | 0    | 1    | 13      | Reserved for future        |
|      |      |      |      |         | definition.                |
+------+------+------+------+---------+----------------------------+
| 1    | 1    | 1    | 0    | 14      | Reserved for future        |
|      |      |      |      |         | definition.                |
+------+------+------+------+---------+----------------------------+
| 1    | 1    | 1    | 1    | 15      | Reserved for future        |
|      |      |      |      |         | definition.                |
+------+------+------+------+---------+----------------------------+

Table 2: All UUID variant 10xx (8/9/A/B) version definitions.

Authors' Addresses

Brad G. Peabody
Email: brad@peabody.io

Kyzer R. Davis
Email: kydavis@cisco.com

================================================
FILE: draft-peabody-dispatch-new-uuid-format-04.xml
================================================
New UUID Formats
brad@peabody.io
kydavis@cisco.com
ART dispatch uuid This document presents new Universally Unique Identifier (UUID) formats for use in modern applications and databases.
Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions. One area UUIDs have gained popularity is as database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto increment" schemes often used by databases do not work well, as the effort required to coordinate unique numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring synchronization makes them a good alternative, but UUID versions 1-5 lack certain other desirable characteristics:
  1. Non-time-ordered UUID versions such as UUIDv4 have poor database index locality: new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the structures commonly used for indexes (B-tree and its variants) can be dramatic.
  2. The 100-nanosecond, Gregorian epoch used in UUIDv1 timestamps is uncommon and difficult to represent accurately using a standard number format such as [IEEE754].
  3. Introspection/parsing is required to order by time sequence; as opposed to being able to perform a simple byte-by-byte comparison.
  4. Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Additionally, with the advent of virtual machines and containers, MAC address uniqueness is no longer guaranteed.
  5. Many of the implementation details specified in [RFC4122] involve trade-offs that are neither possible to specify for all applications nor necessary to produce interoperable implementations.
  6. [RFC4122] does not distinguish between the requirements for generation of a UUID versus an application which simply stores one, which are often different.
Due to the aforementioned issues, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways. While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.
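  1. [ULID] by A. Feerasta
  2. [LexicalUUID] by Twitter
  3. [Snowflake] by Twitter
  4. [Flake] by Boundary
  5. [ShardingID] by Instagram
  6. [KSUID] by Segment
  7. [Elasticflake] by P. Pearcy
  8. [FlakeID] by T. Pawlak
  9. [Sonyflake] by Sony
  10. [orderedUuid] by IT. Cabrera
  11. [COMBGUID] by R. Tallent
  12. [SID] by A. Chilton
  13. [pushID] by Google
  14. [XID] by O. Poitrey
  15. [ObjectID] by MongoDB
  16. [CUID] by E. Elliott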
An inspection of these implementations and the issues described above has led to this document which attempts to adapt UUIDs to address these issues.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The following abbreviations are used in this document:
UUID
Universally Unique Identifier
CSPRNG
Cryptographically Secure Pseudo-Random Number Generator
MAC
Media Access Control
MSB
Most Significant Bit
DBMS
Database Management System
The following UUIDs are hereby introduced:
UUID version 6 (UUIDv6)
A re-ordering of UUID version 1 so it is sortable as an opaque sequence of bytes. Easy to implement given an existing UUIDv1 implementation. See
UUID version 7 (UUIDv7)
An entirely new time-based UUID bit layout sourced from the widely implemented and well known Unix Epoch timestamp source. See
UUID version 8 (UUIDv8)
A free-form UUID format which has no explicit requirements except maintaining backward compatibility. See
Max UUID
A specialized UUID which is the inverse of the Nil UUID. See
RFC EDITOR PLEASE DELETE THIS SECTION. draft-04
  • - Fixed bad title in IEEE754 Normative Reference
  • - Fixed bad GMT offset in Test Vector Appendix
  • - Removed MAY in Counters section
  • - Condensed Counter Type into Counter Methods to reduce text
  • - Removed option for random increment along with fixed-length counter
  • - Described how to handle scenario where New UUID less than Old UUID
  • - Allow timestamp increment if counter overflows
  • - Replaced UUIDv8 C code snippet with full generation example
  • - Fixed RFC4086 Reference link
  • - Describe reseeding best practice for CSPRNG
  • - Changed MUST to SHOULD removing requirement for absolute monotonicity
draft-03
  • - Reworked the draft body to make the content more concise
  • - UUIDv6 section reworked to just the reorder of the timestamp
  • - UUIDv7 changed to simplify timestamp mechanism to just millisecond Unix timestamp
  • - UUIDv8 relaxed to be custom in all elements except version and variant
  • - Introduced Max UUID.
  • - Added C code samples in Appendix.
  • - Added test vectors in Appendix.
  • - Version and Variant section combined into one section.
  • - Changed from pseudo-random number generators to cryptographically secure pseudo-random number generator (CSPRNG).
  • - Combined redundant topics from all UUIDs into sections such as Timestamp granularity, Monotonicity and Counters, Collision Resistance, Sorting, and Unguessability, etc.
  • - Split Encoding and Storage into Opacity and DBMS and Database Considerations
  • - Reworked Global Uniqueness under new section Global and Local Uniqueness
  • - Node verbiage only used in UUIDv6 all others reference random/rand instead
  • - Clock sequence verbiage changed simply to counter in any section other than UUIDv6
  • - Added Abbreviations section
  • - Updated IETF Draft XML Layout
  • - Added information about little-endian UUIDs
draft-02
  • - Added Changelog
  • - Fixed misc. grammatical errors
  • - Fixed section numbering issue
  • - Fixed some UUIDvX reference issues
  • - Changed all instances of "motonic" to "monotonic"
  • - Changed all instances of "#-bit" to "# bit"
  • - Changed "proceeding" verbiage to "after" in section 7
  • - Added details on how to pad 32 bit Unix timestamp to 36 bits in UUIDv7
  • - Added details on how to truncate 64 bit Unix timestamp to 36 bits in UUIDv7
  • - Added forward reference and bullet to UUIDv8 if truncating 64 bit Unix Epoch is not an option.
  • - Fixed bad reference to non-existent "time_or_node" in section 4.5.4
draft-01
  • - Complete rewrite of entire document.
  • - The format, flow and verbiage used in the specification has been reworked to mirror the original RFC 4122 and current IETF standards.
  • - Removed the topics of UUID length modification, alternate UUID text formats, and alternate UUID encoding techniques.
  • - Research into 16 different historical and current implementations of time-based universal identifiers was completed at the end of 2020 in an attempt to identify trends which have directly influenced design decisions in this draft document (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)
  • - Prototype implementations have been completed for UUIDv6, UUIDv7, and UUIDv8 in various languages by many GitHub community members. (https://github.com/uuid6/prototypes)
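The variant bits utilized by UUIDs in this specification remain in the same octet as originally defined by [RFC4122]. The next table details Variant 10xx (8/9/A/B) and the new versions defined by this specification. A complete guide to all versions within this variant has been included in Appendix C.1.
New UUID variant 10xx (8/9/A/B) versions defined by this specification
+------+------+------+------+---------+----------------------------+
| Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                |
+------+------+------+------+---------+----------------------------+
| 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-  |
|      |      |      |      |         | based UUID specified in    |
|      |      |      |      |         | this document.             |
+------+------+------+------+---------+----------------------------+
| 0    | 1    | 1    | 1    | 7       | Unix Epoch time-based UUID |
|      |      |      |      |         | specified in this          |
|      |      |      |      |         | document.                  |
+------+------+------+------+---------+----------------------------+
| 1    | 0    | 0    | 0    | 8       | Reserved for custom UUID   |
|      |      |      |      |         | formats specified in this  |
|      |      |      |      |         | document.                  |
+------+------+------+------+---------+----------------------------+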
For UUID versions 6, 7 and 8 the variant field placement from [RFC4122] is unchanged. An example version/variant layout for UUIDv6 follows the table where M is the version and N is the variant.
UUIDv6 Variant Examples
The UUID format is 16 octets; the variant bits, in conjunction with the version bits described in the next section, determine finer structure.
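As a rough illustration of where these two fields live, the following C sketch reads the version nibble (M) and checks for the 10xx variant (N) in a 16-octet UUID. The function names are hypothetical and are not defined by this specification; the octet positions follow the field descriptions in this document (version in the top four bits of octet 6, variant in the top two bits of octet 8).

```c
#include <stdint.h>

/* Hypothetical helper: returns the 4 bit version, the high nibble of octet 6. */
unsigned uuid_version(const uint8_t uuid[16])
{
    return (unsigned)(uuid[6] >> 4);
}

/* Hypothetical helper: returns 1 when the two most significant bits of
 * octet 8 are 1, 0 (variant 10xx), otherwise 0. */
int uuid_is_variant_10xx(const uint8_t uuid[16])
{
    return (uuid[8] & 0xC0) == 0x80;
}
```

Applied to the UUIDv6 test vector 1EC9414C-232A-6B00-B3C8-9E6BDECED846, these helpers would report version 6 and the 10xx variant.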
UUID version 6 is a field-compatible version of UUIDv1, reordered for improved DB locality. It is expected that UUIDv6 will primarily be used in contexts where there are existing v1 UUIDs. Systems that do not involve legacy UUIDv1 SHOULD consider using UUIDv7 instead.
Instead of splitting the timestamp into the low, mid and high sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes are stored from most to least significant. That is, given a 60 bit timestamp value as specified for UUIDv1 in [RFC4122], for UUIDv6, the first 48 most significant bits are stored first, followed by the 4 bit version (same position), followed by the remaining 12 bits of the original 60 bit timestamp.
The clock sequence bits remain unchanged from their usage and position in [RFC4122].
The 48 bit node SHOULD be set to a pseudo-random value; however, implementations MAY choose to retain the old MAC address behavior from [RFC4122]. For more information on MAC address usage within UUIDs see the Security Considerations (Section 8).
The format for the 16-byte, 128 bit UUIDv6 is shown in Figure 1.
UUIDv6 Field and Bit Layout
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_high                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |      time_low_and_version     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         node (2-5)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
time_high:
The most significant 32 bits of the 60 bit starting timestamp. Occupies bits 0 through 31 (octets 0-3)
time_mid:
The middle 16 bits of the 60 bit starting timestamp. Occupies bits 32 through 47 (octets 4-5)
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60 bit starting timestamp. Occupies bits 48 through 63 (octets 6-7)
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8)
clock_seq_low:
The 8 bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9)
node:
48 bit spatially unique identifier. Occupies bits 80 through 127 (octets 10-15)
With UUIDv6 the steps for splitting the timestamp into time_high and time_mid are OPTIONAL since the 48 bits of time_high and time_mid will remain in the same order. An extra step of splitting the first 48 bits of the timestamp into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation.
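The field layout above can be sketched directly in C. This is a minimal illustration, not a reference implementation: the function name `uuidv6_pack` and its parameter types are assumptions made for this example, and the caller is assumed to supply an already computed 60 bit Gregorian timestamp, a 14 bit clock sequence and a 48 bit node value.

```c
#include <stdint.h>

/*
 * Sketch (hypothetical helper): writes a UUIDv6 into uuid[16] from a
 * 60 bit timestamp, a 14 bit clock sequence and a 48 bit node.
 */
void uuidv6_pack(uint8_t uuid[16], uint64_t ts60,
                 uint16_t clock_seq, uint64_t node48)
{
    uint32_t time_high = (uint32_t)(ts60 >> 28);            /* most significant 32 bits */
    uint16_t time_mid  = (uint16_t)((ts60 >> 12) & 0xFFFF); /* middle 16 bits */
    uint16_t time_low  = (uint16_t)(ts60 & 0x0FFF);         /* least significant 12 bits */
    int i;

    uuid[0] = (uint8_t)(time_high >> 24);
    uuid[1] = (uint8_t)(time_high >> 16);
    uuid[2] = (uint8_t)(time_high >> 8);
    uuid[3] = (uint8_t)time_high;
    uuid[4] = (uint8_t)(time_mid >> 8);
    uuid[5] = (uint8_t)time_mid;
    uuid[6] = (uint8_t)(0x60 | (time_low >> 8));            /* version 0110 + 4 bits of time_low */
    uuid[7] = (uint8_t)time_low;
    uuid[8] = (uint8_t)(0x80 | ((clock_seq >> 8) & 0x3F));  /* variant 10 + clk_seq_hi_res */
    uuid[9] = (uint8_t)clock_seq;                           /* clock_seq_low */
    for (i = 0; i < 6; i++)                                 /* node, most significant octet first */
        uuid[10 + i] = (uint8_t)(node48 >> (8 * (5 - i)));
}
```

With the values from the test vectors in Appendix B (timestamp 0x1EC9414C232AB00, clock sequence 0x33C8, node 0x9E6BDECED846) this layout yields 1EC9414C-232A-6B00-B3C8-9E6BDECED846.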
UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded. It also has improved entropy characteristics over versions 1 or 6. Implementations SHOULD utilize UUID version 7 over UUID version 1 and 6 if possible.
UUIDv7 Field and Bit Layout
unix_ts_ms:
48 bit big-endian unsigned number of Unix epoch timestamp as per .
ver:
4 bit UUIDv7 version set as per
rand_a:
12 bits pseudo-random data to provide uniqueness as per and .
var:
The 2 bit variant defined by .
rand_b:
The final 62 bits of pseudo-random data to provide uniqueness as per and .
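The five fields above can be assembled as in the following C sketch. The function name `uuidv7_pack` is an assumption for illustration only, and the random inputs are taken as parameters so that the layout can be shown without committing to any particular CSPRNG; a real generator would draw rand_a and rand_b from a suitable random source.

```c
#include <stdint.h>

/*
 * Sketch (hypothetical helper): writes a UUIDv7 into uuid[16].
 *   unix_ts_ms: milliseconds since the Unix epoch (low 48 bits are used)
 *   rand_a:     12 bits of random data (upper bits are ignored)
 *   rand_b:     62 bits of random data (upper two bits are ignored)
 */
void uuidv7_pack(uint8_t uuid[16], uint64_t unix_ts_ms,
                 uint16_t rand_a, uint64_t rand_b)
{
    int i;

    for (i = 0; i < 6; i++)                         /* 48 bit big-endian timestamp */
        uuid[i] = (uint8_t)(unix_ts_ms >> (8 * (5 - i)));

    uuid[6] = (uint8_t)(0x70 | ((rand_a >> 8) & 0x0F)); /* ver 0111 + top 4 bits of rand_a */
    uuid[7] = (uint8_t)rand_a;                          /* low 8 bits of rand_a */

    for (i = 0; i < 8; i++)                         /* rand_b, most significant octet first */
        uuid[8 + i] = (uint8_t)(rand_b >> (8 * (7 - i)));
    uuid[8] = (uint8_t)(0x80 | (uuid[8] & 0x3F));   /* var 10 replaces the top two bits */
}
```

Using the Appendix B values (unix_ts_ms 0x17F22E279B0, rand_a 0xCC3, rand_b 0x18C4DC0C0C07398F) gives 017F22E2-79B0-7CC3-98C4-DC0C0C07398F.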
UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. The only requirement is that the variant and version bits MUST be set as defined in . UUIDv8's uniqueness will be implementation-specific and SHOULD NOT be assumed. The only explicitly defined bits are the Version and Variant leaving 122 bits for implementation specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data. Some example situations in which UUIDv8 usage could occur:
  • An implementation would like to embed extra information within the UUID other than what is defined in this document.
  • An implementation has other application/language restrictions which inhibit the use of one of the current UUIDs.
UUIDv8 Field and Bit Layout
custom_a:
The first 48 bits of the layout that can be filled as an implementation sees fit.
ver:
The 4 bit version field as defined by
custom_b:
12 more bits of the layout that can be filled as an implementation sees fit.
var:
The 2 bit variant field as defined by .
custom_c:
The final 62 bits of the layout immediately following the var field to be filled as an implementation sees fit.
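Because only the ver and var fields are fixed, one way to think about UUIDv8 is as 16 implementation-defined octets with six bits overwritten. The sketch below shows that single step in C; the name `uuidv8_stamp` is hypothetical, and how custom_a, custom_b and custom_c are filled beforehand is left entirely to the implementation.

```c
#include <stdint.h>

/*
 * Sketch (hypothetical helper): takes 16 octets already filled with
 * implementation-specific data and sets the required fields in place.
 */
void uuidv8_stamp(uint8_t uuid[16])
{
    uuid[6] = (uint8_t)(0x80 | (uuid[6] & 0x0F)); /* ver: top four bits become 1, 0, 0, 0 */
    uuid[8] = (uint8_t)(0x80 | (uuid[8] & 0x3F)); /* var: top two bits become 1, 0 */
}
```

The remaining 122 bits are left untouched, so any sortability or uniqueness properties depend on what the implementation placed there.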
The Max UUID is a special form of UUID that is specified to have all 128 bits set to 1. This UUID can be thought of as the inverse of the Nil UUID defined in [RFC4122].
Max UUID Format
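For completeness, a Max UUID can be produced and recognized with a few lines of C. The helper names below are assumptions for illustration and do not come from this specification.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sets all 128 bits to 1. */
void uuid_set_max(uint8_t uuid[16])
{
    memset(uuid, 0xFF, 16);
}

/* Hypothetical helper: returns 1 when every octet is 0xFF, otherwise 0. */
int uuid_is_max(const uint8_t uuid[16])
{
    int i;

    for (i = 0; i < 16; i++)
        if (uuid[i] != 0xFF)
            return 0;
    return 1;
}
```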
The minimum requirements for generating UUIDs are described in this document for each version. Everything else is an implementation detail and up to the implementer to decide what is appropriate for a given implementation. That being said, various relevant factors are covered below to help guide an implementer through the different trade-offs among differing UUID implementations.
UUID timestamp source, precision and length were the topic of great debate while creating this specification. As such, choosing the right timestamp for your application is very important. This section details some of the most common points on this topic.
Reliability:
Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)
Source:
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or a custom timestamp epoch are required UUIDv8 SHOULD be leveraged.
Sub-second Precision and Accuracy:
Many levels of precision exist for timestamps: milliseconds, microseconds, nanoseconds, and beyond. Additionally fractional representations of sub-second precision may be desired to mix various levels of precision in a time-ordered manner. Furthermore, system clocks themselves have an underlying granularity and it is frequently less than the precision offered by the operating system. With UUID version 1 and 6, 100-nanoseconds of precision are present while UUIDv7 features fixed millisecond level of precision within the Unix epoch that does not exceed the granularity capable in most modern systems. For other levels of precision UUIDv8 SHOULD be utilized.
Length:
The length of a given timestamp directly impacts how long a given UUID will be valid. That is, how many timestamp ticks can be contained in a UUID before the maximum value for the timestamp field is reached. Care should be given to ensure that the proper length is selected for a given timestamp. UUID version 1 and 6 utilize a 60 bit timestamp and UUIDv7 features a 48 bit timestamp.
Altering, Fuzzing, or Smearing:
Implementations MAY alter the actual timestamp. Some examples include security considerations around providing a real clock value within a UUID, correcting inaccurate clocks, or handling leap seconds. This specification makes no requirement or guarantee about how close the clock value needs to be to actual time.
Padding:
When timestamp padding is required, implementations MUST pad the most significant (left-most) bits with zeros. An example is padding the most significant, left-most bits of a 32 bit Unix timestamp with zeros to fill out the 48 bit timestamp in UUIDv7.
Truncating:
Similarly, when timestamps need to be truncated: the lower, least significant bits MUST be used. An example would be truncating a 64 bit Unix timestamp to the least significant, right-most 48 bits for UUIDv7.
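The padding and truncating rules above reduce to simple integer operations. The following C sketch shows both for a 48 bit timestamp field; the macro and function names are hypothetical, and the sketch only demonstrates bit placement, not the choice of timestamp unit.

```c
#include <stdint.h>

#define TS48_MASK 0xFFFFFFFFFFFFULL /* the 48 least significant bits */

/* Padding: widening a 32 bit value leaves the most significant,
 * left-most bits of the 48 bit field set to zero. */
uint64_t pad_ts32_to_48(uint32_t ts32)
{
    return (uint64_t)ts32;
}

/* Truncating: keep only the least significant, right-most 48 bits. */
uint64_t truncate_ts64_to_48(uint64_t ts64)
{
    return ts64 & TS48_MASK;
}
```

For example, truncating the 64 bit nanosecond timestamp 0x16D6320C3D4DCC00 in this way leaves 0x320C3D4DCC00, the custom_a value used in the UUIDv8 test vector.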
Monotonicity is the backbone of time-based sortable UUIDs. Naturally, time-based UUIDs from this document will be monotonic due to an embedded timestamp; however, implementations can guarantee additional monotonicity via the concepts covered in this section.
Additionally, care SHOULD be taken to ensure UUIDs generated in batches are also monotonic. That is, if one thousand UUIDs are generated for the same timestamp, there is sufficient logic for organizing the creation order of those one thousand UUIDs. For batch UUID creation, implementations MAY utilize a monotonic counter which SHOULD increment for each UUID created during a given timestamp.
For single-node UUID implementations that do not need to create batches of UUIDs, the embedded timestamp within UUID version 1, 6, and 7 can provide sufficient monotonicity guarantees by simply ensuring that the timestamp increments before creating a new UUID. For the topic of Distributed Nodes please refer to
Implementations SHOULD choose one method for single-node UUID implementations that require batch UUID creation.
Fixed-Length Dedicated Counter Bits (Method 1):
This references the practice of allocating a specific number of bits in the UUID layout to the sole purpose of tallying the total number of UUIDs created during a given UUID timestamp tick. Positioning of a fixed bit-length counter SHOULD be immediately after the embedded timestamp. This promotes sortability and allows random data generation for each counter increment. With this method the rand_a section of UUIDv7 SHOULD be utilized as fixed-length dedicated counter bits that are incremented by one for every UUID generation. The trailing random bits generated for each new UUID in rand_b can help produce unguessable UUIDs. In the event more counter bits are required, the most significant, left-most, bits of rand_b MAY be leveraged as additional counter bits.
Monotonic Random (Method 2):
With this method the random data is extended to also double as a counter. This monotonic random can be thought of as a "randomly seeded counter" which MUST be incremented in the least significant position for each UUID created on a given timestamp tick. UUIDv7's rand_b section SHOULD be utilized with this method to handle batch UUID generation during a single timestamp tick. The increment value for every UUID generation SHOULD be a random integer of any desired length larger than zero. This ensures the UUIDs retain the required level of unguessability characteristics provided by the underlying entropy. The increment value MAY be one when the number of UUIDs generated in a particular period of time is important and guessability is not an issue. However, an increment of one SHOULD NOT be used by implementations that favor unguessability, as the resulting values are easily guessable.
The following sub-topics cover issues related solely to creating reliable fixed-length dedicated counters:
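The following is a sketch of Method 2 under stated assumptions: rand_b is taken to be the 62 bits after the variant field (the low 6 bits of octet 8 plus octets 9 through 15), and the caller supplies a non-zero increment drawn from its own random source. The function name `rand_b_add` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add `increment` to the 62 bit rand_b field,
 * treating it as a big-endian integer. The two variant bits at the top
 * of octet 8 are preserved. Returns 0 on success, or -1 (leaving the
 * UUID unchanged) if the addition would overflow rand_b. */
static int rand_b_add(uint8_t uuid[16], uint32_t increment)
{
    uint8_t out[8];
    uint64_t carry = increment;

    for (int i = 15; i >= 9; i--) {          /* octets 15 down to 9 */
        carry += uuid[i];
        out[i - 8] = (uint8_t)(carry & 0xFF);
        carry >>= 8;
    }
    carry += (uint64_t)(uuid[8] & 0x3F);      /* low 6 bits of octet 8 */
    if (carry > 0x3F)
        return -1;                            /* rollover detected */
    out[0] = (uint8_t)((uuid[8] & 0xC0) | carry);
    memcpy(&uuid[8], out, sizeof out);
    return 0;
}
```

A caller that receives -1 is in the rollover situation and needs some policy for it, such as waiting for the next timestamp tick.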
Fixed-Length Dedicated Counter Seeding:
Implementations utilizing the fixed-length counter method SHOULD randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented, the counter SHOULD be frozen and incremented via the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random data MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing the fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in the least significant, right-most, position randomly initialized. The remaining most significant, left-most counter bits are initialized as zero for the sole purpose of guarding against counter rollovers.
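The 24 bit example can be sketched as follows. The helper names are hypothetical, and `random_bits` is assumed to come from the implementation's own random source.

```c
#include <stdint.h>

#define COUNTER24_MAX 0x00FFFFFFu

/* Seed a 24 bit counter from random input, forcing the most significant
 * counter bit to zero so it acts as a rollover guard. */
static uint32_t seed_counter24(uint32_t random_bits)
{
    return random_bits & 0x007FFFFFu;   /* keep the 23 right-most bits */
}

/* Number of increments left before the 24 bit counter would roll over. */
static uint32_t counter24_headroom(uint32_t counter)
{
    return COUNTER24_MAX - counter;
}
```

Even with the largest possible seed, the guard bit leaves 2^23 (8,388,608) increments available within a single timestamp tick.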
Fixed-Length Dedicated Counter Length:
Care MUST be taken to select a counter bit-length that can properly handle the level of timestamp precision in use. For example, a timestamp with millisecond precision SHOULD require a larger counter than a timestamp with nanosecond precision, because more UUIDs can be requested within one millisecond tick than within one nanosecond tick. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch.
The following sub-topics cover rollover handling with either type of counter method:
Counter Rollover Guards:
The technique from Fixed-Length Dedicated Counter Seeding which describes allocating a segment of the fixed-length counter as a rollover guard is also helpful to mitigate counter rollover issues. This same technique can be leveraged with monotonic random counter methods by ensuring the total length of a possible increment in the least significant, right-most position is less than the total length of the random data being incremented. As such, the most significant, left-most, bits can be incremented as rollover guarding.
Counter Rollover Handling:
Counter rollovers SHOULD be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken. Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter.
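The two rollover policies can be sketched as below, assuming a 12 bit counter. The enum, constant, and function names are hypothetical, and `seed` stands in for fresh random data supplied by the caller.

```c
#include <stdint.h>

#define COUNTER12_MAX 0x0FFFu   /* assumed 12 bit counter */

enum rollover_policy { ROLLOVER_WAIT, ROLLOVER_ADVANCE_TS };

/* Advance the counter for another UUID in the same tick.
 * Returns 0 when *ts and *counter hold usable values, or 1 when the
 * caller should wait for the clock to advance before retrying. */
static int bump_counter(uint64_t *ts, uint32_t *counter, uint32_t seed,
                        enum rollover_policy policy)
{
    if (*counter < COUNTER12_MAX) {
        (*counter)++;
        return 0;
    }
    if (policy == ROLLOVER_WAIT)
        return 1;                 /* freeze: counter stays at its maximum */
    *ts += 1;                     /* run the timestamp ahead of real time */
    *counter = seed & COUNTER12_MAX;
    return 0;
}
```

With `ROLLOVER_ADVANCE_TS` the embedded timestamp may drift slightly ahead of the wall clock during bursts, which is the trade-off for not blocking.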
Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:
  1. Compare the current timestamp against the previously stored timestamp.
  2. If the current timestamp is equal to the previous timestamp; increment the counter according to the desired method.
  3. If the current timestamp is greater than the previous timestamp; re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).
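The three steps above can be sketched as a small state machine. The struct and function names are hypothetical, and `seed` stands in for newly generated random data. Treating a timestamp that is lower than the stored one like an equal timestamp is an assumption of this sketch, not something the steps require.

```c
#include <stdint.h>

/* Hypothetical generator state. */
struct counter_state {
    uint64_t last_ts;   /* previously stored timestamp */
    uint32_t counter;   /* current counter value */
};

/* Returns the counter value to embed in the next UUID. */
static uint32_t next_counter_value(struct counter_state *s,
                                   uint64_t now, uint32_t seed)
{
    if (now > s->last_ts) {     /* step 3: new tick, re-initialize */
        s->last_ts = now;
        s->counter = seed;
    } else {                    /* step 2: same tick, increment */
        s->counter += 1;        /* assumption: now < last_ts also lands here */
    }
    return s->counter;
}
```

The caller embeds `s->last_ts`, rather than the raw clock reading, so the timestamp and counter in the UUID stay consistent with each other.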
Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID. If this is not the case then any number of things could have occurred, such as, but not limited to, clock rollbacks, leap second handling, or counter rollovers. Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem, ensuring the next UUID generated is greater than the previous. To handle this scenario, the general guidance is that applications MAY reuse the previous timestamp and increment the previous counter method.
Some implementations MAY desire to utilize multi-node, clustered, applications which involve two or more nodes independently generating UUIDs that will be stored in a common location. While UUIDs already feature sufficient entropy to ensure that the chances of collision are low, as the total number of nodes increases, so does the likelihood of a collision. This section will detail the approaches that MAY be utilized by multi-node UUID implementations in distributed environments.
Centralized Registry:
With this method all nodes tasked with creating UUIDs consult a central registry and confirm the generated value is unique. As applications scale the communication with the central registry could become a bottleneck and impact UUID generation in a negative way. Utilization of shared knowledge schemes with central/global registries is outside the scope of this specification.
Node IDs:
With this method, a pseudo-random Node ID value is placed within the UUID layout. This identifier helps ensure the bit-space for a given node is unique, resulting in UUIDs that do not conflict with any other UUID created by another node with a different node id. Implementations that choose to leverage an embedded node id SHOULD utilize UUIDv8. The node id SHOULD NOT be an IEEE 802 MAC address as per . The location and bit length are left to implementations and are outside the scope of this specification. Furthermore, the creation and negotiation of unique node ids among nodes is also out of scope for this specification.
Utilization of either a Centralized Registry or Node ID is not required for implementing UUIDs in this specification. However, implementations SHOULD utilize one of the two aforementioned methods if distributed UUID generation is a requirement.
Implementations SHOULD weigh the consequences of UUID collisions within their application when deciding between UUID versions that use entropy (random) versus the other components such as and . This is especially true for distributed node collision resistance as defined by . There are two example scenarios below which help illustrate the varying seriousness of a collision within an application.
Low Impact:
A UUID collision generates a duplicate log entry which results in incorrect statistics derived from the data. Implementations that are not negatively affected by collisions may continue with the entropy and uniqueness provided by the traditional UUID format.
High Impact:
A duplicate key causes an airplane to receive the wrong course which puts people's lives at risk. In this scenario there is no margin for error. Collisions MUST be avoided and failure is unacceptable. Applications dealing with this type of scenario MUST employ as much collision resistance as possible within the given application context.
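To put rough numbers on the two scenarios, the standard birthday approximation can be used. This is a sketch under the assumption that all b non-fixed bits are drawn uniformly at random; the choice of b = 122 (as in a fully random UUIDv4) and n = 10^9 generated UUIDs is purely illustrative.

```latex
% Probability of at least one collision among n values of b random bits
p(n, b) \approx 1 - e^{-\frac{n(n-1)}{2 \cdot 2^{b}}}
        \approx \frac{n^{2}}{2^{b+1}}
        \qquad \text{(second step valid while } n \ll 2^{b/2}\text{)}

% Example: b = 122, n = 10^{9}
p \approx \frac{(10^{9})^{2}}{2^{123}} \approx 9.4 \times 10^{-20}
```

Embedding a timestamp or counter reduces b, so the same formula applies per timestamp tick with the smaller number of random bits.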
UUIDs created by this specification MAY be used to provide local uniqueness guarantees. For example, ensuring UUIDs created within a local application context are unique within a database MAY be sufficient for some implementations where global uniqueness outside of the application context, in other applications, or around the world is not required. Although true global uniqueness is impossible to guarantee without a shared knowledge scheme, a shared knowledge scheme is not required by UUID to provide uniqueness guarantees. Implementations MAY implement a shared knowledge scheme introduced in as they see fit to extend the uniqueness guaranteed by this specification and .
Implementations SHOULD utilize a cryptographically secure pseudo-random number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique"). Care SHOULD be taken to ensure the CSPRNG state is properly reseeded upon state changes, such as process forks, to ensure proper CSPRNG operation. CSPRNG ensures the best of and are present in modern UUIDs. Advice on generating cryptographic-quality random numbers can be found in
UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g. database indexes) SHOULD sort as opaque raw bytes, without need for parsing or introspection. Time-ordered monotonic UUIDs benefit from greater database index locality because the new values are near each other in the index. As a result, objects are more easily clustered together for better performance. The real-world differences in this approach of index locality vs random data inserts can be quite large. UUID formats created by this specification SHOULD be lexicographically sortable while in the textual representation. UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind. If little-endian style is required, a custom UUID format SHOULD be created using UUIDv8.
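Sorting as opaque raw bytes can be sketched with the C standard library alone. The function names are hypothetical, and the UUIDs are assumed to be stored as 16 octets in network byte order.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two 16 octet UUIDs as raw bytes. */
static int uuid_cmp(const void *a, const void *b)
{
    return memcmp(a, b, 16);
}

/* Sort an array of UUIDs in place without inspecting any field. */
static void sort_uuids(uint8_t (*uuids)[16], size_t count)
{
    qsort(uuids, count, sizeof uuids[0], uuid_cmp);
}
```

For UUIDv6 and UUIDv7 this byte order is also creation-time order, because the most significant timestamp bits come first.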
UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to for more information on determining UUID version and variant.
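Where inspection is unavoidable, a sketch such as the following may help. It assumes the RFC 4122 layout in which the version is the high nibble of octet 6 and the variant occupies the most significant bits of octet 8. The function names are hypothetical.

```c
#include <stdint.h>

/* Version number: the four most significant bits of octet 6. */
static unsigned uuid_version(const uint8_t uuid[16])
{
    return (unsigned)(uuid[6] >> 4);
}

/* Non-zero when the two most significant bits of octet 8 are 1, 0,
 * i.e. the variant covered by the version table in this document. */
static int uuid_is_variant_10xx(const uint8_t uuid[16])
{
    return (uuid[8] & 0xC0) == 0x80;
}
```

The version value is only meaningful when the variant check passes, so the two checks are normally made together.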
For many applications, such as databases, storing UUIDs as text is unnecessarily verbose, requiring 288 bits to represent 128 bit UUID values. Thus, where feasible, UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value. For other systems, UUIDs MAY be stored in binary form or as text, as appropriate. The trade-offs of the two approaches are as follows:
  • Storing as binary requires less space and may result in faster data access.
  • Storing as text requires more space but may require less translation if the resulting text form is to be used after retrieval and thus may be simpler to implement.
DBMS vendors are encouraged to provide functionality to generate and store UUID formats defined by this specification for use as identifiers or left parts of identifiers such as, but not limited to, primary keys, surrogate keys for temporal databases, foreign keys included in polymorphic relationships, and keys for key-value pairs in JSON columns and key-value databases. Applications using a monolithic database may find that using database-generated UUIDs (as opposed to client-generated UUIDs) provides the best UUID monotonicity. In addition to UUIDs, additional identifiers MAY be used to ensure integrity and feedback.
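The size difference comes from the text form using 36 characters (32 hexadecimal digits and 4 dashes) of 8 bits each. A minimal sketch of the binary-to-text translation follows; the function name is hypothetical and lowercase output is an arbitrary choice for the example.

```c
#include <stdint.h>

/* Write the 36 character hex-and-dash form of a 16 octet UUID,
 * followed by a terminating NUL, into `out`. */
static void uuid_to_text(const uint8_t uuid[16], char out[37])
{
    static const char hex[] = "0123456789abcdef";
    char *p = out;

    for (int i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10)
            *p++ = '-';                 /* 8-4-4-4-12 grouping */
        *p++ = hex[uuid[i] >> 4];
        *p++ = hex[uuid[i] & 0x0F];
    }
    *p = '\0';
}
```

Storing the 16 octets directly avoids this translation on every read and write, at the cost of needing it whenever a human-readable form is required.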
This document has no IANA actions.
MAC addresses pose inherent security risks and SHOULD NOT be used within a UUID. Instead, CSPRNG data SHOULD be selected from a source with sufficient entropy to ensure guaranteed uniqueness among UUID generation. See for more information. Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with an embedded counter does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then UUIDv4 SHOULD be utilized.
The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, Theodore Y. Ts'o, Robert Kieffer, sergeyprokhorenko, and LiosK, as well as all of those in the IETF community and on GitHub who contributed to the discussions which resulted in this document.
  • Key words for use in RFCs to Indicate Requirement Levels (RFC 2119). In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
  • Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words (RFC 8174). RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.
  • A Universally Unique IDentifier (UUID) URN Namespace (RFC 4122). This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier). A UUID is 128 bits long, and can guarantee uniqueness across space and time. UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE), and then in Microsoft Windows platforms. This specification is derived from the DCE specification with the kind permission of the OSF (now known as The Open Group). Information from earlier versions of the DCE specification have been incorporated into this document. [STANDARDS-TRACK]
  • Randomness Requirements for Security (RFC 4086). Security systems are built on strong cryptographic algorithms that foil pattern analysis attempts. However, the security of these systems is dependent on generating secret quantities for passwords, cryptographic keys, and similar quantities. The use of pseudo-random processes to generate secret quantities can result in pseudo-security. A sophisticated attacker may find it easier to reproduce the environment that produced the secret quantities and to search the resulting small set of possibilities than to locate the quantities in the whole of the potential number space. Choosing random quantities to foil a resourceful and motivated adversary is surprisingly difficult. This document points out many pitfalls in using poor entropy sources or traditional pseudo-random number generation techniques for generating such quantities. It recommends the use of truly random hardware techniques and shows that the existing hardware on many systems can be used for this purpose. It provides suggestions to ameliorate the problem when a hardware solution is not available, and it gives examples of how large such quantities need to be for some applications. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
  • A Scala client for Cassandra. Twitter.
  • Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees. Twitter.
  • Flake: A decentralized, k-ordered id generation service in Erlang. Boundary.
  • Sharding & IDs at Instagram. Instagram Engineering.
  • K-Sortable Globally Unique IDs. Segment.
  • Sequential UUID / Flake ID generator pulled out of elasticsearch common.
  • Flake ID Generator.
  • A distributed unique ID generator inspired by Twitter's Snowflake. Sony.
  • Laravel: The mysterious "Ordered UUID".
  • Creating sequential GUIDs in C# for MSSQL or PostgreSql.
  • Universally Unique Lexicographically Sortable Identifier.
  • sid : generate sortable identifiers.
  • The 2^120 Ways to Ensure Unique Identifiers. Google.
  • Globally Unique ID Generator.
  • ObjectId - MongoDB Manual. MongoDB.
  • Collision-resistant ids optimized for horizontal scaling and performance.
  • IEEE Standard for Floating-Point Arithmetic. IEEE.
This section details a function in C which converts from a UUID version 1 to version 6:
UUIDv6 Function in C

    #include <stdint.h>
    #include <arpa/inet.h>   /* ntohl, htonl */
    #include <uuid/uuid.h>   /* uuid_t */

    /* Converts UUID version 1 to version 6 in place. */
    void uuidv1tov6(uuid_t u) {

      uint64_t ut;
      unsigned char *up = (unsigned char *)u;

      // load ut with the first 64 bits of the UUID
      ut = ((uint64_t)ntohl(*((uint32_t*)up))) << 32;
      ut |= ((uint64_t)ntohl(*((uint32_t*)&up[4])));

      // dance the bit-shift...
      ut =
        ((ut >> 32) & 0x0FFF) | // 12 least significant bits
        (0x6000) | // version number
        ((ut >> 28) & 0x0000000FFFFF0000) | // next 20 bits
        ((ut << 20) & 0x000FFFF000000000) | // next 16 bits
        (ut << 52); // 12 most significant bits

      // store back in UUID
      *((uint32_t*)up) = htonl((uint32_t)(ut >> 32));
      *((uint32_t*)&up[4]) = htonl((uint32_t)(ut));

    }
UUIDv7 Function in C

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <time.h>

    // ...

    // csprng data source
    FILE *rndf;
    rndf = fopen("/dev/urandom", "r");
    if (rndf == 0) {
        printf("fopen /dev/urandom error\n");
        return 1;
    }

    // ...

    // generate one UUIDv7
    uint8_t u[16];
    struct timespec ts;
    int ret;

    ret = clock_gettime(CLOCK_REALTIME, &ts);
    if (ret != 0) {
        printf("clock_gettime error: %d\n", ret);
        return 1;
    }

    uint64_t tms;

    tms = ((uint64_t)ts.tv_sec) * 1000;
    tms += ((uint64_t)ts.tv_nsec) / 1000000;

    memset(u, 0, 16);

    // fill everything after the timestamp with random bytes
    fread(&u[6], 10, 1, rndf);

    // shift time into first 48 bits and OR into place
    // (htonll is not part of POSIX; substitute a 64 bit host-to-network
    // conversion on platforms that do not provide it)
    *((uint64_t*)(u)) |= htonll(tms << 16);

    u[8] = 0x80 | (u[8] & 0x3F); // set variant field, top two bits are 1, 0
    u[6] = 0x70 | (u[6] & 0x0F); // set version field, top four bits are 0, 1, 1, 1
UUIDv8 will vary greatly from implementation to implementation. The following example utilizes:
  • 32 bit custom-epoch timestamp (seconds elapsed since 2020-01-01 00:00:00 UTC)
  • 16 bit exotic resolution (~15 microsecond) subsecond timestamp encoded using the fractional representation
  • 58 bit random number
  • 8 bit application-specific unique node ID
  • 8 bit rolling sequence number
UUIDv8 Function in C

    #include <stdint.h>
    #include <time.h>

    int get_random_bytes(uint8_t *buffer, int count) {
      // ...
    }

    int generate_uuidv8(uint8_t *uuid, uint8_t node_id) {
      struct timespec tp;
      if (clock_gettime(CLOCK_REALTIME, &tp) != 0)
        return -1; // real-time clock error

      // 32 bit biased timestamp (seconds elapsed since 2020-01-01 00:00:00 UTC)
      uint32_t timestamp_sec = tp.tv_sec - 1577836800;
      uuid[0] = timestamp_sec >> 24;
      uuid[1] = timestamp_sec >> 16;
      uuid[2] = timestamp_sec >> 8;
      uuid[3] = timestamp_sec;

      // 16 bit subsecond fraction (~15 microsecond resolution)
      uint16_t timestamp_subsec = ((uint64_t)tp.tv_nsec << 16) / 1000000000;
      uuid[4] = timestamp_subsec >> 8;
      uuid[5] = timestamp_subsec;

      // 58 bit random number and required ver and var fields
      if (get_random_bytes(&uuid[6], 8) != 0)
        return -1; // random number generator error
      uuid[6] = 0x80 | (uuid[6] & 0x0f);
      uuid[8] = 0x80 | (uuid[8] & 0x3f);

      // 8 bit application-specific node ID to guarantee application-wide uniqueness
      uuid[14] = node_id;

      // 8 bit rolling sequence number to help ensure process-wide uniqueness
      static uint8_t sequence = 0;
      uuid[15] = sequence++; // NOTE: unprotected from race conditions

      return 0;
    }

Both UUIDv1 and UUIDv6 test vectors utilize the same 60 bit timestamp: 0x1EC9414C232AB00 (138648505420000000), Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00.
Both UUIDv1 and UUIDv6 utilize the same values in clk_seq_hi_res, clk_seq_low, and node, all of which have been generated with random data.
Test Vector Timestamp Pseudo-code
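As an illustrative sketch (not the draft's own pseudo-code), the 60 bit Gregorian timestamp can be derived from a Unix time by converting to 100 nanosecond intervals and adding the offset between 1582-10-15 and 1970-01-01 used by RFC 4122. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* 100 ns intervals between 1582-10-15 00:00:00 and 1970-01-01 00:00:00. */
#define GREGORIAN_UNIX_OFFSET 0x01B21DD213814000ULL

/* Convert Unix seconds plus nanoseconds into the count of 100 ns
 * intervals since the Gregorian epoch used by UUIDv1 and UUIDv6. */
static uint64_t gregorian_100ns(uint64_t unix_sec, uint32_t nsec)
{
    return unix_sec * 10000000ULL + nsec / 100 + GREGORIAN_UNIX_OFFSET;
}
```

For the test vector time, `gregorian_100ns(1645557742, 0)` yields 0x1EC9414C232AB00.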
UUIDv1 Example Test Vector
UUIDv6 Example Test Vector
This example UUIDv7 test vector utilizes a well-known 32 bit Unix epoch with additional millisecond precision to fill the first 48 bits. rand_a and rand_b are filled with random data. The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00, represented as 0x17F22E279B0 or 1645557742000.
UUIDv7 Example Test Vector
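A reader checking a UUIDv7 value against this timestamp could recover the millisecond count with a sketch like the following, which assumes the timestamp occupies the first six octets in network byte order. The function name is hypothetical.

```c
#include <stdint.h>

/* Read the 48 bit big-endian millisecond timestamp from octets 0-5. */
static uint64_t uuidv7_unix_ts_ms(const uint8_t uuid[16])
{
    uint64_t ms = 0;
    for (int i = 0; i < 6; i++)
        ms = (ms << 8) | uuid[i];
    return ms;
}
```

A UUID whose first six octets are 01 7F 22 E2 79 B0 decodes to 1645557742000, matching the timestamp given for this test vector.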
This example UUIDv8 test vector utilizes a well-known 64 bit Unix epoch with nanosecond precision, truncated to the least-significant, right-most, bits to fill the first 48 bits through version. The next two segments of custom_b and custom_c are filled with random data. The timestamp is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00, represented as 0x16D6320C3D4DCC00 or 1645557742000000000. It should be noted that this example is just to illustrate one scenario for UUIDv8. Test vectors will likely be implementation specific and vary greatly from this simple example.
UUIDv8 Example Test Vector
All UUID variant 10xx (8/9/A/B) version definitions.
  Msb0  Msb1  Msb2  Msb3  Version  Description
  ----  ----  ----  ----  -------  -----------
  0     0     0     0     0        Unused
  0     0     0     1     1        The Gregorian time-based UUID from in
  0     0     1     0     2        DCE Security version, with embedded POSIX UIDs from
  0     0     1     1     3        The name-based version specified in that uses MD5 hashing.
  0     1     0     0     4        The randomly or pseudo-randomly generated version specified in .
  0     1     0     1     5        The name-based version specified in that uses SHA-1 hashing.
  0     1     1     0     6        Reordered Gregorian time-based UUID specified in this document.
  0     1     1     1     7        Unix Epoch time-based UUID specified in this document.
  1     0     0     0     8        Reserved for custom UUID formats specified in this document.
  1     0     0     1     9        Reserved for future definition.
  1     0     1     0     10       Reserved for future definition.
  1     0     1     1     11       Reserved for future definition.
  1     1     0     0     12       Reserved for future definition.
  1     1     0     1     13       Reserved for future definition.
  1     1     1     0     14       Reserved for future definition.
  1     1     1     1     15       Reserved for future definition.
================================================
FILE: editor-notes/LATEST-OUTLINE.md
================================================
# Outline

Here's what I'm thinking for another outline. Instead of going down more or less in sequence of version and field by field, give right at the top (after a brief summary) the core pieces of info required to produce a correct implementation, including the bit layout. Then cover the text formats. And following that, go through each of the various application-specific concerns - keeping it succinct so that for each section you can easily read the first line or two and decide if it matters for your application and skip it if not. Don't repeat RFC4122 stuff unless it's vital to cover in this doc.

## Background

(more or less what is in the current -02 draft, edit as needed)

## Motivation

- The need for unique IDs in applications is great
- Time-ordered values are useful as database keys and important for performance (more in section ...)
- By having values that are time-ordered when sorted as raw bytes, it allows them to have a useful sort order while still being opaque.
- It really shouldn't be over-specified. Application requirements are vastly different. Generated values should generally just be opaque.
- But, it is helpful to have a document which explains the tradeoffs and gives guidance. (Picking a good unique value generation scheme is non-trivial.)
- And provides for backward compatibility with RFC4122.

## Summary of Changes

- Three new UUIDs with different tradeoffs:
  - v6 is easy to adapt from v1
  - v7 is easy to implement for new time-ordered UUID implementations (particularly those which don't care about implementing prior UUID versions)
  - v8 allows you to do whatever you want
- Crockford base32 text format
- Variable length is allowed (optional, 128-bits remains the default)
- Requirements for storing vs generating UUIDs are separated out for clarity and simplicity (and because they are different)
- Other related implementation concerns are each covered separately.

## Storing UUIDs

- UUIDs SHOULD be treated as opaque values unless there is a good reason to do otherwise (i.e. never examine the bits in a UUID unless you really have no other choice)
- As such, any storage mechanism capable of storing a series of bytes (minimum 9, maximum 64) is a valid storage mechanism for UUIDs. If an implementation chooses to only support 128-bit UUIDs, then anything that can store 16 bytes is valid. THERE IS NO REQUIREMENT ABOUT IMPLEMENTATIONS STORING UUIDS TO HAVE AN UNDERSTANDING OF THEIR CONTENTS.
- A UUID with all zero bytes is never valid (per this doc and the allowed values per RFC4122), so instead of checking "is this UUID valid", compare against all zeros. THE LESS UUID INSPECTION IS DONE, THE BETTER (resulting code is simpler, more obvious, less likely to be broken and probably faster).
- If you really must inspect a UUID, the procedure for determining the version is (see RFC4122 for normative reference on UUIDs version <= 5):
  - Read the two most significant bits from octet 8 (zero-based) and if they are 0b10 then check the most significant 4 bits of octet 6 for the version number (per RFC4122)
  - Otherwise, per this new spec, check octet 8 for the value 0xE7 meaning version 7 or 0xE8 meaning version 8 (we can mention that we're using variant = 0b111 here and refer to RFC4122 section 4.1.1 for the background). (note that this is the only step needed if you only care about versions 7 or 8)
  - Once you know the version refer to the spec of that particular version about the contents.

## Length

- Default is still 128-bits, 16 bytes. Implementations SHOULD default to this.
- Implementations MAY choose to allow generation of UUIDs of anywhere from 9 to 64 bytes. (decide if we want to elaborate on the various possible motivations for this here, e.g. shorter for better human use, longer for lower collision probability)
- Implementations that store UUIDs of variable length SHOULD support any length from 9 to 64 bytes (decide if this is the right range - <9 bytes and we lose the variant/version, and 64 is arbitrary but we should have some upper limit in case it has an impact on physical storage requirements)
- Variable length is optional and implementations MAY opt-out and only support 128-bits.
- Recommendation is for implementations to start defining UUID as an array of bytes (length + pointer or whatever language mechanism) instead of a fixed set of 16 bytes.
- I think we need a table in here that gives bit/byte lengths and collision probability. So someone can look down the numbers and pick the one that is right for their app.

## Text Format

- Existing hex format is still valid. Shorter or longer values just omit or add hex characters (give some examples at different lengths)
- Crockford base32 is allowed also (briefly summarize reasons - shorter, more compact if UUIDs are stored as text, crockford version's benefits vs).
- Parsers can use the presence of '-' to determine if it's hex or crockford base32 format.
- Compatible parsers MUST support crockford base32 with or without padding, and allow a checksum part (verifying the checksum is optional)
- Base32 text encoder SHOULD output crockford base32 values without padding or checksum by default. The padding or checksum features MAY be used if warranted for a specific application.

## UUIDv6

- Recommended for implementations that already implement UUIDv1 and want a low-impact way to implement time-ordered values.
- Refer to RFC4122 for v1, timestamp bits reordered (describe in more detail)
- Recommend "Node" field be filled with random bytes instead of MAC address for the same reasons stated in Shared Knowledge section, but this is optional.
- Be sure to remove any new language related to the sequence counter, we want to keep that as it was https://github.com/uuid6/uuid6-ietf-draft/issues/38

(from the existing draft, might tweak it a bit)

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           time_high                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           time_mid            |     time_low_and_version      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res |  clk_seq_low  |          node (0-1)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          node (2-5)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

FIXME: I think this would be easier to read if we split out the bit fields, e.g. "var" as its own few bits, "ver" as 4 bits, etc.

- Anything else is the same as RFC4122. (Don't bother describing the old fields in detail - the old RFC does that)

## UUIDv7

FIXME: decide if we want to change this to be 6 bytes for millisecond unix timestamp, and thus be basically ULID with a variant+version field.

- Recommended for NEW implementations that are not concerned with existing UUIDv1 code.
- Bit layout:

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    unix_nano_timestamp_u64                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    unix_nano_timestamp_u64                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0xE7      |                 seq_rand_etc                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         seq_rand_etc                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

- `unix_nano_timestamp_u64` is big endian uint64 of nanoseconds since the Unix epoch (same as "unix timestamp" [link?] but with nano-second granularity). Don't have/want nanosecond-precision - see below
- Mention max datetime
- Implementations SHOULD use the current timestamp to provide values that are time-ordered and continually increasing.
- It's okay to "fuzz" timestamp values here for various reasons (security, clock inaccuracy, etc.), there is no absolute guarantee about how close the clock value needs to be to actual time - that's implementation-specific.

## UUIDv8

- Recommended when you want an implementation that does something different than laid out here.
- It means that other than octet 8 being 0xE8, there are no guarantees about the contents.
- Recommend using in place of UUIDv4 to achieve similar result in a simpler way.

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0xE8      |                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

## Sorting

- Implementations that require sorting (e.g. database indexes) SHOULD sort as opaque raw bytes (UUIDv6 and 7 are designed for this), without examining the contents at all.
- Implementations MAY implement more complex sorting rules for UUID versions < 6 (or strictly speaking they MAY do this for any version, but shouldn't need to). - One of the big benefits of time ordering is "index locality" - new values are near each other in the index and can much more easily be clustered together for better performance. Real-world differences vs random data can be quite large. ## Timestamp Granularity - Implementations SHOULD use the current timestamp to provide values that are time-ordered and continually increasing. - It's okay to "fuzz" timestamp values here for various reasons (security, clock inaccuracy, etc.), there is no absolute guarantee about how close the clock value needs to be to actual time - that's implementation-specific. - Give some examples ## Monotonicity - This is really just for v7: - defined simply as "each value returned is greater than the last" - Implementations SHOULD return monotonic values where it is feasible. E.g. in a single library, effort should be made to return successive values that count up. - You can do this with an emedded sequence counter, by waiting for the next clock tick, or by checking that in case of next value has same timestamp see that the random bytes part is greater than prior value and generate new random bytes and check again. Describe trade offs of these approaches, but it's implemenatation-specific - no mandedated sequence counter. - Provide a clear suggestion of a recommended way (e.g. something as simple as: "if next UUID is <= last, generate again until timestamp or random bytes provide a value that is higher" - maybe consider a max limit for cases where the system clock rolls back). - The monotonicity properties of an particular UUID generator SHOULD stated in it's documentation. ## Global vs Local Uniqueness - Global uniquess is impossible to guarantee without shared knowledge. I.e. 
  two systems cannot come up with two numbers completely independently without some possibility of collision UNLESS they agree on some mechanism to ensure uniqueness ahead of time (e.g. your numbers always end with ... and mine always end with... or whatever)
- RFC4122 tried to use MAC address as shared knowledge - it sort of worked but had problems (security issues with exposing MAC addresses, guarantee of actual global uniqueness is questionable)
- So instead just decide if you need a 100% guarantee of uniqueness. If so, implement the shared knowledge approach (see section below)
- If not, reduce the probability of collision to an acceptable level for your application.
- It is okay if implementations only provide "local" uniqueness, e.g. unique within one database instance or cluster of machines - as long as the implementation A) states that and B) the value can be reasonably expected to remain only in that system. I.e. don't use this unique-within-this-database UUID across databases and the docs need to state this.
- Implementations which do not know the uniqueness requirements of the final application and cannot implement shared knowledge should just be made as unique as possible and state that.

## Collision Resistance

- In the absence of shared knowledge, collisions cannot be fully prevented, only the probability reduced.
- Give the math and some examples.
- Variable length allows you to further reduce collision probability beyond 128-bits if needed. If we give the math, then people can make their own decisions based on their app.
- v7 is time ordered for better index locality/database performance, v8 has lower collision probability, pick your poison.

## Unguessability

- Some applications acquire security benefits from generating values that cannot be predicted (this is related to collision resistance, but not the same thing).
- Using a cryptographically secure pseudo-random number generator (CSPRNG) provides values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique"). And so this is the recommended approach.

## Shared Knowledge

- Means a pre-arranged agreement about how different systems or pieces of code will each produce different values.
- Examples: A section of the "random" part gets devoted to: database node number, MAC or IP address, manually entered ID, etc. Or it could be something like the UUID generator in a database implementation simply checking that the UUID is not already in use before returning it. Regardless, the concept is that an implementation MAY just make up a rule that ensures uniqueness, at the cost of some guessability.
- Using a shared knowledge pattern with the same length of UUID increases guessability (the more bits that fit a known value or pattern, the easier a value is to guess).
- Shared knowledge solutions are okay and MAY be done as long as this is stated in the UUID implementation docs.
- Talk a bit about the cost of collision, e.g. https://github.com/uuid6/uuid6-ietf-draft/issues/36#issuecomment-903295070
- Mention that the reason this spec does not endorse any specific global registry is that things can go wrong with it (like the fact that MAC addresses used to be more or less unique, but with cloud computing and software network interfaces being commonplace that assumption changed) - in which case random data results in lower collision probability. So basically we're saying: Global registries, aside from being inconvenient, can still have problems and thus the collision probability jumps way up above the random data approach - so let's not even bother. If an application wants a "perfect, guaranteed unique" solution, it can provide it within its own application via shared knowledge.

## Documentation

- We say in various places that certain things should be stated in the implementation docs.
  So we should probably have a list here...

If you make a UUID implementation, provide a clear statement in the documentation about each of these points:

- Timestamp granularity (v7 only)
- Monotonicity (v7 only: do the values always count up)
- Uniqueness scope (are these values supposed to be globally unique or unique within a specific context, if so which context)
- Shared knowledge system (if any)
- Collision resistance math (use the table in this spec if it helps)
- Unguessability (how strong is the random number generator for how many bits)

## Minimal Practical Implementations (Generation)

(We don't care about storing because it should be opaque, and parsing is trivial and standardized)

### UUIDv6

- Generate a UUIDv1
- Reorder the time bits
- Change the version

### UUIDv7

```
// octets 0-7: timestamp; octet 8: combined var+ver byte; octets 9+: random
bytes[0:8] = big_endian(unix_timestamp_nano())
bytes[8] = 0xE7
bytes[9:end] = crypto_rand()
// TODO: should we put examples of monotonicity logic (or other things above) in here?
```

### UUIDv8

```
// everything is random except octet 8, the combined var+ver byte
bytes[0:end] = crypto_rand()
bytes[8] = 0xE8
```

TODO: decide if we should add in a more complete set of pseudocode with options. E.g. monotonicity, what to do when you run out of sequence numbers, a "shared knowledge" portion - while these are all implementation-specific points, it could be really helpful to have a pseudocode recommendation that gives a suggested set of options and generation logic. People trying to implement would probably really appreciate this and it would encourage implementations to act similarly while allowing whatever variation is necessary.

TODO: decide how this explanation fits in: https://github.com/uuid6/uuid6-ietf-draft/issues/41#issuecomment-907517063

## Implementation Scenarios

Give examples of the bit layout and/or algorithm used to address various situations that have come up. Briefly cover the motivation for each case as the first thing.
- Only millisecond clock precision available (JS) - algo to fill in the rest with random (motivation: no choice, environment limitation OR precise clock readings present security risk)
- Monotonic guarantee: Retry method - algorithm for generator to retry when it gets a value with the same clock tick (motivation: ease of implementation, (describe why monotonicity is desirable, I forget but there are cases))
- Monotonic guarantee: Sequence counter method - assign 12 or maybe 16 bits of the random area as a sequence counter. (motivation: you have a case where monotonicity is vital (e.g. an algorithm that depends on looking at a prior value and comparing greater than to determine if something is new) and you want to be able to generate up to X values per clock tick without waiting)
- 160-bit values - example of bit layout with more random bytes at the end, generation is trivial (motivation: reduce collision probability)
- Shared-knowledge approach to guarantee uniqueness - embedded database node number.

TODO: more examples?

================================================
FILE: editor-notes/LATEST.md
================================================
# Latest

See [LATEST-OUTLINE.md](LATEST-OUTLINE.md) for corresponding outline.

## Overall

- Goals:
  - Introduce new UUIDs which make good database or other keys.
  - Sorts by time (and ideally monotonically - "always counts up" - when generated from one environment) as a set of raw bytes (this is part of being able to treat UUIDs as opaque values in as many situations as possible and reducing implementation complexity).
  - Reduce complexity in the spec and implementations
    - Remove MUSTs where they are not needed - there is a lot of implementation detail in RFC4122 which does not need to be forced on every implementation, leaving people wondering whether an implementation that is close, but cheats on one thing, is correct or not
    - Make UUIDs as opaque as possible (but 100% opaque is not practical or desirable, more below)
  - Focus on defining the minimum requirements for a compatible implementation.
    - I.e. "any implementation which produces values with the following properties is correct"
    - And then include the implementation recommendations as clearly separate.
    - Things like sequence field for monotonicity or random number source requirements are implementation-specific. People SHOULD use a CSPRNG to generate random values, but it doesn't necessarily mean the implementation is broken if it does not.
  - Clarify global vs local uniqueness. This gets tricky because this idea of guaranteeing a universally fully unique identifier is factually impossible - the spec just tries to do the best it can. But then when we get to implementations, you might have a database system that doesn't care about globally universally unique identifiers that are unique everywhere in the universe - they just need a database key that isn't in use. The spec should clearly indicate what is allowed/not allowed for valid implementations. Part of it might be that if an implementation provides a more narrow definition of "unique" for its specific situation, then it should state this clearly but otherwise it's okay as long as the value can reasonably be expected to live in the intended environment (i.e. the database example given is fine as long as it's clear the implementation is just for generating database PKs)
- UUIDs should be as opaque as possible, but we necessarily have exceptions.
  - Reading the version is an example.
  - I think we should make an exception for the timestamp, because it is useful for practical purposes to know when a UUID was generated.
    - If you want to obfuscate the timestamp because you don't want this property, just use UUIDv8 where you can do whatever you want.
  - Things like the sequence counter and random bytes should be considered opaque, I can't think of any reason an external application would read the sequence counter from a UUID generated somewhere else and do anything useful with it. It's only used internally by the implementation, and even then implementations should be compared against strict criteria of what is needed to produce a valid UUID, not fiddly rules about sequence counters that may or may not be useful/applicable to every scenario.
- For v7 and v8 merge variant and version fields into one single 8-bit field. (new variant value = 0b111 means version field immediately follows in the next 5 bits). Reason: simplicity. I don't see any practical reason to keep the version field where it is other than fear of breaking backward compatibility with BROKEN implementations. Correct RFC4122 implementations should be checking the var field and, if it doesn't match an expected value, not interpreting anything else. (See RFC4122 section 4.1.1)
- Mention backward compatibility: Specifically what this means is:
  - Will new UUIDs break existing implementations and if so how often/bad. (In general I think the changes described here are acceptable because they will not break correct implementations - I realize the real world is more complicated though.)
  - Can we add new things in a way that breaks as little as possible (like allowing variable length but still having 128-bits as the default).
- For v7 and v8 we can add the capability of variable length. For existing implementations that expect 128-bits, it should be valid to zero-pad shorter values, and longer values obviously will break.
  128-bits should still probably be the recommended default unless there is a VERY compelling reason to do otherwise. Reason: Additional (or fewer) random bytes allowed, based on application's requirements for uniqueness.
  - Minimum is 9 bytes (up to the variant+version byte)
  - Should we specify a maximum, e.g. 64 or 128 or 256 bytes? It would almost certainly be useful for implementations to know what the upper limit is.
  - Implementations are not forced to support non-128-bit values. It is perfectly valid for an implementation to say that it implements the spec but chooses not to implement variable length for whatever reason (legacy code, performance/size considerations, etc.)
- Introduce Crockford base32 as a standard text encoding (variable length). (Existing hex encoding stays as-is, although a definition of what to do with variable length UUIDs encoded as hex is needed.) This has the added benefit of being able to use the "-" character to positively identify which text format is in use (hex always has them, Crockford base32 never does)

Added 20 Aug:

- We should clarify what "compatible" means and simplify this way down. **It's important to realize that UUID implementations can store UUIDs without understanding them, and that's okay (think database primary key fields, or a `UUID` type in a library).** During generation there are all kinds of concerns, and a lot of what people bring up is during this step - sequence counters, collision probability, unguessability, global vs local uniqueness, etc. However, once a UUID is generated and returned from whatever "make me a UUID" function, the requirements are very basic and simple. "Backward compatibility" with an existing implementation I don't think has anything to do at all with all of the existing code that fiddles with sequence counters, etc. The thing that matters is once this is done and it gets stored somewhere in a database, in a file, etc.
  If we introduce a new spec and people are generating these new UUID values, what will actually break? UUIDs are largely opaque - this makes "backward compatibility" fairly simple. The main issues I see that could be sources of pain are:
  - New text format would break existing string parse methods
  - New var+ver field technically shouldn't but may break implementations which are checking the RFC4122 version field without checking the variant as well.
  - UUID code that expects 128-bit values obviously won't work as-is with other lengths (however, variable length is optional, and I think it could be incrementally added by implementations that want it).
  - The rest of the various fields I don't think have any practical impact on existing implementations and we should just do whatever is a good idea now and not worry about what someone wrote in RFC4122 16 years ago.
- It's also worth noting that UUID implementations break into two parts: generating values, and storing them. For the storing part, anything that can hold a chunk of bytes is technically a correct implementation - there should be no obligation to introspect the data at all. In fact, the only practical reasons I've run into personally to look at what's in a UUID (other than debugging a UUID library) are 1. to determine the version so I know what strange sorting code I should write, or 2. to extract the timestamp because it was not otherwise available. Sorting is addressed in UUIDv6 and 7 by making an opaque sort as raw bytes be time-ordered, and extracting the timestamp is essentially just an optional debugging facility - not required and not guaranteed to be correct except as stated by a particular implementation.

## UUIDv6

- Is just the re-arranged byte sequence as originally described at http://gh.peabody.io/uuidv6/
- But the recommendation is to not use the MAC address but CSPRNG bytes instead.
- The point is to be easy to adapt from an existing UUIDv1 implementation.
- Anything more complicated or different goes elsewhere.

## UUIDv7

New layout:

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    unix_nano_timestamp_u64                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    unix_nano_timestamp_u64                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| var |   ver   |                 seq_rand_etc                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         seq_rand_etc                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

- Goal: Allow nanosecond-precise timestamps with a familiar unix epoch and in a simpler format than UUIDv6 and earlier UUIDs.
- Timestamp is nanoseconds since the Unix epoch as an unsigned int64 (big endian) - max date is 2554-07-21T23:34:33.709551615Z
- Uses new combined var_ver byte for simplicity.
- Doesn't have strict requirements for sequence counter and "node", etc. but will provide implementation recommendations.
- Implementations like JS that don't have (or others that don't want) more than ms (or second or whatever lower) time precision can just fill in the least significant bytes with random data. If this breaks monotonicity (some numbers generated will be out of sequence for the same millisecond unless you guarantee the random data you insert always counts up within a given timestamp), then that's up to the implementation. Monotonicity is a SHOULD, not a MUST, with implementation recommendations.
- All the way down to the var-ver byte can be truncated or it can be made longer (9 bytes would be the minimum otherwise the var-ver field disappears). Since the exact contents of everything after the 9th byte are implementation-specific, I think this would work.
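
The largest date an unsigned 64-bit nanosecond counter can hold is easy to check by hand. The following is only a rough sketch of that arithmetic; `max_unix_nano_timestamp` is a hypothetical helper name, not something defined by these notes.

```python
from datetime import datetime, timedelta, timezone


def max_unix_nano_timestamp():
    """Return (datetime, leftover_nanoseconds) for the largest uint64 nanosecond count."""
    max_ns = 2**64 - 1
    seconds, nanos = divmod(max_ns, 1_000_000_000)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime only resolves microseconds, so the sub-second part is returned separately.
    return epoch + timedelta(seconds=seconds), nanos


when, nanos = max_unix_nano_timestamp()
print(f"{when:%Y-%m-%dT%H:%M:%S}.{nanos:09d}Z")  # 2554-07-21T23:34:33.709551615Z
```

The fractional part is 709551615 ns rather than 999999999 ns because 2^64 - 1 is not one less than a whole number of seconds.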
## UUIDv8

Layout:

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| var |   ver   |                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

- Goal: Make a valid version where people can do whatever they want and it's still a valid implementation. It basically means "if you see a UUID with this version, there is no guarantee of what's in it".

## FAQ

* Do we need three new formats? Yes. They each have different tradeoffs. 6 is easy to implement given a v1 implementation. 7 has more time precision but is simpler and provides variable length. 8 lets you do whatever you want.
* Do we need to introduce a new variant? Need... hard to say, but the point is to make the spec simpler by tucking the variant and version fields into a single byte. Compare the UUIDv7 layout above to RFC4122 - is it worth it? I think so, but it's also subjective. Also, by doing this we introduce another bit which is technically unused (the most significant bit of the version field will always be 0 for now) - so we have some flexibility for the future if later it is decided this specification messed everything up and should have never been born.
* Why do we need/want to introduce variable length? The basic problem is there is no one-size-fits-all level of uniqueness and collision resistance. There will always be some applications that want more bytes to increase the level of uniqueness/unguessability/collision-resistance. So if we introduce the concept here that UUIDs can be longer and implementations adapt to that idea - while still being compatible with existing 128-bit implementations where possible (i.e.
  I can still make a 128-bit ID with v6, 7 or 8 and chuck it in Cassandra and its UUID code will at least not break - I've actually tested this on Cassandra for v6 but not for this new v7 or v8 proposal).

================================================
FILE: index.html
================================================
New UUID Formats
Internet-Draft new-uuid-format June 2022
Peabody & Davis Expires 25 December 2022
Workgroup:
dispatch
Internet-Draft:
draft-peabody-dispatch-new-uuid-format-04
Updates:
4122 (if approved)
Published:
Intended Status:
Standards Track
Expires:
Authors:
B. Peabody
K. Davis

New UUID Formats

Abstract

This document presents new Universally Unique Identifier (UUID) formats for use in modern applications and databases.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 25 December 2022.

1. Introduction

Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions.

One area in which UUIDs have gained popularity is database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto increment" schemes often used by databases do not work well, as the effort required to coordinate unique numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring synchronization makes them a good alternative, but UUID versions 1-5 lack certain other desirable characteristics:

  1. Non-time-ordered UUID versions such as UUIDv4 have poor database index locality, meaning that new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the structures commonly used for indexes (B-tree and its variants) can be dramatic.

  2. The 100-nanosecond, Gregorian epoch used in UUIDv1 timestamps is uncommon and difficult to represent accurately using a standard number format such as [IEEE754].

  3. Introspection/parsing is required to order by time sequence, as opposed to being able to perform a simple byte-by-byte comparison.

  4. Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Additionally, with the advent of virtual machines and containers, MAC address uniqueness is no longer guaranteed.

  5. Many of the implementation details specified in [RFC4122] involve trade-offs that are neither possible to specify for all applications nor necessary to produce interoperable implementations.

  6. [RFC4122] does not distinguish between the requirements for generating a UUID and those for an application which simply stores one, which are often different.
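
To make item 2 concrete: an IEEE 754 double carries a 53 bit significand, so a 60 bit count of 100-nanosecond intervals generally does not survive a round trip through one. The snippet below is a small illustrative sketch; the helper name and the sample timestamp are arbitrary choices, not values taken from this document.

```python
def survives_double_round_trip(timestamp):
    """True if the integer timestamp is exactly representable as an IEEE 754 double."""
    return int(float(timestamp)) == timestamp


# An arbitrary 57 bit example count of 100 ns intervals since the Gregorian epoch.
sample = 0x1EC9414C232AB01

print(survives_double_round_trip(2**53))   # exactly representable
print(survives_double_round_trip(sample))  # low-order bits are lost to rounding
print(hex(int(float(sample))))             # the value a double actually stores
```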

Due to the aforementioned issues, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways.

While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.

  1. [ULID] by A. Feerasta

  2. [LexicalUUID] by Twitter

  3. [Snowflake] by Twitter

  4. [Flake] by Boundary

  5. [ShardingID] by Instagram

  6. [KSUID] by Segment

  7. [Elasticflake] by P. Pearcy

  8. [FlakeID] by T. Pawlak

  9. [Sonyflake] by Sony

  10. [orderedUuid] by IT. Cabrera

  11. [COMBGUID] by R. Tallent

  12. [SID] by A. Chilton

  13. [pushID] by Google

  14. [XID] by O. Poitrey

  15. [ObjectID] by MongoDB

  16. [CUID] by E. Elliott

An inspection of these implementations and the issues described above has led to this document which attempts to adapt UUIDs to address these issues.

2. Terminology

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.2. Abbreviations

The following abbreviations are used in this document:

UUID
Universally Unique Identifier [RFC4122]
CSPRNG
Cryptographically Secure Pseudo-Random Number Generator
MAC
Media Access Control
MSB
Most Significant Bit
DBMS
Database Management System

3. Summary of Changes

The following UUIDs are hereby introduced:

UUID version 6 (UUIDv6)
A re-ordering of UUID version 1 so it is sortable as an opaque sequence of bytes. Easy to implement given an existing UUIDv1 implementation. See Section 5.1.
UUID version 7 (UUIDv7)
An entirely new time-based UUID bit layout sourced from the widely implemented and well known Unix Epoch timestamp source. See Section 5.2.
UUID version 8 (UUIDv8)
A free-form UUID format which has no explicit requirements except maintaining backward compatibility. See Section 5.3.
Max UUID
A specialized UUID which is the inverse of the Nil UUID defined in [RFC4122], Section 4.1.7. See Section 5.4.

3.1. Changelog

RFC EDITOR PLEASE DELETE THIS SECTION.

draft-04

  • Fixed bad title in IEEE754 Normative Reference

  • Fixed bad GMT offset in Test Vector Appendix

  • Removed MAY in Counters section

  • Condensed Counter Type into Counter Methods to reduce text

  • Removed option for random increment along with fixed-length counter

  • Described how to handle scenario where New UUID less than Old UUID

  • Allow timestamp increment if counter overflows

  • Replaced UUIDv8 C code snippet with full generation example

  • Fixed RFC4086 Reference link

  • Describe reseeding best practice for CSPRNG

  • Changed MUST to SHOULD removing requirement for absolute monotonicity

draft-03

  • Reworked the draft body to make the content more concise

  • UUIDv6 section reworked to just the reorder of the timestamp

  • UUIDv7 changed to simplify timestamp mechanism to just millisecond Unix timestamp

  • UUIDv8 relaxed to be custom in all elements except version and variant

  • Introduced Max UUID.

  • Added C code samples in Appendix.

  • Added test vectors in Appendix.

  • Version and Variant section combined into one section.

  • Changed from pseudo-random number generators to cryptographically secure pseudo-random number generator (CSPRNG).

  • Combined redundant topics from all UUIDs into sections such as Timestamp granularity, Monotonicity and Counters, Collision Resistance, Sorting, and Unguessability, etc.

  • Split Encoding and Storage into Opacity and DBMS and Database Considerations

  • Reworked Global Uniqueness under new section Global and Local Uniqueness

  • Node verbiage only used in UUIDv6 all others reference random/rand instead

  • Clock sequence verbiage changed simply to counter in any section other than UUIDv6

  • Added Abbreviations section

  • Updated IETF Draft XML Layout

  • Added information about little-endian UUIDs

draft-02

  • Added Changelog

  • Fixed misc. grammatical errors

  • Fixed section numbering issue

  • Fixed some UUIDvX reference issues

  • Changed all instances of "motonic" to "monotonic"

  • Changed all instances of "#-bit" to "# bit"

  • Changed "proceeding" verbiage to "after" in section 7

  • Added details on how to pad 32 bit Unix timestamp to 36 bits in UUIDv7

  • Added details on how to truncate 64 bit Unix timestamp to 36 bits in UUIDv7

  • Added forward reference and bullet to UUIDv8 if truncating 64 bit Unix Epoch is not an option.

  • Fixed bad reference to non-existent "time_or_node" in section 4.5.4

draft-01

  • Complete rewrite of entire document.

  • The format, flow and verbiage used in the specification has been reworked to mirror the original RFC 4122 and current IETF standards.

  • Removed the topics of UUID length modification, alternate UUID text formats, and alternate UUID encoding techniques.

  • Research into 16 different historical and current implementations of time-based universal identifiers was completed at the end of 2020 in an attempt to identify trends which have directly influenced design decisions in this draft document (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)

  • Prototype implementations have been completed for UUIDv6, UUIDv7, and UUIDv8 in various languages by many GitHub community members. (https://github.com/uuid6/prototypes)

4. Variant and Version Fields

The variant bits utilized by UUIDs in this specification remain in the same octet as originally defined by [RFC4122], Section 4.1.1.

The next table details Variant 10xx (8/9/A/B) and the new versions defined by this specification. A complete guide to all versions within this variant has been included in Appendix C.1.

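Table 1: New UUID variant 10xx (8/9/A/B) versions defined by this specification
+------+------+------+------+---------+-----------------------------------------------------------------+
| Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                                                     |
+------+------+------+------+---------+-----------------------------------------------------------------+
| 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-based UUID specified in this document. |
| 0    | 1    | 1    | 1    | 7       | Unix Epoch time-based UUID specified in this document.          |
| 1    | 0    | 0    | 0    | 8       | Reserved for custom UUID formats specified in this document.    |
+------+------+------+------+---------+-----------------------------------------------------------------+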

For UUID versions 6, 7 and 8 the variant field placement from [RFC4122] is unchanged. An example version/variant layout for UUIDv6 follows, where M is the version and N is the variant.

00000000-0000-6000-8000-000000000000
00000000-0000-6000-9000-000000000000
00000000-0000-6000-A000-000000000000
00000000-0000-6000-B000-000000000000
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
Figure 1: UUIDv6 Variant Examples
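
As a rough illustration of where the M and N nibbles sit, the sketch below pulls both out of a textual UUID using plain bit shifts. The function name `version_and_variant` is a hypothetical helper introduced here for illustration only.

```python
import uuid


def version_and_variant(text):
    """Return (version, variant_nibble, is_variant_10xx) for a textual UUID."""
    value = uuid.UUID(text).int
    version = (value >> 76) & 0xF         # the M nibble, bits 48 through 51
    variant_nibble = (value >> 60) & 0xF  # the N nibble, bits 64 through 67
    # Variant 10xx means the top two bits of N are 1 and 0, i.e. N is 8, 9, A or B.
    is_variant_10xx = (variant_nibble >> 2) == 0b10
    return version, variant_nibble, is_variant_10xx


for example in (
    "00000000-0000-6000-8000-000000000000",
    "00000000-0000-6000-B000-000000000000",
):
    print(example, version_and_variant(example))
```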

5. New Formats

The UUID format is 16 octets; the variant bits in conjunction with the version bits described in the next section determine finer structure.

5.1. UUID Version 6

UUID version 6 is a field-compatible version of UUIDv1, reordered for improved DB locality. It is expected that UUIDv6 will primarily be used in contexts where there are existing v1 UUIDs. Systems that do not involve legacy UUIDv1 SHOULD consider using UUIDv7 instead.

Instead of splitting the timestamp into the low, mid and high sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes are stored from most to least significant. That is, given a 60 bit timestamp value as specified for UUIDv1 in [RFC4122], Section 4.1.4, for UUIDv6, the first 48 most significant bits are stored first, followed by the 4 bit version (same position), followed by the remaining 12 bits of the original 60 bit timestamp.

The clock sequence bits remain unchanged from their usage and position in [RFC4122], Section 4.1.5.

The 48 bit node SHOULD be set to a pseudo-random value; however, implementations MAY choose to retain the old MAC address behavior from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5. For more information on MAC address usage within UUIDs see Section 8.

The format for the 16-byte, 128 bit UUIDv6 is shown in Figure 2.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           time_high                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           time_mid            |      time_low_and_version     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         node (2-5)                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: UUIDv6 Field and Bit Layout
time_high:
The most significant 32 bits of the 60 bit starting timestamp. Occupies bits 0 through 31 (octets 0-3).
time_mid:
The middle 16 bits of the 60 bit starting timestamp. Occupies bits 32 through 47 (octets 4-5).
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60 bit starting timestamp. Occupies bits 48 through 63 (octets 6-7).
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8).
clk_seq_low:
The 8 bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9).
node:
48 bit spatially unique identifier. Occupies bits 80 through 127 (octets 10-15).

With UUIDv6 the steps for splitting the timestamp into time_high and time_mid are OPTIONAL since the 48 bits of time_high and time_mid will remain in the same order. An extra step of splitting the first 48 bits of the timestamp into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation.
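
As an illustration of the layout above, the following is a minimal sketch that packs an already-computed 60 bit timestamp directly into the UUIDv6 octets without splitting time_high and time_mid. The function name uuidv6_pack and its parameters are hypothetical and not part of this specification; the caller is assumed to supply the clock sequence and node values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs a 60 bit Gregorian timestamp, a 14 bit clock
 * sequence and a 48 bit node value into the 16 octet UUIDv6 layout. */
void uuidv6_pack(uint8_t out[16], uint64_t ts60, uint16_t clock_seq,
                 uint64_t node48) {
  assert((ts60 >> 60) == 0);    /* timestamp must fit in 60 bits */
  assert(clock_seq <= 0x3FFF);  /* clock sequence must fit in 14 bits */
  assert((node48 >> 48) == 0);  /* node must fit in 48 bits */

  uint64_t hi48 = ts60 >> 12;                 /* time_high and time_mid */
  uint16_t low12 = (uint16_t)(ts60 & 0x0FFF); /* 12 least significant bits */

  /* octets 0-5: 48 most significant timestamp bits, big-endian */
  for (int i = 0; i < 6; i++)
    out[i] = (uint8_t)(hi48 >> (8 * (5 - i)));

  /* octets 6-7: version 0110 followed by the 12 least significant bits */
  out[6] = (uint8_t)(0x60 | (low12 >> 8));
  out[7] = (uint8_t)(low12 & 0xFF);

  /* octets 8-9: variant 10 followed by the 14 bit clock sequence */
  out[8] = (uint8_t)(0x80 | (clock_seq >> 8));
  out[9] = (uint8_t)(clock_seq & 0xFF);

  /* octets 10-15: node, big-endian */
  for (int i = 0; i < 6; i++)
    out[10 + i] = (uint8_t)(node48 >> (8 * (5 - i)));
}
```

Called with the timestamp 0x1EC9414C232AB00, clock sequence 0x33C8, and node 0x9E6BDECED846, this sketch yields the octets of the UUIDv6 test vector in Appendix B.1.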

5.2. UUID Version 7

UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded. It also offers improved entropy characteristics over versions 1 and 6.

Implementations SHOULD utilize UUID version 7 over UUID version 1 and 6 if possible.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: UUIDv7 Field and Bit Layout
unix_ts_ms:
48 bit big-endian unsigned number holding the Unix epoch timestamp in milliseconds as per Section 6.1.
ver:
4 bit UUIDv7 version set as per Section 4.
rand_a:
12 bits pseudo-random data to provide uniqueness as per Section 6.2 and Section 6.6.
var:
The 2 bit variant defined by Section 4.
rand_b:
The final 62 bits of pseudo-random data to provide uniqueness as per Section 6.2 and Section 6.6.
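
The field list above can be sketched as a packing routine. This is only an illustration: the name uuidv7_pack is hypothetical, and the caller is assumed to obtain rand_a and rand_b from a suitable random source (see Section 6.6).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the UUIDv7 fields into 16 octets.
 * The random bits are supplied by the caller. */
void uuidv7_pack(uint8_t out[16], uint64_t unix_ts_ms, uint16_t rand_a,
                 uint64_t rand_b) {
  assert((unix_ts_ms >> 48) == 0); /* timestamp must fit in 48 bits */
  assert(rand_a <= 0x0FFF);        /* rand_a is 12 bits */
  assert((rand_b >> 62) == 0);     /* rand_b is 62 bits */

  /* octets 0-5: 48 bit big-endian Unix timestamp in milliseconds */
  for (int i = 0; i < 6; i++)
    out[i] = (uint8_t)(unix_ts_ms >> (8 * (5 - i)));

  /* octets 6-7: version 0111 followed by the 12 bits of rand_a */
  out[6] = (uint8_t)(0x70 | (rand_a >> 8));
  out[7] = (uint8_t)(rand_a & 0xFF);

  /* octets 8-15: variant 10 followed by the 62 bits of rand_b */
  out[8] = (uint8_t)(0x80 | (rand_b >> 56));
  for (int i = 1; i < 8; i++)
    out[8 + i] = (uint8_t)(rand_b >> (8 * (7 - i)));
}
```

With the values from Appendix B.2 (unix_ts_ms 0x17F22E279B0, rand_a 0xCC3, rand_b 0x18C4DC0C0C07398F), the sketch produces the octets of 017F22E2-79B0-7CC3-98C4-DC0C0C07398F.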

5.3. UUID Version 8

UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. The only requirement is that the variant and version bits MUST be set as defined in Section 4. UUIDv8's uniqueness will be implementation-specific and SHOULD NOT be assumed.

The only explicitly defined bits are the Version and Variant leaving 122 bits for implementation specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data.

Some example situations in which UUIDv8 usage could occur:

  • An implementation would like to embed extra information within the UUID other than what is defined in this document.

  • An implementation has other application/language restrictions which inhibit the use of one of the current UUIDs.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           custom_a                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          custom_a             |  ver  |       custom_b        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                       custom_c                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           custom_c                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: UUIDv8 Field and Bit Layout
custom_a:
The first 48 bits of the layout that can be filled as an implementation sees fit.
ver:
The 4 bit version field as defined by Section 4.
custom_b:
12 more bits of the layout that can be filled as an implementation sees fit.
var:
The 2 bit variant field as defined by Section 4.
custom_c:
The final 62 bits of the layout immediately following the var field to be filled as an implementation sees fit.
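
Because only the version and variant bits are fixed, one possible approach is to fill all 16 octets with implementation-defined data and then overwrite those six bits. The sketch below assumes that approach; the name uuidv8_stamp is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turns 16 octets of implementation-defined data into
 * a UUIDv8 by overwriting only the version and variant bits. */
void uuidv8_stamp(uint8_t u[16]) {
  assert(u != NULL);
  u[6] = (uint8_t)(0x80 | (u[6] & 0x0F)); /* ver 1000, keeps 4 custom_b bits */
  u[8] = (uint8_t)(0x80 | (u[8] & 0x3F)); /* var 10, keeps 6 custom_c bits */
}
```

Note that the four bits under ver and the two bits under var are lost from the input, so an implementation should not place meaningful data in those positions.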

5.4. Max UUID

The Max UUID is a special form of UUID that is specified to have all 128 bits set to 1. This UUID can be thought of as the inverse of the Nil UUID defined in [RFC4122], Section 4.1.7.

FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF
Figure 5: Max UUID Format

6. UUID Best Practices

The minimum requirements for generating UUIDs are described in this document for each version. Everything else is an implementation detail and up to the implementer to decide what is appropriate for a given implementation. That being said, various relevant factors are covered below to help guide an implementer through the different trade-offs among differing UUID implementations.

6.1. Timestamp Granularity

UUID timestamp source, precision, and length were the topic of great debate while creating this specification. As such, choosing the right timestamp for your application is very important. This section details some of the most common points on this topic.

Reliability:
Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)
Source:
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or a custom timestamp epoch are required UUIDv8 SHOULD be leveraged.
Sub-second Precision and Accuracy:
Many levels of precision exist for timestamps: milliseconds, microseconds, nanoseconds, and beyond. Additionally fractional representations of sub-second precision may be desired to mix various levels of precision in a time-ordered manner. Furthermore, system clocks themselves have an underlying granularity and it is frequently less than the precision offered by the operating system. With UUID version 1 and 6, 100-nanoseconds of precision are present while UUIDv7 features fixed millisecond level of precision within the Unix epoch that does not exceed the granularity capable in most modern systems. For other levels of precision UUIDv8 SHOULD be utilized.
Length:
The length of a given timestamp directly impacts how long a given UUID will be valid. That is, how many timestamp ticks can be contained in a UUID before the maximum value for the timestamp field is reached. Care should be given to ensure that the proper length is selected for a given timestamp. UUID version 1 and 6 utilize a 60 bit timestamp and UUIDv7 features a 48 bit timestamp.
Altering, Fuzzing, or Smearing:
Implementations MAY alter the actual timestamp. Some examples include security considerations around providing a real clock value within a UUID, correcting inaccurate clocks, and handling leap seconds. This specification makes no requirement or guarantee about how close the clock value needs to be to actual time.
Padding:
When timestamp padding is required, implementations MUST pad the most significant (left-most) bits with zeros. An example is padding the most significant, left-most bits of a 32 bit Unix timestamp with zeros to fill out the 48 bit timestamp in UUIDv7.
Truncating:
Similarly, when timestamps need to be truncated: the lower, least significant bits MUST be used. An example would be truncating a 64 bit Unix timestamp to the least significant, right-most 48 bits for UUIDv7.
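
The Padding and Truncating rules can be sketched with plain integer operations. The helper names below are hypothetical and the 48 bit width is taken from the UUIDv7 examples above.

```c
#include <assert.h>
#include <stdint.h>

#define TS48_MASK 0xFFFFFFFFFFFFULL

/* Padding: a 32 bit value widened to 64 bits gains zeros in its most
 * significant bits, so it fits the 48 bit field unchanged. */
uint64_t pad_to_48(uint32_t timestamp32) {
  return (uint64_t)timestamp32;
}

/* Truncating: keep only the least significant, right-most 48 bits. */
uint64_t truncate_to_48(uint64_t timestamp64) {
  return timestamp64 & TS48_MASK;
}

/* Writes a 48 bit value into six octets, most significant octet first. */
void store_ts48(uint8_t out[6], uint64_t ts48) {
  assert(ts48 <= TS48_MASK);
  for (int i = 0; i < 6; i++)
    out[i] = (uint8_t)(ts48 >> (8 * (5 - i)));
}
```

For instance, truncating the 64 bit nanosecond value 0x16D6320C3D4DCC00 this way gives 0x320C3D4DCC00, the custom_a value used in Appendix B.3.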

6.2. Monotonicity and Counters

Monotonicity is the backbone of time-based sortable UUIDs. Naturally, time-based UUIDs from this document will be monotonic due to an embedded timestamp; however, implementations can guarantee additional monotonicity via the concepts covered in this section.

Additionally, care SHOULD be taken to ensure UUIDs generated in batches are also monotonic. That is, if one thousand UUIDs are generated for the same timestamp, there is sufficient logic for organizing the creation order of those one thousand UUIDs. For batch UUID creation, implementations MAY utilize a monotonic counter which SHOULD increment for each UUID created during a given timestamp.

For single-node UUID implementations that do not need to create batches of UUIDs, the embedded timestamp within UUID version 1, 6, and 7 can provide sufficient monotonicity guarantees by simply ensuring that the timestamp increments before creating a new UUID. For the topic of Distributed Nodes, please refer to Section 6.3.

Implementations SHOULD choose one method for single-node UUID implementations that require batch UUID creation.

Fixed-Length Dedicated Counter Bits (Method 1):
This references the practice of allocating a specific number of bits in the UUID layout to the sole purpose of tallying the total number of UUIDs created during a given UUID timestamp tick. Positioning of a fixed bit-length counter SHOULD be immediately after the embedded timestamp. This promotes sortability and allows random data generation for each counter increment. With this method, the rand_a section of UUIDv7 SHOULD be utilized as fixed-length dedicated counter bits that are incremented by one for every UUID generation. The trailing random bits generated for each new UUID in rand_b can help produce unguessable UUIDs. In the event more counter bits are required, the most significant, left-most, bits of rand_b MAY be leveraged as additional counter bits.
Monotonic Random (Method 2):
With this method, the random data is extended to also double as a counter. This monotonic random can be thought of as a "randomly seeded counter" which MUST be incremented in the least significant position for each UUID created on a given timestamp tick. UUIDv7's rand_b section SHOULD be utilized with this method to handle batch UUID generation during a single timestamp tick. The increment value for every UUID generation SHOULD be a random integer of any desired length larger than zero. This ensures the UUIDs retain the required level of unguessability provided by the underlying entropy. The increment value MAY be one when the number of UUIDs generated in a particular period of time is important and guessability is not an issue. However, an increment of one SHOULD NOT be used by implementations that favor unguessability, as the resulting values are easily guessable.
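
A minimal sketch of Method 2 follows, assuming rand_b is held as a 62 bit integer and the caller supplies a random increment. The name rand_b_advance and the choice to report overflow to the caller are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAND_B_MAX 0x3FFFFFFFFFFFFFFFULL /* largest 62 bit value */

/* Hypothetical helper: advances a 62 bit rand_b value by `increment`,
 * a random integer greater than zero supplied by the caller.
 * Returns false and leaves *rand_b unchanged if the increment is zero or
 * the addition would overflow 62 bits, so that the caller can apply its
 * own rollover handling. */
bool rand_b_advance(uint64_t *rand_b, uint64_t increment) {
  assert(rand_b != NULL && *rand_b <= RAND_B_MAX);
  if (increment == 0 || increment > RAND_B_MAX - *rand_b)
    return false;
  *rand_b += increment;
  return true;
}
```

Limiting the increment to fewer bits than rand_b itself leaves the most significant bits free to absorb carries, which reduces how often the overflow branch is reached.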

The following sub-topics relate solely to creating reliable fixed-length dedicated counters:

Fixed-Length Dedicated Counter Seeding:
Implementations utilizing the fixed-length counter method SHOULD randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented, the counter SHOULD be frozen and incremented via the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random data MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing the fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in the least significant, right-most, position randomly initialized. The remaining most significant, left-most counter bits are initialized as zero for the sole purpose of guarding against counter rollovers.
Fixed-Length Dedicated Counter Length:
Care MUST be taken to select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision SHOULD require a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch.
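
The partial seeding described above can be sketched as a mask over caller-supplied random bits. The name counter_seed and its parameters are hypothetical; counters of up to 32 bits are assumed to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derives the initial value of a fixed-length counter
 * from `random_bits`. The `guard_bits` most significant bits of the counter
 * start at zero to leave headroom against rollover. */
uint32_t counter_seed(uint32_t random_bits, unsigned counter_bits,
                      unsigned guard_bits) {
  assert(counter_bits >= 1 && counter_bits <= 32);
  assert(guard_bits < counter_bits);
  unsigned seeded_bits = counter_bits - guard_bits;
  uint32_t mask = (seeded_bits == 32) ? 0xFFFFFFFFu
                                      : ((1u << seeded_bits) - 1u);
  return random_bits & mask;
}
```

With a 24 bit counter and one guard bit, the seed never exceeds 0x7FFFFF, so at least 2^23 increments remain available within the same timestamp tick.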

The following sub-topics cover rollover handling with either type of counter method:

Counter Rollover Guards:
The technique from Fixed-Length Dedicated Counter Seeding which describes allocating a segment of the fixed-length counter as a rollover guard is also helpful to mitigate counter rollover issues. This same technique can be leveraged with Monotonic random counter methods by ensuring the total length of a possible increment in the least significant, right most position is less than the total length of the random being incremented. As such the most significant, left-most, bits can be incremented as rollover guarding.
Counter Rollover Handling:
Counter rollovers SHOULD be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken. Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter.

Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:

  1. Compare the current timestamp against the previously stored timestamp.

  2. If the current timestamp is equal to the previous timestamp; increment the counter according to the desired method.

  3. If the current timestamp is greater than the previous timestamp; re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).

Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID. If this is not the case, then any number of things could have occurred, such as, but not limited to, clock rollbacks, leap second handling, or counter rollovers. Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem, ensuring the next UUID generated is greater than the previous. To handle this scenario, the general guidance is that applications MAY reuse the previous timestamp and increment the previous counter method.
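
The logic above might be sketched as a small state machine for a 12 bit fixed-length counter. The names mono_state and mono_next are hypothetical, the random seed is supplied by the caller, and on counter rollover this sketch takes the option of moving the timestamp ahead of the actual time rather than waiting for the clock to advance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNTER_BITS 12
#define COUNTER_MAX ((1u << COUNTER_BITS) - 1u)

/* Generator state: the timestamp and counter of the last UUID issued. */
typedef struct {
  uint64_t last_ts;
  uint32_t counter;
} mono_state;

/* Hypothetical helper: chooses the timestamp and counter for the next UUID.
 * `now` is the current clock reading and `seed` is fresh random data used
 * to re-initialize the counter when the timestamp changes. */
void mono_next(mono_state *st, uint64_t now, uint32_t seed) {
  assert(st != NULL);
  if (now > st->last_ts) {
    /* new tick: re-initialize the counter, top bit left zero as a guard */
    st->last_ts = now;
    st->counter = seed & (COUNTER_MAX >> 1);
  } else if (st->counter < COUNTER_MAX) {
    /* same tick or clock rollback: reuse the previous timestamp and
     * increment the counter */
    st->counter++;
  } else {
    /* counter rollover: move the timestamp ahead of the actual time */
    st->last_ts++;
    st->counter = seed & (COUNTER_MAX >> 1);
  }
}
```

Each call leaves the pair (last_ts, counter) strictly greater than before, so a UUID that stores the timestamp first and the counter immediately after it sorts after its predecessor.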

6.3. Distributed UUID Generation

Some implementations MAY desire to utilize multi-node, clustered, applications which involve two or more nodes independently generating UUIDs that will be stored in a common location. While UUIDs already feature sufficient entropy to keep the chance of collision low, the likelihood of a collision grows as the total number of nodes increases. This section details the approaches that MAY be utilized by multi-node UUID implementations in distributed environments.

Centralized Registry:
With this method all nodes tasked with creating UUIDs consult a central registry and confirm the generated value is unique. As applications scale the communication with the central registry could become a bottleneck and impact UUID generation in a negative way. Utilization of shared knowledge schemes with central/global registries is outside the scope of this specification.
Node IDs:
With this method, a pseudo-random Node ID value is placed within the UUID layout. This identifier helps ensure the bit-space for a given node is unique, resulting in UUIDs that do not conflict with any other UUID created by another node with a different node id. Implementations that choose to leverage an embedded node id SHOULD utilize UUIDv8. The node id SHOULD NOT be an IEEE 802 MAC address as per Section 8. The location and bit length are left to implementations and are outside the scope of this specification. Furthermore, the creation and negotiation of unique node ids among nodes is also out of scope for this specification.

Utilization of either a Centralized Registry or Node ID is not required for implementing UUIDs in this specification. However, implementations SHOULD utilize one of the two aforementioned methods if distributed UUID generation is a requirement.

6.4. Collision Resistance

Implementations SHOULD weigh the consequences of UUID collisions within their application when deciding between UUID versions that use entropy (random data) versus the other components such as those in Section 6.1 and Section 6.2. This is especially true for distributed node collision resistance as defined by Section 6.3.

There are two example scenarios below which help illustrate the varying seriousness of a collision within an application.

Low Impact:
A UUID collision generated a duplicate log entry which results in incorrect statistics derived from the data. Implementations that are not negatively affected by collisions may continue with the entropy and uniqueness provided by the traditional UUID format.
High Impact:
A duplicate key causes an airplane to receive the wrong course which puts people's lives at risk. In this scenario there is no margin for error. Collisions MUST be avoided and failure is unacceptable. Applications dealing with this type of scenario MUST employ as much collision resistance as possible within the given application context.

6.5. Global and Local Uniqueness

UUIDs created by this specification MAY be used to provide local uniqueness guarantees. For example, ensuring UUIDs created within a local application context are unique within a database MAY be sufficient for some implementations where global uniqueness outside of the application context, in other applications, or around the world is not required.

Although true global uniqueness is impossible to guarantee without a shared knowledge scheme, a shared knowledge scheme is not required by UUID to provide uniqueness guarantees. Implementations MAY implement a shared knowledge scheme introduced in Section 6.3 as they see fit to extend the uniqueness guaranteed by this specification and [RFC4122].

6.6. Unguessability

Implementations SHOULD utilize a cryptographically secure pseudo-random number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique"). Care SHOULD be taken to ensure the CSPRNG state is properly reseeded upon state changes, such as process forks, to ensure proper CSPRNG operation. Using a CSPRNG ensures that the best of Section 6.4 and Section 8 are present in modern UUIDs.

Advice on generating cryptographic-quality random numbers can be found in [RFC4086].

6.7. Sorting

UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g. database indexes) SHOULD sort as opaque raw bytes, without need for parsing or introspection.

Time ordered monotonic UUIDs benefit from greater database index locality because the new values are near each other in the index. As a result objects are more easily clustered together for better performance. The real-world differences in this approach of index locality vs random data inserts can be quite large.

UUID formats created by this specification SHOULD be lexicographically sortable while in the textual representation.

UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind. If little-endian style is required, a custom UUID format SHOULD be created using UUIDv8.
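
Sorting as opaque raw bytes can be sketched with the standard memcmp() and qsort() functions. The names uuid_cmp and uuid_sort are hypothetical, and UUIDs are assumed to be stored as 16 octets in network byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator that orders 16 octet UUIDs as opaque raw bytes. */
int uuid_cmp(const void *a, const void *b) {
  return memcmp(a, b, 16);
}

/* Sorts `count` UUIDs stored back to back in `uuids`. */
void uuid_sort(uint8_t (*uuids)[16], size_t count) {
  assert(uuids != NULL);
  qsort(uuids, count, sizeof uuids[0], uuid_cmp);
}
```

No field is parsed here: because the timestamp occupies the most significant octets of UUIDv6 and UUIDv7, byte-wise comparison already yields time order.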

6.8. Opacity

UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.
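
Where inspection is unavoidable, the version and variant can be read from octets 6 and 8, the same positions the example code in Appendix A writes them to. The function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the 4 bit version: the most significant bits of octet 6. */
unsigned uuid_version(const uint8_t u[16]) {
  assert(u != NULL);
  return (unsigned)(u[6] >> 4);
}

/* Returns the two most significant bits of octet 8. Binary 10 (decimal 2)
 * identifies the variant used by this specification. */
unsigned uuid_variant_bits(const uint8_t u[16]) {
  assert(u != NULL);
  return (unsigned)(u[8] >> 6);
}
```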

6.9. DBMS and Database Considerations

For many applications, such as databases, storing UUIDs as text is unnecessarily verbose, requiring 288 bits to represent 128 bit UUID values. Thus, where feasible, UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.

For other systems, UUIDs MAY be stored in binary form or as text, as appropriate. The trade-offs to both approaches are as such:

  • Storing as binary requires less space and may result in faster data access.

  • Storing as text requires more space but may require less translation if the resulting text form is to be used after retrieval, and thus may be simpler to implement.
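
The translation step mentioned above can be sketched as a pair of conversion routines between the 16 octet binary form and the 36 character text form (36 characters of 8 bits each give the 288 bits noted earlier). The function names are hypothetical and uppercase hexadecimal output is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define UUID_TEXT_LEN 36 /* 32 hex digits plus 4 hyphens */

/* Formats 16 octets as 8-4-4-4-12 hexadecimal text.
 * `text` must have room for UUID_TEXT_LEN + 1 characters. */
void uuid_to_text(const uint8_t u[16], char *text) {
  assert(u != NULL && text != NULL);
  snprintf(text, UUID_TEXT_LEN + 1,
           "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
           "%02X%02X%02X%02X%02X%02X",
           u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
           u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}

/* Returns the value of one hexadecimal digit, or -1 if `c` is not one. */
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Parses the text form back into 16 octets. Returns 0 on success and -1
 * if the text is not exactly 36 characters of hex digits and hyphens. */
int uuid_from_text(const char *text, uint8_t u[16]) {
  assert(text != NULL && u != NULL);
  if (strlen(text) != UUID_TEXT_LEN)
    return -1;
  int byte = 0;
  for (int i = 0; i < UUID_TEXT_LEN; i++) {
    if (i == 8 || i == 13 || i == 18 || i == 23) {
      if (text[i] != '-') return -1;
      continue;
    }
    int hi = hex_value(text[i]);
    int lo = hex_value(text[i + 1]);
    if (hi < 0 || lo < 0) return -1;
    u[byte++] = (uint8_t)(hi << 4 | lo);
    i++; /* two characters were consumed */
  }
  return 0;
}
```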

DBMS vendors are encouraged to provide functionality to generate and store UUID formats defined by this specification for use as identifiers or left parts of identifiers such as, but not limited to, primary keys, surrogate keys for temporal databases, foreign keys included in polymorphic relationships, and keys for key-value pairs in JSON columns and key-value databases. Applications using a monolithic database may find using database-generated UUIDs (as opposed to client-generated UUIDs) provides the best UUID monotonicity. In addition to UUIDs, additional identifiers MAY be used to ensure integrity and feedback.

7. IANA Considerations

This document has no IANA actions.

8. Security Considerations

MAC addresses pose inherent security risks and SHOULD NOT be used within a UUID. Instead, CSPRNG data SHOULD be selected from a source with sufficient entropy to ensure uniqueness among generated UUIDs. See Section 6.6 for more information.

Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with an embedded counter does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form, then [RFC4122] UUIDv4 SHOULD be utilized.

9. Acknowledgements

The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, Theodore Y. Ts'o, Robert Kieffer, sergeyprokhorenko, and LiosK, as well as all of those in the IETF community and on GitHub who contributed to the discussions which resulted in this document.

10. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC4122]
Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, <https://www.rfc-editor.org/info/rfc4122>.
[RFC4086]
Eastlake 3rd, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", RFC 4086, DOI 10.17487/RFC4086, <https://www.rfc-editor.org/info/rfc4086>.

11. Informative References

[LexicalUUID]
Twitter, "A Scala client for Cassandra", Commit f6da4e0, <https://github.com/twitter-archive/cassie>.
[Snowflake]
Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, <https://github.com/twitter-archive/snowflake/releases/tag/snowflake-2010>.
[Flake]
Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, <https://github.com/boundary/flake>.
[ShardingID]
Instagram Engineering, "Sharding & IDs at Instagram", <https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c>.
[KSUID]
Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, <https://github.com/segmentio/ksuid>.
[Elasticflake]
Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, <https://github.com/ppearcy/elasticflake>.
[FlakeID]
Pawlak, T., "Flake ID Generator", Commit fcd6a2f, <https://github.com/T-PWK/flake-idgen>.
[Sonyflake]
Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, <https://github.com/sony/sonyflake>.
[orderedUuid]
Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", <https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8>.
[COMBGUID]
Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, <https://github.com/richardtallent/RT.Comb>.
[ULID]
Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, <https://github.com/ulid/spec>.
[SID]
Chilton, A., "sid : generate sortable identifiers", Commit 660e947, <https://github.com/chilts/sid>.
[pushID]
Google, "The 2^120 Ways to Ensure Unique Identifiers", <https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html>.
[XID]
Poitrey, O., "Globally Unique ID Generator", Commit efa678f, <https://github.com/rs/xid>.
[ObjectID]
MongoDB, "ObjectId - MongoDB Manual", <https://docs.mongodb.com/manual/reference/method/ObjectId/>.
[CUID]
Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, <https://github.com/ericelliott/cuid>.
[IEEE754]
IEEE, "IEEE Standard for Floating-Point Arithmetic.", Series 754-2019, <https://standards.ieee.org/ieee/754/6210/>.

Appendix A. Example Code

A.1. Creating a UUIDv6 Value

This section details a function in C which converts from a UUID version 1 to version 6:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>
#include <uuid/uuid.h>

/* Converts UUID version 1 to version 6 in place. */
void uuidv1tov6(uuid_t u) {

  uint64_t ut;
  unsigned char *up = (unsigned char *)u;

  // load ut with the first 64 bits of the UUID
  ut = ((uint64_t)ntohl(*((uint32_t*)up))) << 32;
  ut |= ((uint64_t)ntohl(*((uint32_t*)&up[4])));

  // dance the bit-shift...
  ut =
    ((ut >> 32) & 0x0FFF) | // 12 least significant bits
    (0x6000) | // version number
    ((ut >> 28) & 0x0000000FFFFF0000) | // next 20 bits
    ((ut << 20) & 0x000FFFF000000000) | // next 16 bits
    (ut << 52); // 12 most significant bits

  // store back in UUID
  *((uint32_t*)up) = htonl((uint32_t)(ut >> 32));
  *((uint32_t*)&up[4]) = htonl((uint32_t)(ut));

}
Figure 6: UUIDv6 Function in C

A.2. Creating a UUIDv7 Value

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

// ...

// csprng data source
FILE *rndf;
rndf = fopen("/dev/urandom", "r");
if (rndf == 0) {
    printf("fopen /dev/urandom error\n");
    return 1;
}

// ...

// generate one UUIDv7
uint8_t u[16];
struct timespec ts;
int ret;

ret = clock_gettime(CLOCK_REALTIME, &ts);
if (ret != 0) {
    printf("clock_gettime error: %d\n", ret);
    return 1;
}

uint64_t tms;

tms = ((uint64_t)ts.tv_sec) * 1000;
tms += ((uint64_t)ts.tv_nsec) / 1000000;

memset(u, 0, 16);

fread(&u[6], 10, 1, rndf); // fill everything after the timestamp with random bytes

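// store the 48 bit timestamp in the first 6 bytes, most significant byte first
// (written byte by byte because htonll() is not available on every platform)
for (int i = 0; i < 6; i++) u[i] = (uint8_t)(tms >> (8 * (5 - i)));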

u[8] = 0x80 | (u[8] & 0x3F); // set variant field, top two bits are 1, 0
u[6] = 0x70 | (u[6] & 0x0F); // set version field, top four bits are 0, 1, 1, 1
Figure 7: UUIDv7 Function in C

A.3. Creating a UUIDv8 Value

UUIDv8 will vary greatly from implementation to implementation.

The following example utilizes:

  • 32 bit custom-epoch timestamp (seconds elapsed since 2020-01-01 00:00:00 UTC)

  • 16 bit exotic resolution (~15 microsecond) subsecond timestamp encoded using the fractional representation

  • 58 bit random number

  • 8 bit application-specific unique node ID

  • 8 bit rolling sequence number

#include <stdint.h>
#include <time.h>

int get_random_bytes(uint8_t *buffer, int count) {
  // ...
}

int generate_uuidv8(uint8_t *uuid, uint8_t node_id) {
  struct timespec tp;
  if (clock_gettime(CLOCK_REALTIME, &tp) != 0)
    return -1; // real-time clock error

  // 32 bit biased timestamp (seconds elapsed since 2020-01-01 00:00:00 UTC)
  uint32_t timestamp_sec = tp.tv_sec - 1577836800;
  uuid[0] = timestamp_sec >> 24;
  uuid[1] = timestamp_sec >> 16;
  uuid[2] = timestamp_sec >> 8;
  uuid[3] = timestamp_sec;

  // 16 bit subsecond fraction (~15 microsecond resolution)
  uint16_t timestamp_subsec = ((uint64_t)tp.tv_nsec << 16) / 1000000000;
  uuid[4] = timestamp_subsec >> 8;
  uuid[5] = timestamp_subsec;

  // 58 bit random number and required ver and var fields
  if (get_random_bytes(&uuid[6], 8) != 0)
    return -1; // random number generator error
  uuid[6] = 0x80 | (uuid[6] & 0x0f);
  uuid[8] = 0x80 | (uuid[8] & 0x3f);

  // 8 bit application-specific node ID to guarantee application-wide uniqueness
  uuid[14] = node_id;

  // 8 bit rolling sequence number to help ensure process-wide uniqueness
  static uint8_t sequence = 0;
  uuid[15] = sequence++; // NOTE: unprotected from race conditions

  return 0;
}
Figure 8: UUIDv8 Function in C

Appendix B. Test Vectors

Both UUIDv1 and UUIDv6 test vectors utilize the same 60 bit timestamp: 0x1EC9414C232AB00 (138648505420000000) Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00

Both UUIDv1 and UUIDv6 utilize the same values in clk_seq_hi_res, clock_seq_low, and node, all of which have been generated with random data.

# Unix Nanosecond precision to Gregorian 100-nanosecond intervals
gregorian_100_ns = (Unix_64_bit_nanoseconds / 100) + gregorian_Unix_offset

# Gregorian to Unix Offset:
# The number of 100-ns intervals between the
# UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
# gregorian_Unix_offset = 0x01b21dd213814000 or 122192928000000000

# Unix 64 bit Nanosecond Timestamp:
# Unix NS: Tuesday, February 22, 2022 2:22:22 PM GMT-05:00
# Unix_64_bit_nanoseconds = 0x16D6320C3D4DCC00 or 1645557742000000000

# Work:
# gregorian_100_ns = (1645557742000000000 / 100) + 122192928000000000
# (138648505420000000 - 122192928000000000) * 100 = Unix_64_bit_nanoseconds

# Final:
# gregorian_100_ns = 0x1EC9414C232AB00 or 138648505420000000

# Original: 000111101100100101000001010011000010001100101010101100000000
# UUIDv1:   11000010001100101010101100000000|1001010000010100|0001|000111101100
# UUIDv6:   00011110110010010100000101001100|0010001100101010|0110|101100000000
Figure 9: Test Vector Timestamp Pseudo-code
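
The conversion in the pseudo-code above can also be expressed in C. This is a sketch only; the function name is hypothetical and the offset constant is the one given in the figure.

```c
#include <assert.h>
#include <stdint.h>

/* 100-ns intervals between 1582-10-15 00:00:00 and 1970-01-01 00:00:00 */
#define GREGORIAN_UNIX_OFFSET 0x01B21DD213814000ULL

/* Converts Unix nanoseconds to the Gregorian 100-ns count used by
 * UUIDv1 and UUIDv6. */
uint64_t unix_ns_to_gregorian_100ns(uint64_t unix_ns) {
  uint64_t ticks = unix_ns / 100 + GREGORIAN_UNIX_OFFSET;
  assert((ticks >> 60) == 0); /* result must fit the 60 bit timestamp */
  return ticks;
}
```

For the input 1645557742000000000 the function returns 138648505420000000 (0x1EC9414C232AB00), matching the value worked out in the figure.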

B.1. Example of a UUIDv6 Value

----------------------------------------------
field                 bits    value_hex
----------------------------------------------
time_low              32      0xC232AB00
time_mid              16      0x9414
time_hi_and_version   16      0x11EC
clk_seq_hi_res         8      0xB3
clock_seq_low          8      0xC8
node                  48      0x9E6BDECED846
----------------------------------------------
total                128
----------------------------------------------
final_hex: C232AB00-9414-11EC-B3C8-9E6BDECED846
Figure 10: UUIDv1 Example Test Vector
-----------------------------------------------
field                 bits    value_hex
-----------------------------------------------
time_high              32      0x1EC9414C
time_mid               16      0x232A
time_low_and_version   16      0x6B00
clk_seq_hi_res          8      0xB3
clock_seq_low           8      0xC8
node                   48      0x9E6BDECED846
-----------------------------------------------
total                 128
-----------------------------------------------
final_hex: 1EC9414C-232A-6B00-B3C8-9E6BDECED846
Figure 11: UUIDv6 Example Test Vector

B.2. Example of a UUIDv7 Value

This example UUIDv7 test vector utilizes a well-known 32 bit Unix epoch with additional millisecond precision to fill the first 48 bits.

rand_a and rand_b are filled with random data.

The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00 represented as 0x17F22E279B0 or 1645557742000

-------------------------------
field      bits    value
-------------------------------
unix_ts_ms   48    0x17F22E279B0
ver           4    0x7
rand_a       12    0xCC3
var           2    b10
rand_b       62    0x18C4DC0C0C07398F
-------------------------------
total       128
-------------------------------
final: 017F22E2-79B0-7CC3-98C4-DC0C0C07398F
Figure 12: UUIDv7 Example Test Vector
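
The field table above can be reproduced by shifting each field into its bit position. The following Python sketch is one possible way to do this; the function name `uuid_v7` is an assumption of this example and not an API defined by this document.

```python
import uuid


def uuid_v7(unix_ts_ms: int, rand_a: int, rand_b: int) -> uuid.UUID:
    """Lay out unix_ts_ms(48) | ver(4) | rand_a(12) | var(2) | rand_b(62)."""
    value = (unix_ts_ms & 0xFFFFFFFFFFFF) << 80   # bits 127..80
    value |= 0x7 << 76                            # version 7, bits 79..76
    value |= (rand_a & 0xFFF) << 64               # bits 75..64
    value |= 0b10 << 62                           # variant 10xx, bits 63..62
    value |= rand_b & 0x3FFFFFFFFFFFFFFF          # bits 61..0
    return uuid.UUID(int=value)


example = uuid_v7(1645557742000, 0xCC3, 0x18C4DC0C0C07398F)
print(str(example).upper())
```

The random fields are fixed here only to match the test vector. A generator would typically draw them from a cryptographically secure source, for example `secrets.randbits(12)` and `secrets.randbits(62)`.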

B.3. Example of a UUIDv8 Value

This example UUIDv8 test vector utilizes a well-known 64-bit Unix epoch with nanosecond precision, truncated to the least-significant (right-most) bits to fill the first 48 bits, up to the version field.

The next two segments, custom_b and custom_c, are filled with random data.

The timestamp is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00, represented as 0x16D6320C3D4DCC00 or 1645557742000000000.

This example illustrates only one possible scenario for UUIDv8. Test vectors will likely be implementation-specific and vary greatly from this simple example.

-------------------------------
field      bits    value
-------------------------------
custom_a     48    0x320C3D4DCC00
ver           4    0x8
custom_b     12    0x75B
var           2    b10
custom_c     62    0xEC932D5F69181C0
-------------------------------
total       128
-------------------------------
final: 320C3D4D-CC00-875B-8EC9-32D5F69181C0
Figure 13: UUIDv8 Example Test Vector
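
The truncation step described above, keeping only the 48 least-significant bits of the 64-bit nanosecond timestamp, can be sketched in Python as follows. The function name `uuid_v8` and the choice of mask are assumptions of this example. Other UUIDv8 implementations may fill the custom fields differently.

```python
import uuid


def uuid_v8(custom_a: int, custom_b: int, custom_c: int) -> uuid.UUID:
    """Lay out custom_a(48) | ver(4) | custom_b(12) | var(2) | custom_c(62)."""
    value = (custom_a & 0xFFFFFFFFFFFF) << 80
    value |= 0x8 << 76                       # version 8
    value |= (custom_b & 0xFFF) << 64
    value |= 0b10 << 62                      # variant 10xx
    value |= custom_c & 0x3FFFFFFFFFFFFFFF
    return uuid.UUID(int=value)


unix_ns = 1645557742000000000                # 0x16D6320C3D4DCC00
custom_a = unix_ns & 0xFFFFFFFFFFFF          # drop the 16 most-significant bits
example = uuid_v8(custom_a, 0x75B, 0xEC932D5F69181C0)
print(hex(custom_a), str(example).upper())
```

Because the high-order bits are discarded, the 48-bit value wraps around periodically, so values built this way are not guaranteed to sort by creation time across a wrap.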

Appendix C. Version and Variant Tables

C.1. Variant 10xx Versions

Table 2: All UUID variant 10xx (8/9/A/B) version definitions.
----------------------------------------------------------------------
Msb0  Msb1  Msb2  Msb3  Version  Description
----------------------------------------------------------------------
0     0     0     0     0        Unused
0     0     0     1     1        The Gregorian time-based UUID from [RFC4122], Section 4.1.3.
0     0     1     0     2        DCE Security version, with embedded POSIX UIDs, from [RFC4122], Section 4.1.3.
0     0     1     1     3        The name-based version specified in [RFC4122], Section 4.1.3 that uses MD5 hashing.
0     1     0     0     4        The randomly or pseudo-randomly generated version specified in [RFC4122], Section 4.1.3.
0     1     0     1     5        The name-based version specified in [RFC4122], Section 4.1.3 that uses SHA-1 hashing.
0     1     1     0     6        Reordered Gregorian time-based UUID specified in this document.
0     1     1     1     7        Unix Epoch time-based UUID specified in this document.
1     0     0     0     8        Reserved for custom UUID formats specified in this document.
1     0     0     1     9        Reserved for future definition.
1     0     1     0     10       Reserved for future definition.
1     0     1     1     11       Reserved for future definition.
1     1     0     0     12       Reserved for future definition.
1     1     0     1     13       Reserved for future definition.
1     1     1     0     14       Reserved for future definition.
1     1     1     1     15       Reserved for future definition.
----------------------------------------------------------------------
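
As an illustration of how the table above maps onto a concrete value, the following Python sketch extracts the four version bits and checks the variant of a UUID. The helper names are hypothetical, and the bit positions follow the field layouts shown in Appendix B.

```python
import uuid


def version_bits(u: uuid.UUID) -> tuple:
    """Return (Msb0, Msb1, Msb2, Msb3) of the 4-bit version field."""
    nibble = (u.int >> 76) & 0xF
    return tuple((nibble >> shift) & 1 for shift in (3, 2, 1, 0))


def is_variant_10xx(u: uuid.UUID) -> bool:
    """True when the two most-significant bits of octet 8 are 1 and 0."""
    return ((u.int >> 62) & 0b11) == 0b10


u = uuid.UUID("1EC9414C-232A-6B00-B3C8-9E6BDECED846")
print(version_bits(u), is_variant_10xx(u))
```

The version nibble is only meaningful when the variant check succeeds, which is why the two helpers are kept separate.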

Authors' Addresses

Brad G. Peabody
Kyzer R. Davis
================================================ FILE: old drafts/.gitignore ================================================ misc-notes.txt ================================================ FILE: old drafts/README.md ================================================ # UUID Version 6 IETF Draft This is the IETF draft for a version 6 UUID. Various discussion will need to occur to arrive at a standard and this repo will be used to collect and organize that information. The following is a list of relevant topics related to this draft. Pull requests will be accepted for changes to Concerns and Possible Solutions or to introduce a new Topic if it is missing, *as long as the text is concise, clear and objective.* PRs will not be accepted for changes to the decision made for the draft without full discussion. Please make an issue to discuss such things. - Topic: **Length**. - Concerns: - A lot of existing code expects a UUID to be 128 bits. Even when other aspects of the format are changed, this can provide a good deal of backward compatibility. - Some applications may need more than just 16 bytes to ensure uniqueness, depending on how many IDs they have to generate and under what circumstances. - Some applications may benefit from having shorter IDs when global uniqueness is not a requirement (e.g. local uniqueness will suffice) and easier human use of a shorter value is a priority. - Possible Solutions: - Keep the same length (16 bytes) - Change the size to something longer or shorter - Introduce a system for variable-length UUIDs - Current Decision Per Draft: - *Keep the same length.* Introducing different length(s) would break backward compatibility and is not generally useful enough to be worth it. If you need something other than 128 bits, it's not a UUID.
- Topic: **Text Encoding** - TODO - Topic: **Timestamp** - TODO - Topic: **Local/Global Uniqueness** - TODO ================================================ FILE: old drafts/draft-peabody-dispatch-new-uuid-format-00.txt ================================================ dispatch BGP. Peabody Internet-Draft February 24, 2020 Updates: 4122 (if approved) Intended status: Standards Track Expires: August 27, 2020 UUID Format Update draft-peabody-dispatch-new-uuid-format-00 Abstract This document presents a new UUID format (version 6) which is suited for use as a database key. A common case for modern applications is to create a unique identifier to be used as a primary key in a database table that is ordered by creation time, difficult to guess and has a compact text format. None of the existing UUID versions fulfill each of these requirements. This document is a proposal to update RFC4122 with a new UUID version that addresses these concerns. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on August 27, 2020. Copyright Notice Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. 
Please review these documents Peabody Expires August 27, 2020 [Page 1] Internet-Draft new-uuid-format February 2020 carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 2 3. Summary of Changes . . . . . . . . . . . . . . . . . . . . . 5 3.1. Version 6 . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Timestamp . . . . . . . . . . . . . . . . . . . . . . . . 5 3.3. Clock Sequence and Node Parts . . . . . . . . . . . . . . 5 3.4. Alternate Text Formats . . . . . . . . . . . . . . . . . 6 3.4.1. Base64 Text (Variant A) . . . . . . . . . . . . . . . 7 3.4.2. Base32 Text . . . . . . . . . . . . . . . . . . . . . 7 4. Uniquness Service . . . . . . . . . . . . . . . . . . . . . . 7 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 8. Normative References . . . . . . . . . . . . . . . . . . . . 8 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 8 1. Introduction The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 2. Background A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys. However some properties of the existing specification are not well suited to this task. 
The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become not worth it. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments. Peabody Expires August 27, 2020 [Page 2] Internet-Draft new-uuid-format February 2020 However, most of the existing UUID versions have poor database index locality. Meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of which on common structures used for this (B-tree and its variants) can be dramatic. Newly inserted values should be time-ordered to address this. Version 1 UUIDs are time-ordered, but have other issues (see next). A point of convenience and simplicity of implementation is that custom sort ordering logic should not be needed to put time ordered values in sequence. It is possible to sort Version 1 UUIDs by time but it requires breaking the bytes of the UUID into various pieces to determine the order (from the timestamp). Implementations would be simplified with a sort order where the UUID can simply be treated as an opaque sequence of bytes and ordered as such. This covers the first 64 bits of the UUID. The latter portion (the last 64 bits) are in essence used to provide uniqueness. Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used to locate machines to attack and can reveal various information about such machines (minimally manufacturer, potentially other details). 
The use of MAC addresses in UUID Version 1, and the other hashing schemes used in the various versions, points to a more basic issue: There is no known way to guarantee "universal uniqueness". In fact, uniqueness needs are application-specific. MAC addresses in the node field might be okay for some applications. Others might be okay with using cryptographically secure random numbers (possibly with increased risk of collision). Still others might already have a predefined means to determine uniqueness for the application in question, such as a server node number. In an attempt to ensure uniqueness, the existing UUID format over-specifies exactly how this uniqueness is determined. This document posits the idea that while such mechanisms as MAC address may be okay for certain applications, it should be treated as a suggestion, not a requirement for proper implementation. Many applications will work perfectly well with more narrow and simpler uniqueness mechanisms (like using an existing node ID from whatever cluster the server is already in) and that this should be allowed as long as the uniqueness properties are clearly specified in the implementation. I.e. "using this field type as a database primary key will produce UUIDs which are unique within this database cluster" should be perfectly acceptable. Some other unnecessary requirement of global/universal uniqueness should not be needed for the implementation to be considered correct. Peabody Expires August 27, 2020 [Page 3] Internet-Draft new-uuid-format February 2020 The property of "unguessability" is also application-specific. Some applications may desire increased security by using UUIDs which are difficult to guess (this way for example rate-limiting can be used to greatly reduce the probability of someone correctly guessing a new identifier or at least make it harder/take longer to do so).
While applications should of course be using proper security measures, and relying solely on the unguessability of an identifier for security purposes is ill-advised, it is certainly not wrong to use this property as an additional layer of security. Examples of measures used to increase unguessability would be using cryptographically secure random data in the node and/or clock sequence fields (latter 64 bits), or using such random data in the subsecond portion of the timestamp (if subsecond time ordering is less important than unguessability for the application in question). The specification should indicate that such variations are acceptable as they do not change the format in an incompatible way. Using a UUID as a database key generally requires communicating that UUID to other applications. The database server will store the value internally. It may be referenced in a query language (e.g. SQL), and/or transmitted in some database driver protocol. Other software, often written in another language, frequently then needs to store this identifier in its own memory and potentially perform its own operations like sorting and searching with it. And such identifiers are also commonly then used in protocols like HTTP where they indicate a particular resource. Sometimes they are typed in by humans. Sometimes constraints exist on which bytes may be used (such as an HTTP URL path). In most cases, shorter is better. For these reasons, having a compact textual format is important. The existing hex format is already in wide use, so keeping it for backward compatibility makes sense. However an encoding using a base32 alphabet would be more compact and still be case-insensitive. A base64 alphabet would be even more compact (but require case- sensitivity). This document proposes both as options. This would allow applications to use a more compact text format for the situations needing textual representation (i.e. 
you can just put this value in URL and it is not unnecessarily long and does not require escaping). The alphabets used for base32 and base64 encoding should be in ASCII numeric value sequence so the text forms can also be sorted correctly as raw bytes. (This is not a property of the Base32 and Base64 standards from [RFC4648], however there are several variations in use so introducing a new one here for the express purpose of correct sorting would seem to be acceptable.) Peabody Expires August 27, 2020 [Page 4] Internet-Draft new-uuid-format February 2020 3. Summary of Changes The following is a summary of proposed changes to the UUID specification in [RFC4122]. Each is given as a statement of the problem or limitation to which it is addressed, along with a description of the proposed change. 3.1. Version 6 A UUID version 6 is proposed. It is ordered by creation time, sorts correctly as raw bytes, does not require use of a MAC address in the node section and has options for a compact text format. 3.2. Timestamp The timestamp value from [RFC4122] (60-bit number of 100- nanosecond intervals since 00:00:00.00, 15 October 1582) is workable but the sequence in which the bytes are encoded (the lowest bytes first) results in unnecessary additional logic to sort correctly by timestamp. Ordering by timestamp is important for the use case of UUIDs as primary keys in a database since it improves locality by grouping new records close to each other (this can have major performance implications in large tables). The proposed change is to encode the timestamp value into the same 60 bits as in [RFC4122] but in big-endian byte ordering. This way an application can sort by timestamp by simply treating the UUID as an opaque bunch of bytes. 3.3. Clock Sequence and Node Parts The latter 64 bits of a UUID per [RFC4122] are the clock sequence and node fields. 
The node field is problematic as it encourages applications to use their MAC address which may present a security problem (it is not always appropriate to reveal the network address of a machine as it could make it the target of an attack or provide information about its manufacturer or other details). A lesser concern is that it also incidentally produces UUIDs with the same 6 bytes at the end, which are visually more difficult to distinguish when looking at them in a list. Seeing as the entire point of these last 64 bits is to ensure uniqueness, this document proposes that the strict definitions of clock sequence and node be relaxed. Instead implementations would be permitted to fill this section with random bytes and/or include an application defined value for uniqueness (such as a node number of a machine in a cluster). Peabody Expires August 27, 2020 [Page 5] Internet-Draft new-uuid-format February 2020 Note for discussion: Another point to consider is that there is no known way to fully guarantee that duplicate identifiers will not be created unless some pre-determined outside source of uniqueness is employed. (Such as for version 1 UUIDs the MAC address.) However, applications each have their own requirements for uniqueness. Uniqueness within a single database cluster for example is acceptable in many cases. A specification that forces all UUIDs to be globally unique when it is not needed might not be a good idea. Identifiers are only as universally unique as their input, so it might be better to just clearly state this and say that it's fine if UUIDs are only guaranteed to be unique within a specific context if it makes sense for that application. 3.4. Alternate Text Formats The existing UUID text format is hex encoded plus four hyphens. For many applications this is unnecessarily verbose. The same information can be encoded into significantly fewer bytes using a base 64 or base 32 alphabet.
Many applications have a need to use the unique identifier of a database record in a URL (e.g. in an HTTP request either in the path or a query parameter). It can also be useful as a file name. Being able to use a UUID for this purpose without having to escape certain characters it is a useful property. This document proposes alternate alphabets for encoding UUIDs which are convenient for use in URLs and file names, and also sort correctly when treated as raw bytes. Some applications may not have the ability (or want) to encode and decode UUIDs from text to binary and thus having the text format also sort correctly as raw bytes is useful. The standard Base64 and Base32 specifications in [RFC4648] do not have these properties, thus different alphabets are given for each. Situations which require understanding the encoding SHOULD specify which encoding is used. For example, a database field which uses UUID version 6 with "b64a" encoding (see below), could be specified as type "UUID6B64A", which would result in binary storage according to UUID version 6, and otherwise read and write the value to/from applications in the b64a text format shown below. Note also that the length can be easily used to positively distinguish if a value is text or binary form. A 16-byte value will necessarily be raw unencoded bytes whereas text forms will be longer. Peabody Expires August 27, 2020 [Page 6] Internet-Draft new-uuid-format February 2020 3.4.1. Base64 Text (Variant A) UUIDs encoded in this form use the "url-safe base64" alphabet: "A" to "Z", "a" to "z", "0" to "9" and "-" and "_", but in ASCII value sequence. No padding characters are used. The name "b64a" (not case sensitive) can be used by implementations to refer to this encoding. Note: It might be useful to add another variation ("b64b") with a different alphabet. Hyphen and underscore are useful in a lot of places but there might be some others that are better for specific cases. 3.4.2. 
Base32 Text Base32 can be useful if case-insensitivity is required. UUIDs encoded in this form use digits "2" through "7" followed by "A" through "Z" (same alphabet as in [RFC4648] but in ASCII value sequence). Case is not sensitive. Implementations MAY choose to output lower case letters and doing so is also correct. Implementations which parse UUIDs encoded in this way MUST be case insensitive. No padding characters are used. Unless there is a specific reason for an implementation to do otherwise, it SHOULD output lower case base32 characters. The motivation for this is that it will increase the number of situations where UUIDs encoded in base32 and then used in different environments (some of which may be case sensitive, some not) are handled correctly by default. For example file names are case sensitive on some file systems and not on others. Preferring one specific (lower) case allows these to be used interchangeably with predictable results. The name "b32a" (not case sensitive) can be used by implementations to refer to this encoding. 4. Uniqueness Service An idea for discussion is that for applications which truly require globally unique identifiers one possible solution would be for someone to maintain a service which allocates numbers by time. In essence and for example "give me a 32-bit number that will be unique for the time range of midnight to midnight tomorrow". Such a service would be relatively easy to create. The effort required to maintain it depends largely on how much it is used. Applications using the same endpoint for this service would be guaranteed unique UUIDs. Companies could host their own too. I'm not sure if this sort of thing would be worth the effort but it's another idea for how to Peabody Expires August 27, 2020 [Page 7] Internet-Draft new-uuid-format February 2020 address the global uniqueness issue for applications that really need it. 5. Acknowledgements TODO: Acknowledgements for prior work and discussion. 6. IANA Considerations TBD 7.
Security Considerations TODO: Provide additional information on "unguessability" as needed. 8. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC4122] Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, July 2005, . [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, . Author's Address Brad G. Peabody Email: brad@peabody.io Peabody Expires August 27, 2020 [Page 8] ================================================ FILE: old drafts/draft-peabody-dispatch-new-uuid-format-00.xml ================================================ ]> UUID Format Update
brad@peabody.io
ART dispatch uuid This document presents a new UUID format (version 6) which is suited for use as a database key. A common case for modern applications is to create a unique identifier to be used as a primary key in a database table that is ordered by creation time, difficult to guess and has a compact text format. None of the existing UUID versions fulfill each of these requirements. This document is a proposal to update RFC4122 with a new UUID version that addresses these concerns.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .
A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys. However some properties of the existing specification are not well suited to this task. The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become not worth it. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments. However, most of the existing UUID versions have poor database index locality. Meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of which on common structures used for this (B-tree and its variants) can be dramatic. Newly inserted values should be time-ordered to address this. Version 1 UUIDs are time-ordered, but have other issues (see next). A point of convenience and simplicity of implementation is that custom sort ordering logic should not be needed to put time ordered values in sequence. It is possible to sort Version 1 UUIDs by time but it requires breaking the bytes of the UUID into various pieces to determine the order (from the timestamp). Implementations would be simplified with a sort order where the UUID can simply be treated as an opaque sequence of bytes and ordered as such. This covers the first 64 bits of the UUID. The latter portion (the last 64 bits) are in essence used to provide uniqueness. Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. 
Exposed MAC addresses can be used to locate machines to attack and can reveal various information about such machines (minimally manufacturer, potentially other details). The use of MAC addresses in UUID Version 1, and the other hashing schemes used in the various versions, points to a more basic issue: There is no known way to guarantee "universal uniqueness". In fact, uniqueness needs are application-specific. MAC addresses in the node field might be okay for some applications. Others might be okay with using cryptographically secure random numbers (possibly with increased risk of collision). Still others might already have a predefined means to determine uniqueness for the application in question, such as a server node number. In an attempt to ensure uniqueness, the existing UUID format over-specifies exactly how this uniqueness is determined. This document posits the idea that while such mechanisms as MAC address may be okay for certain applications, it should be treated as a suggestion, not a requirement for proper implementation. Many applications will work perfectly well with more narrow and simpler uniqueness mechanisms (like using an existing node ID from whatever cluster the server is already in) and that this should be allowed as long as the uniqueness properties are clearly specified in the implementation. I.e. "using this field type as a database primary key will produce UUIDs which are unique within this database cluster" should be perfectly acceptable. Some other unnecessary requirement of global/universal uniqueness should not be needed for the implementation to be considered correct. The property of "unguessability" is also application-specific. Some applications may desire increased security by using UUIDs which are difficult to guess (this way for example rate-limiting can be used to greatly reduce the probability of someone correctly guessing a new identifier or at least make it harder/take longer to do so).
While applications should of course be using proper security measures, and relying solely on the unguessability of an identifier for security purposes is ill-advised, it is certainly not wrong to use this property as an additional layer of security. Examples of measures used to increase unguessability would be using cryptographically secure random data in the node and/or clock sequence fields (latter 64 bits), or using such random data in the subsecond portion of the timestamp (if subsecond time ordering is less important than unguessability for the application in question). The specification should indicate that such variations are acceptable as they do not change the format in an incompatible way. Using a UUID as a database key generally requires communicating that UUID to other applications. The database server will store the value internally. It may be referenced in a query language (e.g. SQL), and/or transmitted in some database driver protocol. Other software, often written in another language, frequently then needs to store this identifier in its own memory and potentially perform its own operations like sorting and searching with it. And such identifiers are also commonly then used in protocols like HTTP where they indicate a particular resource. Sometimes they are typed in by humans. Sometimes constraints exist on which bytes may be used (such as an HTTP URL path). In most cases, shorter is better. For these reasons, having a compact textual format is important. The existing hex format is already in wide use, so keeping it for backward compatibility makes sense. However an encoding using a base32 alphabet would be more compact and still be case-insensitive. A base64 alphabet would be even more compact (but require case-sensitivity). This document proposes both as options. This would allow applications to use a more compact text format for the situations needing textual representation (i.e. 
you can just put this value in URL and it is not unnecessarily long and does not require escaping). The alphabets used for base32 and base64 encoding should be in ASCII numeric value sequence so the text forms can also be sorted correctly as raw bytes. (This is not a property of the Base32 and Base64 standards from , however there are several variations in use so introducing a new one here for the express purpose of correct sorting would seem to be acceptable.)
The following is a summary of proposed changes to the UUID specification in . Each is given as a statement of the problem or limitation to which it is addressed, along with a description of the proposed change.
A UUID version 6 is proposed. It is ordered by creation time, sorts correctly as raw bytes, does not require use of a MAC address in the node section and has options for a compact text format.
The timestamp value from (60-bit number of 100- nanosecond intervals since 00:00:00.00, 15 October 1582) is workable but the sequence in which the bytes are encoded (the lowest bytes first) results in unnecessary additional logic to sort correctly by timestamp. Ordering by timestamp is important for the use case of UUIDs as primary keys in a database since it improves locality by grouping new records close to each other (this can have major performance implications in large tables). The proposed change is to encode the timestamp value into the same 60 bits as in but in big-endian byte ordering. This way an application can sort by timestamp by simply treating the UUID as an opaque bunch of bytes.
The latter 64 bits of a UUID per [RFC4122] are the clock sequence and node fields. The node field is problematic as it encourages applications to use their MAC address which may present a security problem (it is not always appropriate to reveal the network address of a machine as it could make it the target of an attack or provide information about its manufacturer or other details). A lesser concern is that it also incidentally produces UUIDs with the same 6 bytes at the end, which are visually more difficult to distinguish when looking at them in a list. Seeing as the entire point of these last 64 bits is to ensure uniqueness, this document proposes that the strict definitions of clock sequence and node be relaxed. Instead implementations would be permitted to fill this section with random bytes and/or include an application defined value for uniqueness (such as a node number of a machine in a cluster). Note for discussion: Another point to consider is that there is no known way to fully guarantee that duplicate identifiers will not be created unless some pre-determined outside source of uniqueness is employed. (Such as for version 1 UUIDs the MAC address.) However, applications each have their own requirements for uniqueness. Uniqueness within a single database cluster for example is acceptable in many cases. A specification that forces all UUIDs to be globally unique when it is not needed might not be a good idea. Identifiers are only as universally unique as their input, so it might be better to just clearly state this and say that it's fine if UUIDs are only guaranteed to be unique within a specific context if it makes sense for that application.
The existing UUID text format is hex encoded plus four hyphens. For many applications this is unnecessarily verbose. The same information can be encoded into significantly fewer bytes using a base 64 or base 32 alphabet. Many applications have a need to use the unique identifier of a database record in a URL (e.g. in an HTTP request either in the path or a query parameter). It can also be useful as a file name. Being able to use a UUID for this purpose without having to escape certain characters it is a useful property. This document proposes alternate alphabets for encoding UUIDs which are convenient for use in URLs and file names, and also sort correctly when treated as raw bytes. Some applications may not have the ability (or want) to encode and decode UUIDs from text to binary and thus having the text format also sort correctly as raw bytes is useful. The standard Base64 and Base32 specifications in do not have these properties, thus different alphabets are given for each. Situations which require understanding the encoding SHOULD specify which encoding is used. For example, a database field which uses UUID version 6 with "b64a" encoding (see below), could be specified as type "UUID6B64A", which would result in binary storage according to UUID version 6, and otherwise read and write the value to/from applications in the b64a text format shown below. Note also that the length can be easily used to positively distinguish if a value is text or binary form. A 16-byte value will necessarily be raw unencoded bytes whereas text forms will be longer.
UUIDs encoded in this form use the "url-safe base64" alphabet: "A" to "Z", "a" to "z", "0" to "9" and "-" and "_", but in ASCII value sequence. No padding characters are used. The name "b64a" (not case sensitive) can be used by implementations to refer to this encoding. Note: It might be useful to add another variation ("b64b") with a different alphabet. Hyphen and underscore are useful in a lot of places but there might be some others that are better for specific cases.
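As an illustration of the "b64a" idea, the following is a minimal sketch rather than a normative encoder. The function names are hypothetical, and the choice to pad the 128 bits with four zero bits on the right (so that 22 six-bit groups are produced) is an assumption, since the text above does not state where the spare bits go.

```python
# Url-safe base64 characters arranged in ASCII value sequence:
# "-" (0x2D), "0"-"9", "A"-"Z", "_" (0x5F), "a"-"z".
B64A_ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def b64a_encode(raw: bytes) -> str:
    """Encode a 16-byte UUID as 22 characters without padding characters."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    # Assumption: append 4 zero bits so that 132 bits split into 22 groups of 6.
    value = int.from_bytes(raw, "big") << 4
    return "".join(
        B64A_ALPHABET[(value >> shift) & 0x3F] for shift in range(126, -1, -6)
    )


def b64a_decode(text: str) -> bytes:
    """Reverse b64a_encode; the alphabet is case sensitive."""
    if len(text) != 22:
        raise ValueError("expected exactly 22 characters")
    value = 0
    for ch in text:
        value = (value << 6) | B64A_ALPHABET.index(ch)
    return (value >> 4).to_bytes(16, "big")
```

Because the alphabet is in ASCII order and the most significant bits are emitted first, sorting the encoded strings as raw bytes gives the same order as sorting the original 16-byte values.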
Base32 can be useful if case-insensitivity is required. UUIDs encoded in this form use digits "2" through "7" followed by "A" through "Z" (same alphabet as in [RFC4648] but in ASCII value sequence). The encoding is not case sensitive. Implementations MAY choose to output lower case letters and doing so is also correct. Implementations which parse UUIDs encoded in this way MUST be case insensitive. No padding characters are used. Unless there is a specific reason for an implementation to do otherwise, it SHOULD output lower case base32 characters. The motivation for this is that it will increase the number of situations where UUIDs encoded in base32 and then used in different environments (some of which may be case sensitive, some not) are handled correctly by default. For example, file names are case sensitive on some file systems and not on others. Preferring one specific (lower) case allows these to be used interchangeably with predictable results. The name "b32a" (not case sensitive) can be used by implementations to refer to this encoding.
An idea for discussion is that for applications which truly require globally unique identifiers, one possible solution would be for someone to maintain a service which allocates numbers by time. In essence, and for example, "give me a 32-bit number that will be unique for the time range of midnight to midnight tomorrow". Such a service would be relatively easy to create. The effort required to maintain it depends largely on how much it is used. Applications using the same endpoint for this service would be guaranteed unique UUIDs. Companies could host their own too. I'm not sure if this sort of thing would be worth the effort but it's another idea for how to address the global uniqueness issue for applications that really need it.
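A comparable sketch for "b32a" follows. It is an illustration only: the function names are hypothetical, and padding the 128 bits with two zero bits on the right (giving 26 five-bit groups) is an assumption not stated in the text. Output is lower case and parsing is case insensitive, as described above.

```python
# Digits "2"-"7" followed by the letters, in ASCII value sequence (lower case output).
B32A_ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"


def b32a_encode(raw: bytes) -> str:
    """Encode a 16-byte UUID as 26 lower case characters without padding characters."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    # Assumption: append 2 zero bits so that 130 bits split into 26 groups of 5.
    value = int.from_bytes(raw, "big") << 2
    return "".join(
        B32A_ALPHABET[(value >> shift) & 0x1F] for shift in range(125, -1, -5)
    )


def b32a_decode(text: str) -> bytes:
    """Reverse b32a_encode, accepting upper or lower case input."""
    if len(text) != 26:
        raise ValueError("expected exactly 26 characters")
    value = 0
    for ch in text.lower():
        value = (value << 5) | B32A_ALPHABET.index(ch)
    return (value >> 2).to_bytes(16, "big")
```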
TODO: Acknowledgements for prior work and discussion.
TBD
TODO: Provide additional information on "unguessability" as needed.
&RFC2119; &RFC4122; &RFC4648;
================================================ FILE: old drafts/draft-peabody-dispatch-new-uuid-format-01.html ================================================ New UUID Formats
Internet-Draft new-uuid-format April 2021
Peabody & Davis Expires 28 October 2021 [Page]
Workgroup:
dispatch
Internet-Draft:
draft-peabody-dispatch-new-uuid-format-01
Updates:
4122 (if approved)
Published:
Intended Status:
Standards Track
Expires:
Authors:
BGP. Peabody
K. Davis

New UUID Formats

Abstract

This document presents new time-based UUID formats which are suited for use as a database key.

A common case for modern applications is to create a unique identifier for use as a primary key in a database table. This identifier usually implements an embedded timestamp that is sortable using the monotonic creation time in the most significant bits. In addition the identifier is highly collision resistant, difficult to guess, and provides minimal security attack surfaces. None of the existing UUID versions, including UUIDv1, fulfill each of these requirements in the most efficient possible way. This document is a proposal to update [RFC4122] with three new UUID versions that address these concerns, each with different trade-offs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 28 October 2021.

1. Introduction

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Background

A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys.

The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become a burden. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments.

However, some properties of [RFC4122] UUIDs are not well suited to this task. First, most of the existing UUID versions, such as UUIDv4, have poor database index locality: new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the common structures used for indexes (B-tree and its variants) can be dramatic. As such, newly inserted values SHOULD be time-ordered to address this.

While it is true that UUIDv1 contains an embedded timestamp and can be time-ordered, UUIDv1 has other issues. It is possible to sort Version 1 UUIDs by time, but it is a laborious task. The process requires breaking the bytes of the UUID into various pieces, re-ordering the bits, and then determining the order from the reconstructed timestamp. This is not efficient in very large systems. Implementations would be simplified with a sort order where the UUID can simply be treated as an opaque sequence of bytes and ordered as such.

After the embedded timestamp, the remaining 64 bits are in essence used to provide uniqueness both on a global scale and within a given timestamp tick. The clock sequence value ensures that multiple UUIDs generated for the same timestamp value are given a monotonic sequence value. This explicit sequencing helps further facilitate sorting. The remaining random bits ensure collisions are minimal.

Furthermore, UUIDv1 utilizes a non-standard timestamp epoch derived from the Gregorian Calendar: Coordinated Universal Time (UTC) expressed as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582. Implementations and many languages may find it easier to use the widely adopted and well known Unix Epoch, a custom epoch, or another timestamp source with the level of timestamp precision required by the application.

Lastly, privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Instead "cryptographically secure" pseudo-random number generators (CSPRNGs) or pseudo-random number generators (PRNGs) SHOULD be used within an application context to provide uniqueness and unguessability.

Due to the shortcomings of UUIDv1 and UUIDv4 detailed so far, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways.

While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.

  1. [LexicalUUID] by Twitter

  2. [Snowflake] by Twitter

  3. [Flake] by Boundary

  4. [ShardingID] by Instagram

  5. [KSUID] by Segment

  6. [Elasticflake] by P. Pearcy

  7. [FlakeID] by T. Pawlak

  8. [Sonyflake] by Sony

  9. [orderedUuid] by IT. Cabrera

  10. [COMBGUID] by R. Tallent

  11. [ULID] by A. Feerasta

  12. [SID] by A. Chilton

  13. [pushID] by Google

  14. [XID] by O. Poitrey

  15. [ObjectID] by MongoDB

  16. [CUID] by E. Elliott

An inspection of these implementations details the following trends that help define this standard:

  • Timestamps MUST be k-sortable. That is, values within or close to the same timestamp are ordered properly by sorting algorithms.

  • Timestamps SHOULD be big-endian with the most-significant bits of the time embedded as-is without reordering.

  • Timestamps SHOULD utilize millisecond precision and Unix Epoch as timestamp source, although there is some variation to this among implementations depending on the application requirements.

  • The ID format SHOULD be lexicographically sortable while in the textual representation.

  • IDs MUST ensure proper embedded sequencing to facilitate sorting when multiple UUIDs are created during a given timestamp.

  • IDs MUST NOT require unique network identifiers as part of achieving uniqueness.

  • Distributed nodes MUST be able to create collision resistant Unique IDs without consulting a centralized resource.

3. Summary of Changes

In order to solve these challenges this specification introduces three new version identifiers assigned for time-based UUIDs.

The first, UUIDv6, aims to be the easiest to implement for applications which already implement UUIDv1. The UUIDv6 specification keeps the original Gregorian timestamp source but does not reorder the timestamp bits as per the process utilized by UUIDv1. UUIDv6 also requires that pseudo-random data MUST be used in place of the MAC address. The rest of the UUIDv1 format remains unchanged in UUIDv6. See Section 4.3.

Next, UUIDv7 introduces an entirely new time-based UUID bit layout utilizing a variable length timestamp sourced from the widely implemented and well known Unix Epoch timestamp source. The timestamp is broken into a 36-bit integer seconds part, followed by a field of variable length which represents the sub-second timestamp portion, encoded so that each bit from most to least significant adds more precision. See Section 4.4.

Finally, UUIDv8 introduces a relaxed time-based UUID format that caters to application implementations that cannot utilize UUIDv1, UUIDv6, or UUIDv7. UUIDv8 also future-proofs this specification by allowing time-based UUID formats from timestamp sources that are not yet defined. The variable size timestamp offers lots of flexibility to create an implementation specific RFC compliant time-based UUID while retaining the properties that make UUIDs great. See Section 4.5.

4. Format

The UUID length of 16 octets (128 bits) remains unchanged. The textual representation of a UUID consisting of 36 hexadecimal and dash characters in the format 8-4-4-4-12 remains unchanged for human readability. In addition, the position of both the Version and Variant bits remains unchanged in the layout.

4.1. Versions

Table 1 defines the 4-bit version found in Bits 48 through 51 within a given UUID.

Table 1: UUID versions defined by this specification
Msb0  Msb1  Msb2  Msb3  Version  Description
0     1     1     0     6        Reordered Gregorian time-based UUID
0     1     1     1     7        Variable length Unix Epoch time-based UUID
1     0     0     0     8        Custom time-based UUID

4.2. Variant

The variant bits utilized by UUIDs in this specification remain the same as in [RFC4122], Section 4.1.1.

Table 2 lists the contents of the variant field, bits 64 and 65, where the letter "x" indicates a "don't-care" value. As a result, the hex values 8 (1000), 9 (1001), A (1010), and B (1011) commonly appear at this position in the text representation.

Table 2: UUID Variant defined by this specification
Msb0  Msb1  Msb2  Description
1     0     x     The variant specified in this document.
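As a rough illustration of Tables 1 and 2, the version and variant can be read directly from the 16 octets. This is a sketch only; the function name and the sample value are hypothetical.

```python
def version_and_variant(raw: bytes):
    """Return (version, variant_bits) from a 16-octet UUID."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    version = raw[6] >> 4        # bits 48 through 51: high nibble of octet 6
    variant_bits = raw[8] >> 6   # bits 64 and 65: top two bits of octet 8
    return version, variant_bits


# Hypothetical sample value used only to exercise the function.
sample = bytes.fromhex("1ec9414c232a6b00b3c89e6bdeced846")
version, variant_bits = version_and_variant(sample)
uses_this_variant = variant_bits == 0b10
```

In the 8-4-4-4-12 text form the same information appears as the first hex digit of the third group (the version) and the first hex digit of the fourth group (8, 9, a, or b for this variant).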

4.3. UUIDv6 Layout and Bit Order

UUIDv6 aims to be the easiest to implement by reusing most of the layout of bits found in UUIDv1 but with changes to bit ordering for the timestamp. Where UUIDv1 splits the timestamp bits into three distinct parts and orders them as time_low, time_mid, time_high_and_version, UUIDv6 instead keeps the source bits from the timestamp intact and changes the order to time_high, time_mid, and time_low. Incidentally this will match the original 60-bit Gregorian timestamp source. The clock sequence bits remain unchanged from their usage and position in [RFC4122]. The 48-bit node MUST be set to a pseudo-random value.

The format for the 16-octet, 128-bit UUIDv6 is shown in Figure 1.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           time_high                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           time_mid            |      time_low_and_version     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         node (2-5)                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: UUIDv6 Field and Bit Layout
time_high:
The most significant 32 bits of the 60-bit starting timestamp. Occupies bits 0 through 31 (octets 0-3).
time_mid:
The middle 16 bits of the 60-bit starting timestamp. Occupies bits 32 through 47 (octets 4-5).
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60-bit starting timestamp. Occupies bits 48 through 63 (octets 6-7).
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8).
clk_seq_low:
The 8-bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9).
node:
48-bit pseudo-random number used as a spatially unique identifier. Occupies bits 80 through 127 (octets 10-15).

4.3.1. UUIDv6 Timestamp Usage

UUIDv6 reuses the 60-bit Gregorian timestamp with 100-nanosecond precision defined in [RFC4122], Section 4.1.4.
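For readers more used to Unix time, the conversion to and from the Gregorian 100-nanosecond count can be sketched as follows. The function and constant names are hypothetical; the offset is the number of 100-nanosecond intervals between 15 October 1582 and 1 January 1970 (141427 days).

```python
# 141427 days * 86400 seconds * 10,000,000 intervals per second.
GREGORIAN_TO_UNIX_OFFSET_100NS = 0x01B21DD213814000


def unix_ns_to_gregorian_100ns(unix_ns: int) -> int:
    """Convert Unix nanoseconds to the 60-bit count used by UUIDv1 and UUIDv6."""
    timestamp = unix_ns // 100 + GREGORIAN_TO_UNIX_OFFSET_100NS
    if not 0 <= timestamp < (1 << 60):
        raise ValueError("timestamp does not fit in 60 bits")
    return timestamp


def gregorian_100ns_to_unix_ns(timestamp: int) -> int:
    """Convert the 60-bit Gregorian count back to Unix nanoseconds."""
    return (timestamp - GREGORIAN_TO_UNIX_OFFSET_100NS) * 100
```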

4.3.2. UUIDv6 Clock Sequence Usage

UUIDv6 makes no change to the Clock Sequence usage defined by [RFC4122], Section 4.1.5.

4.3.3. UUIDv6 Node Usage

UUIDv6 node bits SHOULD be set to a 48-bit random or pseudo-random number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or the [RFC4122], Section 4.5 method of generating a random multicast IEEE 802 MAC address.

4.3.4. UUIDv6 Basic Creation Algorithm

The following implementation algorithm is based on [RFC4122] but with changes specific to UUIDv6:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time as a 60-bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.

  3. Set the time_low field to the 12 least significant bits of the starting 60-bit timestamp.

  4. Truncate the timestamp to the 48 most significant bits in order to create time_high_and_time_mid.

  5. Set the time_high field to the 32 most significant bits of the truncated timestamp.

  6. Set the time_mid field to the 16 least significant bits of the truncated timestamp.

  7. Create the 16-bit time_low_and_version by concatenating the 4-bit UUIDv6 version with the 12-bit time_low.

  8. If the state was unavailable (e.g., non-existent or corrupted) or the saved timestamp is greater than the current timestamp, generate a random 14-bit clock sequence value.

  9. If the state was available, but the saved timestamp is less than or equal to the current timestamp, increment the clock sequence value.

  10. Create the 16-bit combined variant and clock sequence value by placing the UUID variant bits in the most significant position, followed by the 14-bit clock sequence.

  11. Generate a 48-bit pseudo-random node.

  12. Format by concatenating the 128 bits from each part: time_high|time_mid|time_low_and_version|variant_clk_seq|node

  13. Save the state (current timestamp and clock sequence) back to the stable store.
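The steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: a module-level variable stands in for the stable store, the clock sequence wraps within its 14 bits, and the names uuid6_bytes and _state are hypothetical.

```python
import secrets
import time

GREGORIAN_TO_UNIX_OFFSET_100NS = 0x01B21DD213814000

# Assumption: a global variable stands in for the shared stable store (step 1).
_state = None  # (saved_timestamp, saved_clock_seq) or None when unavailable


def uuid6_bytes(now_100ns=None) -> bytes:
    """Build the 16 octets of a UUIDv6 following steps 1 through 13."""
    global _state
    # Step 2: current time as a 60-bit count of 100-ns intervals since 1582-10-15.
    if now_100ns is None:
        now_100ns = time.time_ns() // 100 + GREGORIAN_TO_UNIX_OFFSET_100NS
    timestamp = now_100ns & ((1 << 60) - 1)
    # Steps 3 through 6: split the timestamp without reordering it.
    time_low = timestamp & 0xFFF
    time_high_and_time_mid = timestamp >> 12
    time_high = time_high_and_time_mid >> 16
    time_mid = time_high_and_time_mid & 0xFFFF
    # Step 7: version (0110) followed by the 12-bit time_low.
    time_low_and_version = (0x6 << 12) | time_low
    # Steps 8 and 9: pick or advance the clock sequence.
    if _state is None or _state[0] > timestamp:
        clock_seq = secrets.randbits(14)
    else:
        clock_seq = (_state[1] + 1) & 0x3FFF  # assumption: wrap within 14 bits
    # Step 10: variant bits (10) in the most significant position.
    variant_clk_seq = (0b10 << 14) | clock_seq
    # Step 11: pseudo-random node.
    node = secrets.randbits(48)
    # Step 12: concatenate the fields into 128 bits.
    value = (
        (time_high << 96)
        | (time_mid << 80)
        | (time_low_and_version << 64)
        | (variant_clk_seq << 48)
        | node
    )
    # Step 13: save the state.
    _state = (timestamp, clock_seq)
    return value.to_bytes(16, "big")
```

The optional now_100ns argument exists only so the sketch can be exercised with a fixed timestamp.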

The steps for splitting time_high_and_time_mid into time_high and time_mid are optional since the 48 bits of time_high and time_mid will remain in the same order as time_high_and_time_mid during the final concatenation. This extra step of splitting into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation, in which case the following field mappings can be applied to reshuffle the bits with minimal modifications.

Table 3: UUIDv1 to UUIDv6 Field Mappings
UUIDv1 Field  Bits  UUIDv6 Field
time_low      32    time_high
time_mid      16    time_mid
time_high     12    time_low
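One way to apply this reshuffle to an existing UUIDv1 value is sketched below using Python's standard uuid module. The function name is hypothetical, and keeping the clock sequence while replacing the node with random bits is a choice made for this sketch, in line with the node requirement of Section 4.3.

```python
import secrets
import uuid


def uuid1_to_uuid6(u1: uuid.UUID, node=None) -> uuid.UUID:
    """Re-encode a UUIDv1 as a UUIDv6 with the same timestamp and clock sequence."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    timestamp = u1.time                      # reassembled 60-bit Gregorian timestamp
    time_high = timestamp >> 28              # most significant 32 bits
    time_mid = (timestamp >> 12) & 0xFFFF    # middle 16 bits
    time_low_and_version = (0x6 << 12) | (timestamp & 0xFFF)
    # The variant bits and clock sequence keep their position and value.
    variant_clk_seq = (u1.clock_seq_hi_variant << 8) | u1.clock_seq_low
    if node is None:
        node = secrets.randbits(48)          # replace the MAC-derived node
    value = (
        (time_high << 96)
        | (time_mid << 80)
        | (time_low_and_version << 64)
        | (variant_clk_seq << 48)
        | node
    )
    return uuid.UUID(int=value)
```

The node parameter is only there so a caller can supply its own 48-bit value; by default a fresh random node is used.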

4.4. UUIDv7 Layout and Bit Order

The UUIDv7 format is designed to encode a Unix timestamp with arbitrary sub-second precision. The key property provided by UUIDv7 is that timestamp values generated by one system and parsed by another are guaranteed to have the sub-second precision of either the generator or the parser, whichever is less. Additionally, the system parsing the UUIDv7 value does not need to know which precision was used during encoding in order to function correctly.

The format for the 16-octet, 128-bit UUIDv7 is shown in Figure 2.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            unixts                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |unixts |       subsec_a        |  ver  |       subsec_b        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|                   subsec_seq_node                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       subsec_seq_node                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: UUIDv7 Field and Bit Layout
unixts:
36-bit big-endian unsigned Unix Timestamp value
subsec_a:
12 bits allocated to sub-second precision values.
ver:
The 4-bit UUIDv7 version (0111).
subsec_b:
12 bits allocated to sub-second precision values.
var:
The 2-bit UUID variant (10).
subsec_seq_node:
The remaining 62 bits, which MAY be allocated to any combination of additional sub-second precision, sequence counter, or pseudo-random data.

4.4.1. UUIDv7 Timestamp Usage

UUIDv7 utilizes a 36-bit big-endian unsigned Unix Timestamp value (number of seconds since the epoch of 1 Jan 1970, leap seconds excluded so each hour is exactly 3600 seconds long).

Additional sub-second precision (millisecond, microsecond, nanosecond, etc.) MAY be provided for encoding and decoding in the remaining bits in the layout.

4.4.2. UUIDv7 Clock Sequence Usage

UUIDv7 SHOULD utilize a monotonic sequence counter to provide additional sequencing guarantees when multiple UUIDv7 values are created in the same UNIXTS and SUBSEC timestamp. The number of bits allocated to the sequence counter depends on the precision of the timestamp. For example, a more accurate timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source utilizing seconds for precision. For best sequencing results the sequence counter SHOULD be placed immediately after the available sub-second bits.

The clock sequence MUST start at zero and increment monotonically for each new UUID created by the application on the same timestamp. When the timestamp increments, the clock sequence MUST be reset to zero. The clock sequence MUST NOT roll over or reset to zero unless the timestamp has incremented. Care MUST be taken to ensure that an adequately sized clock sequence is selected for a given application based on expected timestamp precision and expected UUID generation rates.
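A minimal sketch of such a counter is shown below. The class name is hypothetical, and two behaviors are assumptions of this sketch rather than requirements of the text: exhaustion of the counter is reported with an exception instead of rolling over, and a clock that moves backwards is treated as if it were still at the last seen timestamp.

```python
class SequenceCounter:
    """Per-timestamp counter that starts at zero and never rolls over."""

    def __init__(self, bits: int):
        self.max_value = (1 << bits) - 1
        self.last_timestamp = None
        self.seq = 0

    def next(self, timestamp: int):
        """Return the (timestamp, seq) pair to embed in the next UUID."""
        if self.last_timestamp is None or timestamp > self.last_timestamp:
            # The timestamp has incremented: reset the sequence to zero.
            self.last_timestamp = timestamp
            self.seq = 0
        else:
            # Same tick (or the clock went backwards): keep counting upward.
            if self.seq == self.max_value:
                # Assumption: refuse to roll over; the caller waits for the next tick.
                raise OverflowError("sequence counter exhausted for this timestamp")
            self.seq += 1
        return self.last_timestamp, self.seq
```

A caller that hits the OverflowError would typically wait until the timestamp advances, or choose a wider counter for its expected generation rate.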

4.4.3. UUIDv7 Node Usage

UUIDv7 implementations, even with very detailed sub-second precision and the optional sequence counter, MAY have leftover bits that will be identified as the Node for this section. The UUIDv7 Node MAY contain any set of data an implementation desires; however, the node MUST NOT be set to all 0s, since that does not ensure global uniqueness. In most scenarios the node SHOULD be filled with pseudo-random data.

4.4.4. UUIDv7 Encoding and Decoding

The UUIDv7 bit layout for encoding and decoding are described separately in this document.

4.4.4.1. UUIDv7 Encoding

Since the UUIDv7 Unix timestamp is fixed at 36 bits in length the exact layout for encoding UUIDv7 depends on the precision (number of bits) used for the sub-second portion and the sizes of the optionally desired sequence counter and node bits.

Three examples of UUIDv7 encoding are given below as general guidelines, but implementations are not limited to just these three examples.

All of these fields are only used during encoding; during decoding the system is unaware of the bit layout used for them and considers this information opaque. As such, implementations generating these values can assign whatever length they deem applicable to each field, as long as doing so does not break decoding compatibility (i.e., the Unix timestamp (unixts), version (ver) and variant (var) have to stay where they are, and the clock sequence counter (seq), random (rand) or other implementation specific values must follow the sub-second encoding).

In Figure 3 the UUIDv7 has been created with millisecond precision using the available sub-second precision bits.

Examining Figure 3 one can observe:

  • The first 36 bits have been dedicated to the Unix Timestamp (unixts).

  • All 12 bits of subsec_a are dedicated to millisecond information (msec).

  • The 4 Version bits remain unchanged (ver).

  • All 12 bits of subsec_b have been dedicated to a monotonic clock sequence counter (seq).

  • The 2 Variant bits remain unchanged (var).

  • Finally, the remaining 62 bits in the subsec_seq_node section are filled with random data to pad the length and provide guaranteed uniqueness (rand).

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            unixts                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |unixts |         msec          |  ver  |          seq          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|                         rand                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             rand                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: UUIDv7 Field and Bit Layout - Encoding Example (Millisecond Precision)
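The Figure 3 layout can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that msec carries the sub-second value as the binary fraction described in the decoding section, rounded up so that a decoder which truncates still recovers the original millisecond.

```python
import secrets


def uuid7_ms_bytes(unix_ms: int, seq: int) -> bytes:
    """Build a UUIDv7 with the Figure 3 layout: unixts, msec, ver, seq, var, rand."""
    seconds, ms = divmod(unix_ms, 1000)
    if not 0 <= seconds < (1 << 36):
        raise ValueError("seconds do not fit in the 36-bit unixts field")
    if not 0 <= seq < (1 << 12):
        raise ValueError("seq does not fit in 12 bits")
    # Assumption: encode ms/1000 as a 12-bit binary fraction, rounding up (ceiling).
    msec = -(-ms * 4096 // 1000)
    rand = secrets.randbits(62)
    value = (
        (seconds << 92)     # unixts: bits 0 through 35
        | (msec << 80)      # msec (subsec_a): bits 36 through 47
        | (0x7 << 76)       # ver: bits 48 through 51
        | (seq << 64)       # seq (subsec_b): bits 52 through 63
        | (0b10 << 62)      # var: bits 64 and 65
        | rand              # rand: bits 66 through 127
    )
    return value.to_bytes(16, "big")
```

With this layout, values created in the same millisecond sort by seq, and values from later milliseconds sort after earlier ones when the 16 octets are compared as raw bytes.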

In Figure 4 the UUIDv7 has been created with microsecond precision using the available sub-second precision bits.

Examining Figure 4 one can observe:

  • The first 36 bits have been dedicated to the Unix Timestamp (unixts).

  • All 12 bits of subsec_a are dedicated to providing sub-second encoding for the microsecond precision (usec).

  • The 4 Version bits remain unchanged (ver).

  • All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the microsecond precision (usec).

  • The 2 Variant bits remain unchanged (var).

  • A 14-bit monotonic clock sequence counter (seq) has been embedded in the most significant position of subsec_seq_node.

  • Finally, the remaining 48 bits in the subsec_seq_node section are filled with random data to pad the length and provide guaranteed uniqueness (rand).

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            unixts                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |unixts |         usec          |  ver  |         usec          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|             seq           |            rand               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             rand                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: UUIDv7 Field and Bit Layout - Encoding Example (Microsecond Precision)

In Figure 5 the UUIDv7 has been created with nanosecond precision using the available sub-second precision bits.

Examining Figure 5 one can observe:

  • The first 36 bits have been dedicated to the Unix Timestamp (unixts).

  • All 12 bits of subsec_a are dedicated to providing sub-second encoding for the nanosecond precision (nsec).

  • The 4 Version bits remain unchanged (ver).

  • All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the nanosecond precision (nsec).

  • The 2 Variant bits remain unchanged (var).

  • The first 14 bits of subsec_seq_node are dedicated to providing sub-second encoding for the nanosecond precision (nsec).

  • The next 8 bits of subsec_seq_node are dedicated to a monotonic clock sequence counter (seq).

  • Finally, the remaining 40 bits in the subsec_seq_node section are filled with random data to pad the length and provide guaranteed uniqueness (rand).

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            unixts                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |unixts |         nsec          |  ver  |         nsec          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|             nsec          |      seq      |     rand      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             rand                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: UUIDv7 Field and Bit Layout - Encoding Example (Nanosecond Precision)

4.4.4.2. UUIDv7 Decoding

When decoding or parsing a UUIDv7 value there are only two values to be considered:

  1. The unix timestamp defined as unixts

  2. The sub-second precision values defined as subsec_a, subsec_b, and subsec_seq_node

As detailed in Figure 2 the unix timestamp (unixts) is always the first 36 bits of the UUIDv7 layout.

Similarly, as per Figure 2, the sub-second precision values lie within subsec_a, subsec_b, and subsec_seq_node, which are all interpreted as sub-second information after skipping over the version (ver) and variant (var) bits. These concatenated sub-second information bits are interpreted so that each bit, from most to least significant, represents a further division by two. This is the same place-value notation normally used to express fractional numbers, except in binary. For example, in decimal ".1" means one tenth, and ".01" means one hundredth. In this subsec field, a 1 means one half, 01 means one quarter, 001 is one eighth, etc. This scheme can work for any number of bits up to the maximum available, and keeps the most significant data leftmost in the bit sequence.

To perform the sub-second math, take the first (most significant/leftmost) N bits of subsec, interpret them as an integer, and divide that integer by 2^N. For example:

  1. To parse the first 16 bits, extract that value as an integer and divide it by 65536 (2 to the 16th).

  2. If these 16 bits are 0101 0101 0101 0101, then treating that as an integer gives 0x5555 or 21845 in decimal, and dividing by 65536 gives 0.3333282.
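The arithmetic in this example can be checked with a few lines of code. The function name is hypothetical and the sketch only restates the division described above.

```python
def subsec_fraction(bits: int, n: int) -> float:
    """Interpret the first n sub-second bits (as an integer) as a fraction of a second."""
    if not 0 <= bits < (1 << n):
        raise ValueError("bits does not fit in n bits")
    return bits / (1 << n)


# The 16-bit pattern 0101 0101 0101 0101 from the example above.
example = subsec_fraction(0b0101010101010101, 16)
```

Here example is 21845 / 65536, which rounds to 0.3333282 at seven decimal places.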

This sub-second encoding scheme provides maximum interoperability across systems where different levels of time precision are required/feasible/available. The timestamp value derived from a UUIDv7 value SHOULD be "as close to the correct value as possible" when parsed, even across disparate systems.

Take for example the starting point for our next two UUIDv7 parsing scenarios:

  1. System A produces a UUIDv7 with a microsecond-precise timestamp value.
  2. System B is unaware of the precision encoded in the UUIDv7 timestamp by System A.

Scenario 1:

  1. System B parses the embedded timestamp with millisecond precision. (Less precision than the encoder)
  2. System B SHOULD return the correct millisecond value encoded by system A (truncated to milliseconds).

Scenario 2:

  1. System B parses the timestamp with nanosecond precision. (More precision than the encoder)
  2. System B's value returned SHOULD have the same microsecond level of precision provided by the encoder with the additional precision down to nanosecond level being essentially random as per the encoded random value at the end of the UUIDv7.
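The two scenarios can be sketched as follows. This is an illustration under stated assumptions: System A uses the Figure 4 layout with the microsecond value stored as a binary fraction (rounded up), System B chooses how many sub-second bits to read, and all function names are hypothetical.

```python
import secrets
from fractions import Fraction


def encode_uuid7_usec(unix_seconds: int, usec: int, seq: int = 0) -> bytes:
    """System A: 24 sub-second bits, a 14-bit seq, and 48 random bits (Figure 4)."""
    frac = -(-usec * (1 << 24) // 1_000_000)   # assumption: ceiling of usec / 10^6 * 2^24
    subsec_a, subsec_b = frac >> 12, frac & 0xFFF
    tail = (seq << 48) | secrets.randbits(48)  # 62 bits following the variant
    value = (
        (unix_seconds << 92) | (subsec_a << 80) | (0x7 << 76)
        | (subsec_b << 64) | (0b10 << 62) | tail
    )
    return value.to_bytes(16, "big")


def decode_uuid7(raw: bytes, precision_bits: int) -> Fraction:
    """System B: read unixts plus the first precision_bits sub-second bits."""
    if not 1 <= precision_bits <= 86:
        raise ValueError("between 1 and 86 sub-second bits are available")
    value = int.from_bytes(raw, "big")
    unixts = value >> 92
    # Concatenate subsec_a, subsec_b and subsec_seq_node, skipping ver and var.
    subsec = (
        (((value >> 80) & 0xFFF) << 74)
        | (((value >> 64) & 0xFFF) << 62)
        | (value & ((1 << 62) - 1))
    )
    top = subsec >> (86 - precision_bits)
    return unixts + Fraction(top, 1 << precision_bits)


raw = encode_uuid7_usec(1_600_000_000, 123_456)
coarse = decode_uuid7(raw, 12)   # Scenario 1: fewer bits than the encoder used
fine = decode_uuid7(raw, 30)     # Scenario 2: more bits than the encoder used
```

In Scenario 1 the result is within 2^-12 of a second of the encoded time. In Scenario 2 the extra bits come from whatever follows the encoded sub-second field, so they add no real precision.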

4.5. UUIDv8 Layout and Bit Order

UUIDv8 offers variable-size timestamp, clock sequence, and node values which allow for a highly customizable UUID that fits a given application's needs.

UUIDv8 SHOULD only be utilized if an implementation cannot utilize UUIDv1, UUIDv6, or UUIDv7. Some situations in which UUIDv8 usage could occur:

  • An implementation would like to utilize a timestamp source not defined by the current time-based UUIDs.

  • An implementation would like to utilize a timestamp bit layout not defined by the current time-based UUIDs.

  • An implementation would like a specific level of precision within the timestamp not offered by current time-based UUIDs.

  • An implementation would like to embed extra information within the UUID node other than what is defined in this document.

  • An implementation has other application/language restrictions which inhibit the usage of one of the current time-based UUIDs.

Roughly speaking, a properly formatted UUIDv8 SHOULD contain the following sections, adding up to a total of 128 bits.

  • Timestamp Bits (Variable Length)

  • Clock Sequence Bits (Variable Length)

  • Node Bits (Variable Length)

  • UUIDv8 Version Bits (4 bits)

  • UUID Variant Bits (2 bits)

The only explicitly defined bits are the Version and Variant, leaving 122 bits for implementation-specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4, where all 122 extra bits are filled with random data. UUIDv8's 128 bits (including the version and variant) SHOULD contain, at a minimum, a timestamp of some format in the most significant bit positions, followed directly by a clock sequence counter, and finally a node containing either random data or implementation-specific data.

A sample format in Figure 6 is used to further illustrate the point for the 16-octet, 128-bit UUIDv8.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          timestamp_32                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           timestamp_48        |  ver  |      time_or_seq      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|  seq_or_node  |          node                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                              node                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: UUIDv8 Field and Bit Layout
timestamp_32:
The most significant 32 bits of the desired timestamp source. Occupies bits 0 through 31 (octets 0-3).
timestamp_48:
The next 16 bits of the timestamp source when a timestamp source with at least 48 bits is used. When a 32-bit timestamp source is utilized, these bits are set to 0. Occupies bits 32 through 47 (octets 4-5).
ver:
The 4-bit UUIDv8 version (1000). Occupies bits 48 through 51.
time_or_seq:
If a 60-bit or larger timestamp is used, these 12 bits are used to fill out the remaining timestamp. If a 32-bit or 48-bit timestamp is used, a 12-bit clock sequence MAY be placed here instead. Together ver and time_or_seq occupy bits 48 through 63 (octets 6-7).
var:
The 2-bit UUID variant (10). Occupies bits 64 and 65.
seq_or_node:
If a 60-bit or larger timestamp source is used, these 8 bits SHOULD be allocated for an 8-bit clock sequence counter. If a 32-bit or 48-bit timestamp source is used, these 8 bits SHOULD be set to random data.
node:
In most implementations these bits will likely be set to pseudo-random data. However, implementations are free to utilize the node as they see fit. Together var, seq_or_node, and node occupy bits 64 through 127 (octets 8-15).
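
As an illustration of the field positions in Figure 6, the following Python sketch splits a 128-bit value into the named fields. The function name split_uuidv8 and the sample value are assumptions made for this example only.

```python
import uuid
from typing import Dict


def split_uuidv8(value: int) -> Dict[str, int]:
    """Split a 128-bit integer into the Figure 6 fields.

    Bit 0 is the most significant bit, as in the figure.
    """
    if not 0 <= value < (1 << 128):
        raise ValueError("value must fit in 128 bits")
    return {
        "timestamp_32": (value >> 96) & 0xFFFFFFFF,  # bits 0-31
        "timestamp_48": (value >> 80) & 0xFFFF,      # bits 32-47
        "ver": (value >> 76) & 0xF,                  # bits 48-51
        "time_or_seq": (value >> 64) & 0xFFF,        # bits 52-63
        "var": (value >> 62) & 0x3,                  # bits 64-65
        "seq_or_node": (value >> 54) & 0xFF,         # bits 66-73
        "node": value & ((1 << 54) - 1),             # bits 74-127
    }


# Arbitrary sample value chosen for the example.
fields = split_uuidv8(uuid.UUID("00000001-0002-8003-8400-000000000005").int)
```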

4.5.1. UUIDv8 Timestamp Usage

UUIDv8's usage of timestamp relaxes both the timestamp source and timestamp length. Implementations are free to utilize any monotonically stable timestamp source for UUIDv8.

Some examples include:

  • Custom Epoch

  • NTP Timestamp

  • ISO 8601 timestamp

The relaxed nature of UUIDv8 timestamps also works to future-proof this specification and gives implementations a method to create compliant time-based UUIDs using timestamp sources that might not yet be defined.

Timestamps come in many sizes, and UUIDv8 defines three fields that can easily be used for the majority of timestamp lengths:

  • 32-bit timestamp: using timestamp_32 and setting timestamp_48 to 0s

  • 48-bit timestamp: using timestamp_32 and timestamp_48 entirely

  • 60-bit timestamp: using timestamp_32, timestamp_48, and time_or_seq

  • 64-bit timestamp: using timestamp_32, timestamp_48, and time_or_seq and truncating the timestamp to the 60 most significant bits.

Although it is possible to create a timestamp larger than 64 bits in size, the usage and bit layout of that timestamp format is up to the implementation. When a timestamp extends beyond the 64th bit (octet 7), extra care must be taken to ensure the Variant bits are properly inserted at their respective location in the UUID. Likewise, the Version MUST always be implemented at the appropriate location.

Any timestamp that does not entirely fill timestamp_32, timestamp_48, or time_or_seq MUST set all leftover bits in the least significant positions of the respective field to 0. For example, a 36-bit timestamp source would fully utilize timestamp_32 and 4 bits of timestamp_48. The remaining 12 bits in timestamp_48 MUST be set to 0.
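
The padding and truncation rules can be sketched as below. The helper name place_timestamp is hypothetical, and the sketch deliberately rejects sources wider than 64 bits, since their layout is left to the implementation.

```python
from typing import Tuple


def place_timestamp(timestamp: int, bits: int) -> Tuple[int, int, int]:
    """Left-align a timestamp across timestamp_32 (32 bits),
    timestamp_48 (16 bits) and time_or_seq (12 bits).

    Unused least significant bits are zero-filled. A source wider
    than 60 bits keeps only its 60 most significant bits.
    """
    if not 1 <= bits <= 64:
        raise ValueError("this sketch handles 1 to 64 bit timestamps")
    if not 0 <= timestamp < (1 << bits):
        raise ValueError("timestamp does not fit in the stated width")
    if bits > 60:
        timestamp >>= bits - 60  # drop the least significant bits
        bits = 60
    aligned = timestamp << (60 - bits)  # zero-fill on the right
    timestamp_32 = (aligned >> 28) & 0xFFFFFFFF
    timestamp_48 = (aligned >> 12) & 0xFFFF
    time_or_seq = aligned & 0xFFF
    return timestamp_32, timestamp_48, time_or_seq


# The 36-bit case from the text: 4 bits land in timestamp_48,
# and its remaining 12 bits are zero.
parts_36 = place_timestamp(0xABCDEF123, 36)
```

For 32-bit and 48-bit sources the returned time_or_seq is zero, which leaves that field available for a clock sequence.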

By using implementation-specific timestamp sources it is not guaranteed that devices outside of the application context are able to extract and parse the timestamp from UUIDv8 without some pre-existing knowledge of the source timestamp used by the UUIDv8 implementation.

4.5.2. UUIDv8 Clock Sequence Usage

A clock sequence MUST be used with UUIDv8 to provide added sequencing guarantees when multiple UUIDv8 values are created on the same clock tick. The number of bits allocated to the clock sequence depends on the precision of the timestamp source. For example, a more precise timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source using second precision.

The UUIDv8 layout in Figure 6 generically defines two possible clock sequence values that can be leveraged:

  • 12-bit clock sequence using time_or_seq, for use when the timestamp is 48 bits or fewer, which allows for 4096 UUIDs per clock tick (sequence values 0 through 4095).

  • 8-bit clock sequence using seq_or_node, for use when the timestamp uses more than 48 bits, which allows for 256 UUIDs per clock tick (sequence values 0 through 255).

An implementation MAY use both time_or_seq and seq_or_node for clock sequencing; however, it is highly unlikely that 20 bits of clock sequence are needed for a given clock tick. Furthermore, more bits from the node MAY be used for clock sequencing in the event that 8 bits is not sufficient.

The clock sequence MUST start at zero and increment monotonically for each new UUID created by the application on the same timestamp. When the timestamp increments, the clock sequence MUST be reset to zero. The clock sequence MUST NOT roll over or reset to zero unless the timestamp has incremented. Care MUST be taken to ensure that an adequately sized clock sequence is selected for a given application based on expected timestamp precision and expected UUID generation rates.
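
A minimal sketch of these counter rules follows. The class name ClockSequence is hypothetical. Raising an error when the counter is exhausted, and treating a clock that moves backwards like an unchanged tick, are assumptions of this sketch; the text only requires that the counter not roll over or reset before the timestamp increments.

```python
from typing import Optional


class ClockSequence:
    """Counter that restarts at zero only when the timestamp advances."""

    def __init__(self, bits: int) -> None:
        self.limit = 1 << bits  # number of distinct counter values
        self.last_timestamp: Optional[int] = None
        self.value = 0

    def next(self, timestamp: int) -> int:
        if self.last_timestamp is None or timestamp > self.last_timestamp:
            # First use, or the timestamp incremented: reset to zero.
            self.last_timestamp = timestamp
            self.value = 0
        else:
            # Same tick (or clock went backwards, an assumption here):
            # increment, and refuse to roll over.
            if self.value + 1 >= self.limit:
                raise OverflowError("clock sequence exhausted for this tick")
            self.value += 1
        return self.value


seq = ClockSequence(bits=8)
values = [seq.next(100), seq.next(100), seq.next(101)]
```

A caller that hits the overflow case could, for example, wait for the next clock tick before retrying.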

4.5.3. UUIDv8 Node Usage

The UUIDv8 node MAY contain any set of data an implementation desires; however, the node MUST NOT be set to all 0s, as that does not ensure global uniqueness. In most scenarios the node will be filled with pseudo-random data.

The UUIDv8 layout in Figure 6 defines 2 sizes of Node depending on the timestamp size:

  • 62-bit node encompassing seq_or_node and node, used when a timestamp of 48 bits or less is leveraged.

  • 54-bit node, used when all 60 bits of the timestamp are in use and seq_or_node is used for clock sequencing.

An implementation MAY choose to allocate bits from the node to the timestamp, clock sequence, or an application-specific embedded field. It is recommended that implementations utilize a node of at least 48 bits so that global uniqueness can be ensured.
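
One possible way to fill the node is sketched below, assuming Python's secrets module as the random source. The function name random_node is hypothetical.

```python
import secrets


def random_node(bits: int = 62) -> int:
    """Return a random node value of the given width that is never all 0s."""
    if bits < 1:
        raise ValueError("bits must be positive")
    while True:
        node = secrets.randbits(bits)
        if node != 0:  # the node MUST NOT be all zeros
            return node


node_62 = random_node(62)  # 32-bit or 48-bit timestamp layouts
node_54 = random_node(54)  # 60-bit timestamp layout
```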

4.5.4. UUIDv8 Basic Creation Algorithm

The entire usage of UUIDv8 is meant to be variable and to allow as much customization as possible to meet specific application/language requirements. As such, UUIDv8 implementations will likely vary among applications.

The following algorithm is a generic implementation using Figure 6 and the recommendations outlined in this specification.

32-bit timestamp, 12-bit sequence counter, 62-bit node:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time from the selected clock source as 32 bits.

  3. Set the 32-bit field timestamp_32 to the 32 bits from the timestamp

  4. Set 16-bit timestamp_48 to all 0s

  5. Set the version to 8 (1000)

  6. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the 12-bit clock sequence value (time_or_seq) to 0.

  7. If the state was available and the saved timestamp is equal to or greater than the current timestamp (that is, the timestamp has not incremented), increment the clock sequence value (time_or_seq).

  8. Set the variant to binary 10.

  9. Generate 62 random bits and fill in 8 bits for seq_or_node and 54 bits for the node.

  10. Format by concatenating the 128 bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node

  11. Save the state (current timestamp and clock sequence) back to the stable store
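
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the clock source is assumed to be Unix seconds, the generator state is held in a module-level variable instead of a stable store, and the function name uuidv8_32bit and its now parameter are hypothetical.

```python
import secrets
import time
import uuid
from typing import Dict, Optional

# Step 1 and step 11: generator state (in memory only for this sketch).
_state: Dict[str, Optional[int]] = {"timestamp": None, "sequence": 0}


def uuidv8_32bit(now: Optional[int] = None) -> uuid.UUID:
    """32-bit timestamp, 12-bit sequence counter, 62-bit node."""
    # Step 2: current time as 32 bits (Unix seconds is an assumption).
    timestamp = (int(time.time()) if now is None else now) & 0xFFFFFFFF
    saved = _state["timestamp"]
    if saved is None or timestamp > saved:
        sequence = 0  # step 6: no state, or the timestamp incremented
    else:
        sequence = (_state["sequence"] or 0) + 1  # step 7
        if sequence > 0xFFF:
            raise OverflowError("12-bit clock sequence exhausted")
    # Step 9: 62 random bits (8 for seq_or_node, 54 for node), never zero.
    node = 1 + secrets.randbelow((1 << 62) - 1)
    value = (
        (timestamp << 96)   # step 3: timestamp_32
        | (0 << 80)         # step 4: timestamp_48 set to all 0s
        | (0x8 << 76)       # step 5: version 8
        | (sequence << 64)  # time_or_seq holds the clock sequence
        | (0b10 << 62)      # step 8: variant
        | node              # seq_or_node and node
    )                       # step 10: concatenation
    _state["timestamp"], _state["sequence"] = timestamp, sequence  # step 11
    return uuid.UUID(int=value)


first = uuidv8_32bit(now=1_600_000_000)
second = uuidv8_32bit(now=1_600_000_000)
third = uuidv8_32bit(now=1_600_000_001)
```

Because the timestamp and sequence occupy the most significant bits, the three values sort in creation order when compared as integers or bytes.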

48-bit timestamp, 12-bit sequence counter, 62-bit node:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time from the selected clock source as 48 bits.

  3. Set the 32-bit field timestamp_32 to the 32 most significant bits from the timestamp

  4. Set 16-bit timestamp_48 to the 16 least significant bits from the timestamp

  5. The rest of the steps are the same as the previous example.

60-bit timestamp, 8-bit sequence counter, 54-bit node:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time from the selected clock source as 60 bits.

  3. Set the 32-bit field timestamp_32 to the 32 most significant bits from the timestamp.

  4. Set 16-bit timestamp_48 to the 16 middle bits from the timestamp.

  5. Set the version to 8 (1000).

  6. Set 12-bit time_or_seq to the 12 least significant bits from the timestamp.

  7. Set the variant to binary 10.

  8. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the 8-bit clock sequence value (seq_or_node) to 0.

  9. If the state was available and the saved timestamp is equal to or greater than the current timestamp (that is, the timestamp has not incremented), increment the clock sequence value (seq_or_node).

  10. Generate 54 random bits and fill in the node.

  11. Format by concatenating the 128 bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node

  12. Save the state (current timestamp and clock sequence) back to the stable store
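
A comparable sketch for the 60-bit case is shown below. As before, it is illustrative only: microseconds since the Unix Epoch are assumed as the clock source, the state is kept in memory, and the name uuidv8_60bit and its now parameter are hypothetical.

```python
import secrets
import time
import uuid
from typing import Dict, Optional

# Generator state (in memory only for this sketch).
_state: Dict[str, Optional[int]] = {"timestamp": None, "sequence": 0}


def uuidv8_60bit(now: Optional[int] = None) -> uuid.UUID:
    """60-bit timestamp, 8-bit sequence counter, 54-bit node."""
    # Current time as 60 bits (Unix microseconds is an assumption).
    raw = time.time_ns() // 1_000 if now is None else now
    timestamp = raw & ((1 << 60) - 1)
    saved = _state["timestamp"]
    if saved is None or timestamp > saved:
        sequence = 0  # no state, or the timestamp incremented
    else:
        sequence = (_state["sequence"] or 0) + 1
        if sequence > 0xFF:
            raise OverflowError("8-bit clock sequence exhausted")
    node = 1 + secrets.randbelow((1 << 54) - 1)  # 54 random bits, never zero
    value = (
        ((timestamp >> 28) << 96)               # timestamp_32: 32 MSBs
        | (((timestamp >> 12) & 0xFFFF) << 80)  # timestamp_48: middle 16 bits
        | (0x8 << 76)                           # version 8
        | ((timestamp & 0xFFF) << 64)           # time_or_seq: 12 LSBs
        | (0b10 << 62)                          # variant
        | (sequence << 54)                      # seq_or_node: clock sequence
        | node
    )
    _state["timestamp"], _state["sequence"] = timestamp, sequence
    return uuid.UUID(int=value)


first = uuidv8_60bit(now=0x123456789ABCDEF)
second = uuidv8_60bit(now=0x123456789ABCDEF)
```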

64-bit timestamp, 8-bit sequence counter, 54-bit node:

  1. The same steps as the 60-bit timestamp can be utilized if the 64-bit timestamp is truncated to 60-bits.

  2. Implementations MAY choose to truncate the most or least significant bits, but it is recommended to utilize the most significant 60 bits and lose 4 bits of precision in the nanoseconds or microseconds position.

General algorithm for generating UUIDv8 layouts not defined above:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time from the selected clock source with the desired number of bits.

  3. Place the timestamp bits, as many as required, in the most significant positions of the 128-bit UUID.

  4. Care MUST be taken to ensure that the UUID Version and UUID Variant are in the correct bit positions.

    UUID Version: Bits 48 through 51

    UUID Variant: Bits 64 and 65

  5. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the desired clock sequence value to 0.

  6. If the state was available and the saved timestamp is equal to or greater than the current timestamp (that is, the timestamp has not incremented), increment the clock sequence value.

  7. Set the remaining bits to the node as pseudo-random data

  8. Format by concatenating the 128-bits together

  9. Save the state (current timestamp and clock sequence) back to the stable store

5. Encoding and Storage

The existing UUID hex and dash format of 8-4-4-4-12 is retained for both backwards compatibility and human readability.

For many applications, such as databases, this format is unnecessarily verbose, totaling 288 bits.

  • 8-bits for each of the 32 hex characters = 256 bits

  • 8-bits for each of the 4 hyphens = 32 bits

Where possible UUIDs SHOULD be stored within database applications as the underlying 128-bit binary value.
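
The size difference can be checked with Python's standard uuid module. The sample value below is arbitrary and chosen only for the example.

```python
import uuid

example = uuid.UUID("00000001-0002-8003-8400-000000000005")

text_form = str(example)      # 8-4-4-4-12 hex and dash format
binary_form = example.bytes   # underlying 16-octet value

text_bits = len(text_form.encode("ascii")) * 8   # 36 characters
binary_bits = len(binary_form) * 8               # 16 octets

# The binary form round-trips back to the same UUID.
restored = uuid.UUID(bytes=binary_form)
```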

6. Global Uniqueness

UUIDs created by this specification offer the same guarantees for global uniqueness as those found in [RFC4122]. Furthermore, the time-based UUIDs defined in this specification are geared towards database applications but MAY be used for a wide variety of use-cases. Just as global uniqueness is guaranteed, UUIDs are guaranteed to be unique within an application context within the enterprise domain.

7. Distributed UUID Generation

Some implementations might desire to utilize multi-node, clustered, applications which involve 2 or more applications independently generating UUIDs that will be stored in a common location. UUIDs already feature sufficient entropy to ensure that the chances of collision are low. However, implementations MAY dedicate a portion of the node's most significant random bits to a pseudo-random machineID which helps identify UUIDs created by a given node. This works to add an extra layer of collision avoidance.

The machineID MUST be placed in the UUID immediately after the timestamp and sequence counter bits. This position is selected to ensure that sorting by timestamp and clock sequence is still possible. The machineID MUST NOT be an IEEE 802 MAC address. The creation and negotiation of the machineID among distributed nodes is out of scope for this specification.
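
A sketch of placing a machineID in the most significant bits of the node follows. The 10-bit machineID width, the 62-bit node width, and the function name node_with_machine_id are assumptions made for illustration; this document does not define them.

```python
import secrets


def node_with_machine_id(
    machine_id: int, machine_bits: int = 10, node_bits: int = 62
) -> int:
    """Put machine_id in the top bits of the node, random bits below it."""
    if not 0 <= machine_id < (1 << machine_bits):
        raise ValueError("machine_id does not fit in machine_bits")
    random_bits = node_bits - machine_bits
    return (machine_id << random_bits) | secrets.randbits(random_bits)


node = node_with_machine_id(0x2A5)
recovered_machine_id = node >> (62 - 10)
```

Because the machineID sits below the timestamp and sequence bits, it does not affect time-based sort order.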

8. IANA Considerations

This document has no IANA actions.

9. Security Considerations

MAC addresses pose inherent security risks and MUST NOT be used for node generation. As such they have been strictly forbidden from time-based UUIDs within this specification. Instead, pseudo-random bits SHOULD be selected from a source with sufficient entropy to ensure uniqueness among generated UUIDs.

Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with the clock sequence does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then [RFC4122] UUIDv4 SHOULD be utilized.

The machineID portion of the node, described in Section 7, does provide a small unique identifier which could be used to determine which application is generating data, but this machineID alone is not enough to identify a node on the network without other corresponding data points. Furthermore the machineID, like the timestamp+sequence, does not provide any context about the data that corresponds to the UUID or the current state of the application as a whole.

10. Acknowledgements

The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, and Theodore Y. Ts'o, as well as all of those in and outside the IETF community who contributed to the discussions which resulted in this document.

11. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4122]
Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, <https://www.rfc-editor.org/info/rfc4122>.

12. Informative References

[LexicalUUID]
Twitter, "A Scala client for Cassandra", commit f6da4e0, <https://github.com/twitter-archive/cassie>.
[Snowflake]
Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, <https://github.com/twitter-archive/snowflake/releases/tag/snowflake-2010>.
[Flake]
Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, <https://github.com/boundary/flake>.
[ShardingID]
Instagram Engineering, "Sharding & IDs at Instagram", <https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c>.
[KSUID]
Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, <https://github.com/segmentio/ksuid>.
[Elasticflake]
Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, <https://github.com/ppearcy/elasticflake>.
[FlakeID]
Pawlak, T., "Flake ID Generator", Commit fcd6a2f, <https://github.com/T-PWK/flake-idgen>.
[Sonyflake]
Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, <https://github.com/sony/sonyflake>.
[orderedUuid]
Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", <https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8>.
[COMBGUID]
Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, <https://github.com/richardtallent/RT.Comb>.
[ULID]
Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, <https://github.com/ulid/spec>.
[SID]
Chilton, A., "sid : generate sortable identifiers", Commit 660e947, <https://github.com/chilts/sid>.
[pushID]
Google, "The 2^120 Ways to Ensure Unique Identifiers", <https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html>.
[XID]
Poitrey, O., "Globally Unique ID Generator", Commit efa678f, <https://github.com/rs/xid>.
[ObjectID]
MongoDB, "ObjectId - MongoDB Manual", <https://docs.mongodb.com/manual/reference/method/ObjectId/>.
[CUID]
Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, <https://github.com/ericelliott/cuid>.

Authors' Addresses

Brad G. Peabody
Kyzer R. Davis
================================================ FILE: old drafts/draft-peabody-dispatch-new-uuid-format-01.txt ================================================ dispatch BGP. Peabody Internet-Draft Updates: 4122 (if approved) K. Davis Intended status: Standards Track 26 April 2021 Expires: 28 October 2021 New UUID Formats draft-peabody-dispatch-new-uuid-format-01 Abstract This document presents new time-based UUID formats which are suited for use as a database key. A common case for modern applications is to create a unique identifier for use as a primary key in a database table. This identifier usually implements an embedded timestamp that is sortable using the monotonic creation time in the most significant bits. In addition the identifier is highly collision resistant, difficult to guess, and provides minimal security attack surfaces. None of the existing UUID versions, including UUIDv1, fulfill each of these requirements in the most efficient possible way. This document is a proposal to update [RFC4122] with three new UUID versions that address these concerns, each with different trade-offs. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 28 October 2021. Copyright Notice Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved. 
Peabody & Davis Expires 28 October 2021 [Page 1] Internet-Draft new-uuid-format April 2021 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Summary of Changes . . . . . . . . . . . . . . . . . . . . . 5 4. Format . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.1. Versions . . . . . . . . . . . . . . . . . . . . . . . . 6 4.2. Variant . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.3. UUIDv6 Layout and Bit Order . . . . . . . . . . . . . . . 7 4.3.1. UUIDv6 Timestamp Usage . . . . . . . . . . . . . . . 8 4.3.2. UUIDv6 Clock Sequence Usage . . . . . . . . . . . . . 8 4.3.3. UUIDv6 Node Usage . . . . . . . . . . . . . . . . . . 8 4.3.4. UUIDv6 Basic Creation Algorithm . . . . . . . . . . . 8 4.4. UUIDv7 Layout and Bit Order . . . . . . . . . . . . . . . 10 4.4.1. UUIDv7 Timestamp Usage . . . . . . . . . . . . . . . 11 4.4.2. UUIDv7 Clock Sequence Usage . . . . . . . . . . . . . 11 4.4.3. UUIDv7 Node Usage . . . . . . . . . . . . . . . . . . 11 4.4.4. UUIDv7 Encoding and Decoding . . . . . . . . . . . . 11 4.5. UUIDv8 Layout and Bit Order . . . . . . . . . . . . . . . 16 4.5.1. UUIDv8 Timestamp Usage . . . . . . . . . . . . . . . 18 4.5.2. UUIDv8 Clock Sequence Usage . . . . . . . . . . . . . 20 4.5.3. UUIDv8 Node Usage . . . . . . . . . . . . . . . . . . 20 4.5.4. UUIDv6 Basic Creation Algorithm . . . . . . . . . . . 21 5. 
Encoding and Storage . . . . . . . . . . . . . . . . . . . . 24 6. Global Uniqueness . . . . . . . . . . . . . . . . . . . . . . 24 7. Distributed UUID Generation . . . . . . . . . . . . . . . . . 24 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 9. Security Considerations . . . . . . . . . . . . . . . . . . . 25 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 25 11. Normative References . . . . . . . . . . . . . . . . . . . . 25 12. Informative References . . . . . . . . . . . . . . . . . . . 26 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 1. Introduction The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. Peabody & Davis Expires 28 October 2021 [Page 2] Internet-Draft new-uuid-format April 2021 2. Background A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys. The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become a burden. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments. However some properties of [RFC4122] UUIDs are not well suited to this task. First, most of the existing UUID versions such as UUIDv4 have poor database index locality. Meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. 
The negative performance effects of which on common structures used for this (B-tree and its variants) can be dramatic. As such newly inserted values SHOULD be time-ordered to address this. While it is true that UUIDv1 does contain an embedded timestamp and can be time-ordered; UUIDv1 has other issues. It is possible to sort Version 1 UUIDs by time but it is a laborious task. The process requires breaking the bytes of the UUID into various pieces, re- ordering the bits, and then determining the order from the reconstructed timestamp. This is not efficient in very large systems. Implementations would be simplified with a sort order where the UUID can simply be treated as an opaque sequence of bytes and ordered as such. After the embedded timestamp, the remaining 64 bits are in essence used to provide uniqueness both on a global scale and within a given timestamp tick. The clock sequence value ensures that when multiple UUIDs are generated for the same timestamp value are given a monotonic sequence value. This explicit sequencing helps further facilitate sorting. The remaining random bits ensure collisions are minimal. Peabody & Davis Expires 28 October 2021 [Page 3] Internet-Draft new-uuid-format April 2021 Furthermore, UUIDv1 utilizes a non-standard timestamp epoch derived from the Gregorian Calendar. More specifically, the Coordinated Universal Time (UTC) as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582. Implementations and many languages may find it easier to implement the widely adopted and well known Unix Epoch, a custom epoch, or another timestamp source with various levels of timestamp precision required by the application. Lastly, privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). 
Instead "cryptographically secure" pseudo-random number generators (CSPRNGs) or pseudo-random number generators (PRNG) SHOULD be used within an application context to provide uniqueness and unguessability. Due to the shortcomings of UUIDv1 and UUIDv4 details so far, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time- based, sortable unique identifier for use as a database key. This has lead to numerous implementations over the past 10+ years solving the same problem in slightly different ways. While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit Layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing. 1. [LexicalUUID] by Twitter 2. [Snowflake] by Twitter 3. [Flake] by Boundary 4. [ShardingID] by Instagram 5. [KSUID] by Segment 6. [Elasticflake] by P. Pearcy 7. [FlakeID] by T. Pawlak 8. [Sonyflake] by Sony 9. [orderedUuid] by IT. Cabrera 10. [COMBGUID] by R. Tallent 11. [ULID] by A. Feerasta 12. [SID] by A. Chilton 13. [pushID] by Google 14. [XID] by O. Poitrey 15. [ObjectID] by MongoDB 16. [CUID] by E. Elliott Peabody & Davis Expires 28 October 2021 [Page 4] Internet-Draft new-uuid-format April 2021 An inspection of these implementations details the following trends that help define this standard: - Timestamps MUST be k-sortable. That is, values within or close to the same timestamp are ordered properly by sorting algorithms. - Timestamps SHOULD be big-endian with the most-significant bits of the time embedded as-is without reordering. - Timestamps SHOULD utilize millisecond precision and Unix Epoch as timestamp source. Although, there is some variation to this among implementations depending on the application requirements. 
- The ID format SHOULD be Lexicographically sortable while in the textual representation. - IDs MUST ensure proper embedded sequencing to facilitate sorting when multiple UUIDs are created during a given timestamp. - IDs MUST NOT require unique network identifiers as part of achieving uniqueness. - Distributed nodes MUST be able to create collision resistant Unique IDs without a consulting a centralized resource. 3. Summary of Changes In order to solve these challenges this specification introduces three new version identifiers assigned for time-based UUIDs. The first, UUIDv6, aims to be the easiest to implement for applications which already implement UUIDv1. The UUIDv6 specification keeps the original Gregorian timestamp source but does not reorder the timestamp bits as per the process utilized by UUIDv1. UUIDv6 also requires that pseudo-random data MUST be used in place of the MAC address. The rest of the UUIDv1 format remains unchanged in UUIDv6. See Section 4.3 Next, UUIDv7 introduces an entirely new time-based UUID bit layout utilizing a variable length timestamp sourced from the widely implemented and well known Unix Epoch timestamp source. The timestamp is broken into a 36-bit integer sections part, and is followed by a field of variable length which represents the sub- second timestamp portion, encoded so that each bit from most to least significant adds more precision. See Section 4.4 Finally, UUIDv8 introduces a relaxed time-based UUID format that caters to application implementations that cannot utilize UUIDv1, UUIDv6, or UUIDv7. UUIDv8 also future-proofs this specification by allowing time-based UUID formats from timestamp sources that are not yet be defined. The variable size timestamp offers lots of flexibility to create an implementation specific RFC compliant time- based UUID while retaining the properties that make UUID great. See Section 4.5 Peabody & Davis Expires 28 October 2021 [Page 5] Internet-Draft new-uuid-format April 2021 4. 
Format The UUID length of 16 octets (128 bits) remains unchanged. The textual representation of a UUID consisting of 36 hexadecimal and dash characters in the format 8-4-4-4-12 remains unchanged for human readability. In addition the position of both the Version and Variant bits remain unchanged in the layout. 4.1. Versions Table 1 defines the 4-bit version found in Bits 48 through 51 within a given UUID. +------+------+------+------+---------+-----------------------+ | Msb0 | Msb1 | Msb2 | Msb3 | Version | Description | +------+------+------+------+---------+-----------------------+ | 0 | 1 | 1 | 0 | 6 | Reordered Gregorian | | | | | | | time-based UUID | +------+------+------+------+---------+-----------------------+ | 0 | 1 | 1 | 1 | 7 | Variable length Unix | | | | | | | Epoch time-based UUID | +------+------+------+------+---------+-----------------------+ | 1 | 0 | 0 | 0 | 8 | Custom time-based | | | | | | | UUID | +------+------+------+------+---------+-----------------------+ Table 1: UUID versions defined by this specification 4.2. Variant The variant bits utilized by UUIDs in this specification remains the same as [RFC4122], Section 4.1.1. The Table 2 lists the contents of the variant field, bits 64 and 65, where the letter "x" indicates a "don't-care" value. Common hex values of 8 (1000), 9 (1001), A (1010), and B (1011) frequent the text representation. +------+------+------+-----------------------------------------+ | Msb0 | Msb1 | Msb2 | Description | +------+------+------+-----------------------------------------+ | 1 | 0 | x | The variant specified in this document. | +------+------+------+-----------------------------------------+ Table 2: UUID Variant defined by this specification Peabody & Davis Expires 28 October 2021 [Page 6] Internet-Draft new-uuid-format April 2021 4.3. 
UUIDv6 Layout and Bit Order

   UUIDv6 aims to be the easiest to implement by reusing most of the
   layout of bits found in UUIDv1 but with changes to bit ordering for
   the timestamp. Where UUIDv1 splits the timestamp bits into three
   distinct parts and orders them as time_low, time_mid,
   time_high_and_version, UUIDv6 instead keeps the source bits from
   the timestamp intact and changes the order to time_high, time_mid,
   and time_low. As a result, the field order matches the bit order of
   the original 60-bit Gregorian timestamp source. The clock sequence
   bits remain unchanged from their usage and position in [RFC4122].
   The 48-bit node MUST be set to a pseudo-random value.

   The format for the 16-octet, 128-bit UUIDv6 is shown in Figure 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_high                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |      time_low_and_version     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         node (2-5)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 1: UUIDv6 Field and Bit Layout

   time_high:
      The most significant 32 bits of the 60-bit starting timestamp.
      Occupies bits 0 through 31 (octets 0-3).

   time_mid:
      The middle 16 bits of the 60-bit starting timestamp. Occupies
      bits 32 through 47 (octets 4-5).

   time_low_and_version:
      The four most significant bits MUST contain the UUIDv6 version
      (0110), while the remaining 12 bits contain the least
      significant 12 bits of the 60-bit starting timestamp. Occupies
      bits 48 through 63 (octets 6-7).

   clk_seq_hi_res:
      The first two bits MUST be set to the UUID variant (10). The
      remaining 6 bits contain the high portion of the clock sequence.
      Occupies bits 64 through 71 (octet 8).

   clk_seq_low:
      The 8-bit low portion of the clock sequence. Occupies bits 72
      through 79 (octet 9).

   node:
      48-bit pseudo-random number used as a spatially unique
      identifier. Occupies bits 80 through 127 (octets 10-15).

4.3.1.  UUIDv6 Timestamp Usage

   UUIDv6 reuses the 60-bit Gregorian timestamp with 100-nanosecond
   precision defined in [RFC4122], Section 4.1.4.

4.3.2.  UUIDv6 Clock Sequence Usage

   UUIDv6 makes no change to the Clock Sequence usage defined by
   [RFC4122], Section 4.1.5.

4.3.3.  UUIDv6 Node Usage

   UUIDv6 node bits SHOULD be set to a 48-bit random or pseudo-random
   number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or
   the [RFC4122], Section 4.5 method of generating a random multicast
   IEEE 802 MAC address.

4.3.4.  UUIDv6 Basic Creation Algorithm

   The following implementation algorithm is based on [RFC4122] but
   with changes specific to UUIDv6:

   1.  From a system-wide shared stable store (e.g., a file) or global
       variable, read the UUID generator state: the values of the
       timestamp and clock sequence used to generate the last UUID.

   2.  Obtain the current time as a 60-bit count of 100-nanosecond
       intervals since 00:00:00.00, 15 October 1582.

   3.  Set the time_low field to the 12 least significant bits of the
       starting 60-bit timestamp.

   4.  Truncate the timestamp to the 48 most significant bits in order
       to create time_high_and_time_mid.

   5.  Set the time_high field to the 32 most significant bits of the
       truncated timestamp.

   6.  Set the time_mid field to the 16 least significant bits of the
       truncated timestamp.

   7.  Create the 16-bit time_low_and_version by concatenating the
       4-bit UUIDv6 version with the 12-bit time_low.

   8.
       If the state was unavailable (e.g., non-existent or corrupted)
       or the saved timestamp is greater than the current timestamp,
       generate a random 14-bit clock sequence value.

   9.  If the state was available, but the saved timestamp is less
       than or equal to the current timestamp, increment the clock
       sequence value.

   10. Complete the 16-bit clock sequence high, low and reserved
       creation by concatenating the clock sequence onto the UUID
       variant bits, which take the most significant position in the
       16-bit value.

   11. Generate a 48-bit pseudo-random node.

   12. Format by concatenating the 128 bits from each part:
       time_high|time_mid|time_low_and_version|variant_clk_seq|node

   13. Save the state (current timestamp and clock sequence) back to
       the stable store.

   The steps for splitting time_high_and_time_mid into time_high and
   time_mid are optional, since the 48 bits of time_high and time_mid
   remain in the same order as time_high_and_time_mid during the final
   concatenation. This extra step of splitting into the most
   significant 32 bits and least significant 16 bits proves useful
   when reusing an existing UUIDv1 implementation, in which case the
   field mappings in Table 3 can be applied to reshuffle the bits with
   minimal modifications.

   +--------------+------+--------------+
   | UUIDv1 Field | Bits | UUIDv6 Field |
   +--------------+------+--------------+
   | time_low     | 32   | time_high    |
   +--------------+------+--------------+
   | time_mid     | 16   | time_mid     |
   +--------------+------+--------------+
   | time_high    | 12   | time_low     |
   +--------------+------+--------------+

           Table 3: UUIDv1 to UUIDv6 Field Mappings

4.4.  UUIDv7 Layout and Bit Order

   The UUIDv7 format is designed to encode a Unix timestamp with
   arbitrary sub-second precision.
   The key property provided by UUIDv7 is that timestamp values
   generated by one system and parsed by another are guaranteed to
   have the sub-second precision of either the generator or the
   parser, whichever is less. Additionally, the system parsing the
   UUIDv7 value does not need to know which precision was used during
   encoding in order to function correctly.

   The format for the 16-octet, 128-bit UUIDv7 is shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |       subsec_a        |  ver  |       subsec_b        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                   subsec_seq_node                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       subsec_seq_node                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 2: UUIDv7 Field and Bit Layout

   unixts:
      36-bit big-endian unsigned Unix Timestamp value.

   subsec_a:
      12 bits allocated to sub-second precision values.

   ver:
      The 4-bit UUIDv7 version (0111).

   subsec_b:
      12 bits allocated to sub-second precision values.

   var:
      2-bit UUID variant (10).

   subsec_seq_node:
      The remaining 62 bits, which MAY be allocated to any combination
      of additional sub-second precision, sequence counter, or
      pseudo-random data.

4.4.1.  UUIDv7 Timestamp Usage

   UUIDv7 utilizes a 36-bit big-endian unsigned Unix Timestamp value
   (number of seconds since the epoch of 1 Jan 1970, leap seconds
   excluded so each hour is exactly 3600 seconds long).

   Additional sub-second precision (millisecond, microsecond,
   nanosecond, etc.) MAY be provided for encoding and decoding in the
   remaining bits in the layout.

4.4.2.
UUIDv7 Clock Sequence Usage

   UUIDv7 SHOULD utilize a monotonic sequence counter to provide
   additional sequencing guarantees when multiple UUIDv7 values are
   created in the same UNIXTS and SUBSEC timestamp. The number of bits
   allocated to the sequence counter depends on the precision of the
   timestamp. For example, a more accurate timestamp source using
   nanosecond precision will require fewer clock sequence bits than a
   timestamp source utilizing seconds for precision. For best
   sequencing results the sequence counter SHOULD be placed
   immediately after the available sub-second bits.

   The clock sequence MUST start at zero and increment monotonically
   for each new UUID created by the application on the same timestamp.
   When the timestamp increments the clock sequence MUST be reset to
   zero. The clock sequence MUST NOT roll over or reset to zero unless
   the timestamp has incremented. Care MUST be given to ensure that an
   adequately sized clock sequence is selected for a given application
   based on expected timestamp precision and expected UUID generation
   rates.

4.4.3.  UUIDv7 Node Usage

   UUIDv7 implementations, even with very detailed sub-second
   precision and the optional sequence counter, MAY have leftover bits
   that will be identified as the Node for this section. The UUIDv7
   Node MAY contain any set of data an implementation desires;
   however, the node MUST NOT be set to all 0s, since that does not
   ensure global uniqueness. In most scenarios the node SHOULD be
   filled with pseudo-random data.

4.4.4.  UUIDv7 Encoding and Decoding

   The UUIDv7 bit layouts for encoding and decoding are described
   separately in this document.

4.4.4.1.  UUIDv7 Encoding

   Since the UUIDv7 Unix timestamp is fixed at 36 bits in length, the
   exact layout for encoding UUIDv7 depends on the precision (number
   of bits) used for the sub-second portion and the sizes of the
   optionally desired sequence counter and node bits.
   Three examples of UUIDv7 encoding are given below as general
   guidelines, but implementations are not limited to just these three
   examples.

   All of these fields are only used during encoding; during decoding
   the system is unaware of the bit layout used for them and considers
   this information opaque. As such, implementations generating these
   values can assign whatever length to each field they deem
   applicable, as long as it does not break decoding compatibility
   (i.e., Unix timestamp (unixts), version (ver) and variant (var)
   have to stay where they are, and clock sequence counter (seq),
   random data (rand) or other implementation-specific values must
   follow the sub-second encoding).

   In Figure 3 the UUIDv7 has been created with millisecond precision
   using the available sub-second precision bits.

   Examining Figure 3 one can observe:

   *  The first 36 bits have been dedicated to the Unix Timestamp
      (unixts).

   *  All 12 bits of subsec_a are dedicated to millisecond information
      (msec).

   *  The 4 Version bits remain unchanged (ver).

   *  All 12 bits of subsec_b have been dedicated to a monotonic clock
      sequence counter (seq).

   *  The 2 Variant bits remain unchanged (var).

   *  Finally, the remaining 62 bits in the subsec_seq_node section
      are filled with random data to pad the length and provide
      guaranteed uniqueness (rand).
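
   The millisecond layout described in the list above can be sketched
   in Python as follows. This is a minimal, non-normative sketch of
   this draft's layout only. The function name and parameters are
   hypothetical, and storing msec as a 12-bit binary fraction of a
   second (rounded up so that a truncating parser recovers the same
   millisecond) is an assumption drawn from the fractional
   interpretation in Section 4.4.4.2; the draft itself does not state
   a rounding rule.

```python
import os
import uuid


def encode_uuidv7_msec(unix_seconds: int, millis: int, seq: int) -> uuid.UUID:
    """Pack unixts(36) | msec(12) | ver(4) | seq(12) | var(2) | rand(62)."""
    if not 0 <= unix_seconds < 1 << 36:
        raise ValueError("unixts must fit in 36 bits")
    if not 0 <= millis < 1000:
        raise ValueError("millis must be in 0..999")
    if not 0 <= seq < 1 << 12:
        raise ValueError("seq must fit in 12 bits")

    # Assumption: msec holds ceil(millis / 1000 * 2**12), a binary fraction
    # of one second. Rounding up lets floor(msec * 1000 / 4096) return millis.
    msec = -(-(millis << 12) // 1000)

    # 62 random bits for the remainder of the UUID (hypothetical choice of
    # os.urandom as the entropy source).
    rand = int.from_bytes(os.urandom(8), "big") >> 2

    value = unix_seconds << 92  # bits 0-35
    value |= msec << 80         # bits 36-47
    value |= 0b0111 << 76       # bits 48-51, version
    value |= seq << 64          # bits 52-63
    value |= 0b10 << 62         # bits 64-65, variant
    value |= rand               # bits 66-127
    return uuid.UUID(int=value)
```

   Because unixts, msec and seq occupy the most significant bits in
   that order, two values produced this way compare in generation
   order both as integers and as hex strings.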
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         msec          |  ver  |          seq          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                         rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 3: UUIDv7 Field and Bit Layout - Encoding Example
                       (Millisecond Precision)

   In Figure 4 the UUIDv7 has been created with microsecond precision
   using the available sub-second precision bits.

   Examining Figure 4 one can observe:

   *  The first 36 bits have been dedicated to the Unix Timestamp
      (unixts).

   *  All 12 bits of subsec_a are dedicated to providing sub-second
      encoding for the microsecond precision (usec).

   *  The 4 Version bits remain unchanged (ver).

   *  All 12 bits of subsec_b have been dedicated to providing
      sub-second encoding for the microsecond precision (usec).

   *  The 2 Variant bits remain unchanged (var).

   *  A 14-bit monotonic clock sequence counter (seq) has been
      embedded in the most significant position of subsec_seq_node.

   *  Finally, the remaining 48 bits in the subsec_seq_node section
      are filled with random data to pad the length and provide
      guaranteed uniqueness (rand).
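
   The 14-bit counter in the microsecond layout above is subject to
   the rules of Section 4.4.2: start at zero, increment within one
   timestamp, reset when the timestamp advances, and never roll over.
   One way this might look is sketched below. The class name and its
   behavior on exhaustion or on a backwards clock (raising an
   exception) are assumptions for illustration; the draft does not
   prescribe how an implementation reacts in those cases.

```python
class SequenceCounter:
    """Hypothetical per-timestamp counter following Section 4.4.2."""

    def __init__(self, bits: int = 14) -> None:
        self.max_value = (1 << bits) - 1
        self.last_timestamp = None
        self.value = 0

    def next(self, timestamp: int) -> int:
        """Return the sequence value to embed for this timestamp."""
        if self.last_timestamp is None or timestamp > self.last_timestamp:
            # The timestamp has incremented: the sequence restarts at zero.
            self.last_timestamp = timestamp
            self.value = 0
        elif timestamp == self.last_timestamp:
            if self.value == self.max_value:
                # The counter must not roll over within one timestamp.
                # Assumption: signal the caller, who could wait for a tick.
                raise OverflowError("sequence exhausted for this timestamp")
            self.value += 1
        else:
            # Assumption: a clock that moves backwards is treated as an error.
            raise ValueError("timestamp moved backwards")
        return self.value
```

   Here timestamp is whatever integer the generator embeds ahead of
   the counter, for example the 36-bit seconds value joined with the
   24 microsecond bits.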
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         usec          |  ver  |         usec          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|             seq           |            rand               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 4: UUIDv7 Field and Bit Layout - Encoding Example
                       (Microsecond Precision)

   In Figure 5 the UUIDv7 has been created with nanosecond precision
   using the available sub-second precision bits.

   Examining Figure 5 one can observe:

   *  The first 36 bits have been dedicated to the Unix Timestamp
      (unixts).

   *  All 12 bits of subsec_a are dedicated to providing sub-second
      encoding for the nanosecond precision (nsec).

   *  The 4 Version bits remain unchanged (ver).

   *  All 12 bits of subsec_b have been dedicated to providing
      sub-second encoding for the nanosecond precision (nsec).

   *  The 2 Variant bits remain unchanged (var).

   *  The first 14 bits of subsec_seq_node are dedicated to providing
      sub-second encoding for the nanosecond precision (nsec).

   *  The next 8 bits of subsec_seq_node are dedicated to a monotonic
      clock sequence counter (seq).

   *  Finally, the remaining 40 bits in the subsec_seq_node section
      are filled with random data to pad the length and provide
      guaranteed uniqueness (rand).
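
   In all three example layouts, unixts and the leading sub-second
   bits sit in the same positions, which is what lets a parser read a
   value without knowing the generator's precision. A rough sketch of
   such a parser is given below, treating the sub-second bits as a
   binary fraction as described in Section 4.4.4.2. The function name
   and the subsec_bits parameter are hypothetical, and returning an
   exact Fraction is only one possible choice.

```python
import uuid
from fractions import Fraction


def decode_uuidv7(u: uuid.UUID, subsec_bits: int = 24) -> Fraction:
    """Return seconds since the Unix epoch from the first subsec_bits bits."""
    if not 0 <= subsec_bits <= 86:
        raise ValueError("only 12 + 12 + 62 sub-second bits are available")
    n = u.int
    unixts = n >> 92              # bits 0-35
    subsec_a = (n >> 80) & 0xFFF  # bits 36-47
    subsec_b = (n >> 64) & 0xFFF  # bits 52-63, after ver
    tail = n & ((1 << 62) - 1)    # bits 66-127, after var

    # Concatenate the sub-second bits, skipping over ver and var (86 bits).
    subsec = (subsec_a << 74) | (subsec_b << 62) | tail

    # Keep the most significant subsec_bits bits and divide by 2**subsec_bits.
    top = subsec >> (86 - subsec_bits)
    return unixts + Fraction(top, 1 << subsec_bits)
```

   A parser that only cares about milliseconds could call this with a
   small subsec_bits value and truncate the result, while one that
   wants more precision can ask for more bits; bits beyond the
   generator's precision simply carry sequence or random data.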
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         nsec          |  ver  |         nsec          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|            nsec           |      seq      |     rand      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 5: UUIDv7 Field and Bit Layout - Encoding Example
                       (Nanosecond Precision)

4.4.4.2.  UUIDv7 Decoding

   When decoding or parsing a UUIDv7 value there are only two values
   to be considered:

   1.  The Unix timestamp defined as unixts.

   2.  The sub-second precision values defined as subsec_a, subsec_b,
       and subsec_seq_node.

   As detailed in Figure 2, the Unix timestamp (unixts) is always the
   first 36 bits of the UUIDv7 layout.

   Similarly, as per Figure 2, the sub-second precision values lie
   within subsec_a, subsec_b, and subsec_seq_node, which are all
   interpreted as sub-second information after skipping over the
   version (ver) and variant (var) bits. These concatenated sub-second
   information bits are interpreted in a way where most to least
   significant bits represent a further division by two. This is the
   same place notation normally used to express fractional numbers,
   except in binary. For example, in decimal ".1" means one tenth, and
   ".01" means one hundredth. In this subsec field, a 1 means one
   half, 01 means one quarter, 001 is one eighth, etc. This scheme can
   work for any number of bits up to the maximum available, and keeps
   the most significant data leftmost in the bit sequence.

   To perform the sub-second math, simply take the first (most
   significant/leftmost) N bits of subsec and divide the result by
   2^N. Take for example:

   1.  To parse the first 16 bits, extract that value as an integer
       and divide it by 65536 (2 to the 16th).
   2.  If these 16 bits are 0101 0101 0101 0101, then treating that as
       an integer gives 0x5555 or 21845 in decimal, and dividing by
       65536 gives 0.3333282.

   This sub-second encoding scheme provides maximum interoperability
   across systems where different levels of time precision are
   required, feasible, or available. The timestamp value derived from
   a UUIDv7 value SHOULD be "as close to the correct value as
   possible" when parsed, even across disparate systems.

   Take for example the starting point for our next two UUIDv7 parsing
   scenarios:

   1.  System A produces a UUIDv7 with a microsecond-precise timestamp
       value.

   2.  System B is unaware of the precision encoded in the UUIDv7
       timestamp by System A.

   Scenario 1:

   1.  System B parses the embedded timestamp with millisecond
       precision. (Less precision than the encoder.)

   2.  System B SHOULD return the correct millisecond value encoded by
       System A (truncated to milliseconds).

   Scenario 2:

   1.  System B parses the timestamp with nanosecond precision. (More
       precision than the encoder.)

   2.  The value returned by System B SHOULD have the same
       microsecond-level precision provided by the encoder, with the
       additional precision down to the nanosecond level being
       essentially random, as per the encoded random value at the end
       of the UUIDv7.

4.5.  UUIDv8 Layout and Bit Order

   UUIDv8 offers variable-size timestamp, clock sequence, and node
   values which allow for a highly customizable UUID that fits a given
   application's needs.

   UUIDv8 SHOULD only be utilized if an implementation cannot utilize
   UUIDv1, UUIDv6, or UUIDv7. Some situations in which UUIDv8 usage
   could occur:

   *  An implementation would like to utilize a timestamp source not
      defined by the current time-based UUIDs.

   *  An implementation would like to utilize a timestamp bit layout
      not defined by the current time-based UUIDs.
   *  An implementation would like a specific level of precision
      within the timestamp not offered by current time-based UUIDs.

   *  An implementation would like to embed extra information within
      the UUID node other than what is defined in this document.

   *  An implementation has other application/language restrictions
      which inhibit the usage of one of the current time-based UUIDs.

   Roughly speaking, a properly formatted UUIDv8 SHOULD contain the
   following sections adding up to a total of 128 bits.

   -  Timestamp Bits (Variable Length)

   -  Clock Sequence Bits (Variable Length)

   -  Node Bits (Variable Length)

   -  UUIDv8 Version Bits (4 bits)

   -  UUID Variant Bits (2 bits)

   The only explicitly defined bits are the Version and Variant,
   leaving 122 bits for implementation-specific time-based UUIDs. To
   be clear: UUIDv8 is not a replacement for UUIDv4, where all 122
   extra bits are filled with random data. UUIDv8's 128 bits
   (including the version and variant) SHOULD contain at the minimum a
   timestamp of some format in the most significant bit position,
   followed directly by a clock sequence counter, and finally a node
   containing either random data or implementation-specific data.

   A sample format in Figure 6 is used to further illustrate the point
   for the 16-octet, 128-bit UUIDv8.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         timestamp_32                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         timestamp_48          |  ver  |      time_or_seq      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|  seq_or_node  |                   node                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             node                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 6: UUIDv8 Field and Bit Layout

   timestamp_32:
      The most significant 32 bits of the desired timestamp source.
      Occupies bits 0 through 31 (octets 0-3).

   timestamp_48:
      The next 16 bits of the timestamp source when a timestamp source
      with at least 48 bits is used. When a 32-bit timestamp source is
      utilized, these bits are set to 0. Occupies bits 32 through 47
      (octets 4-5).

   ver:
      The 4-bit UUIDv8 version (1000). Occupies bits 48 through 51.

   time_or_seq:
      If a 60-bit or larger timestamp is used, these 12 bits are used
      to fill out the remaining timestamp. If a 32-bit or 48-bit
      timestamp is leveraged, a 12-bit clock sequence MAY be used.
      Together ver and time_or_seq occupy bits 48 through 63 (octets
      6-7).

   var:
      2-bit UUID variant (10).

   seq_or_node:
      If a 60-bit or larger timestamp source is leveraged, these 8
      bits SHOULD be allocated for an 8-bit clock sequence counter. If
      a 32-bit or 48-bit timestamp source is used, these 8 bits SHOULD
      be set to random data.

   node:
      In most implementations these bits will likely be set to
      pseudo-random data. However, implementations are free to utilize
      the node as they see fit. Together var, seq_or_node, and node
      occupy bits 64 through 127 (octets 8-15).

4.5.1.  UUIDv8 Timestamp Usage

   UUIDv8's usage of timestamp relaxes both the timestamp source and
   timestamp length. Implementations are free to utilize any
   monotonically stable timestamp source for UUIDv8.

   Some examples include:

   -  Custom Epoch

   -  NTP Timestamp

   -  ISO 8601 timestamp

   The relaxed nature of UUIDv8 timestamps also works to future-proof
   this specification and allows implementations a method to create
   compliant time-based UUIDs using timestamp sources that might not
   yet be defined.
   Timestamps come in many sizes and UUIDv8 defines three fields that
   can easily be used for the majority of timestamp lengths:

   *  32-bit timestamp: using timestamp_32 and setting timestamp_48 to
      0s.

   *  48-bit timestamp: using timestamp_32 and timestamp_48 entirely.

   *  60-bit timestamp: using timestamp_32, timestamp_48, and
      time_or_seq.

   *  64-bit timestamp: using timestamp_32, timestamp_48, and
      time_or_seq and truncating the timestamp to the 60 most
      significant bits.

   Although it is possible to create a timestamp larger than 64 bits
   in size, the usage and bit layout of that timestamp format is up to
   the implementation. When a timestamp exceeds the 64th bit (octet
   7), extra care must be taken to ensure the Variant bits are
   properly inserted at their respective location in the UUID.
   Likewise, the Version MUST always be implemented at the appropriate
   location.

   Any timestamp that does not entirely fill timestamp_32,
   timestamp_48 or time_or_seq MUST set all leftover bits in the least
   significant position of the respective field to 0. For example, a
   36-bit timestamp source would fully utilize timestamp_32 and 4 bits
   of timestamp_48. The remaining 12 bits in timestamp_48 MUST be set
   to 0.

   By using implementation-specific timestamp sources, it is not
   guaranteed that devices outside of the application context are able
   to extract and parse the timestamp from UUIDv8 without some
   pre-existing knowledge of the source timestamp used by the UUIDv8
   implementation.

4.5.2.  UUIDv8 Clock Sequence Usage

   A clock sequence MUST be used with UUIDv8 to provide added
   sequencing guarantees when multiple UUIDv8 values are created on
   the same clock tick. The number of bits allocated to the clock
   sequence depends on the precision of the timestamp source. For
   example, a more accurate timestamp source using nanosecond
   precision will require fewer clock sequence bits than a timestamp
   source utilizing seconds for precision.
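
   As a rough way to reason about that trade-off, the helper below
   estimates how many clock sequence bits a zero-based counter needs
   for a given peak generation rate and clock tick rate. The function
   name and both parameters are hypothetical, and the estimate assumes
   the stated peak rate is never exceeded within any single tick.

```python
def sequence_bits_needed(ids_per_second: int, ticks_per_second: int) -> int:
    """Bits needed so a zero-based counter never rolls over within a tick."""
    if ids_per_second < 0 or ticks_per_second <= 0:
        raise ValueError("rates must be non-negative and ticks positive")
    # Ceiling division: the most UUIDs that can fall into one clock tick.
    ids_per_tick = -(-ids_per_second // ticks_per_second)
    # The counter runs from 0 to ids_per_tick - 1, so size it for that value.
    return max(ids_per_tick - 1, 0).bit_length()
```

   For instance, at one million UUIDs per second this gives 20 bits
   for a seconds-precision clock, 10 bits for a millisecond clock, and
   0 bits for a nanosecond clock.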
   The UUIDv8 layout in Figure 6 generically defines two possible
   clock sequence values that can be leveraged:

   *  12-bit clock sequence using time_or_seq, for use when the
      timestamp is 48 bits or fewer, which allows for 4096 UUIDs per
      clock tick (sequence values 0 through 4095).

   *  8-bit clock sequence using seq_or_node, for use when the
      timestamp uses more than 48 bits, which allows for 256 UUIDs per
      clock tick (sequence values 0 through 255).

   An implementation MAY use both time_or_seq and seq_or_node for
   clock sequencing; however, it is highly unlikely that 20 bits of
   clock sequence are needed for a given clock tick. Furthermore, more
   bits from the node MAY be used for clock sequencing in the event
   that 8 bits are not sufficient.

   The clock sequence MUST start at zero and increment monotonically
   for each new UUID created by the application on the same timestamp.
   When the timestamp increments the clock sequence MUST be reset to
   zero. The clock sequence MUST NOT roll over or reset to zero unless
   the timestamp has incremented. Care MUST be given to ensure that an
   adequately sized clock sequence is selected for a given application
   based on expected timestamp precision and expected UUID generation
   rates.

4.5.3.  UUIDv8 Node Usage

   The UUIDv8 Node MAY contain any set of data an implementation
   desires; however, the node MUST NOT be set to all 0s, since that
   does not ensure global uniqueness. In most scenarios the node will
   be filled with pseudo-random data.

   The UUIDv8 layout in Figure 6 defines 2 sizes of Node depending on
   the timestamp size:

   *  62-bit node encompassing seq_or_node and node. Used when a
      timestamp of 48 bits or less is leveraged.

   *  54-bit node when all 60 bits of the timestamp are in use and
      seq_or_node is used for clock sequencing.

   An implementation MAY choose to allocate bits from the node to the
   timestamp, clock sequence or application-specific embedded field.
   It is recommended that implementations utilize a node of at least
   48 bits to ensure global uniqueness can be guaranteed.

4.5.4.  UUIDv8 Basic Creation Algorithm

   The entire usage of UUIDv8 is meant to be variable and allow as
   much customization as possible to meet specific
   application/language requirements. As such, UUIDv8 implementations
   will likely vary among applications.

   The following algorithm is a generic implementation using Figure 6
   and the recommendations outlined in this specification.

   *32-bit timestamp, 12-bit sequence counter, 62-bit node:*

   1.  From a system-wide shared stable store (e.g., a file) or global
       variable, read the UUID generator state: the values of the
       timestamp and clock sequence used to generate the last UUID.

   2.  Obtain the current time from the selected clock source as 32
       bits.

   3.  Set the 32-bit field timestamp_32 to the 32 bits from the
       timestamp.

   4.  Set the 16-bit timestamp_48 to all 0s.

   5.  Set the version to 8 (1000).

   6.  If the state was unavailable (e.g., non-existent or corrupted)
       or the saved timestamp is greater than the current timestamp,
       set the 12-bit clock sequence value (time_or_seq) to 0.

   7.  If the state was available, but the saved timestamp is less
       than or equal to the current timestamp, increment the clock
       sequence value (time_or_seq).

   8.  Set the variant to binary 10.

   9.  Generate 62 random bits and fill in 8 bits for seq_or_node and
       54 bits for the node.

   10. Format by concatenating the 128 bits as:
       timestamp_32|timestamp_48|version|time_or_seq|variant|
       seq_or_node|node

   11. Save the state (current timestamp and clock sequence) back to
       the stable store.

   *48-bit timestamp, 12-bit sequence counter, 62-bit node:*

   1.  From a system-wide shared stable store (e.g., a file) or global
       variable, read the UUID generator state: the values of the
       timestamp and clock sequence used to generate the last UUID.

   2.
       Obtain the current time from the selected clock source as 48
       bits.

   3.  Set the 32-bit field timestamp_32 to the 32 most significant
       bits from the timestamp.

   4.  Set the 16-bit timestamp_48 to the 16 least significant bits
       from the timestamp.

   5.  The rest of the steps are the same as the previous example.

   *60-bit timestamp, 8-bit sequence counter, 54-bit node:*

   1.  From a system-wide shared stable store (e.g., a file) or global
       variable, read the UUID generator state: the values of the
       timestamp and clock sequence used to generate the last UUID.

   2.  Obtain the current time from the selected clock source as 60
       bits.

   3.  Set the 32-bit field timestamp_32 to the 32 most significant
       bits from the timestamp.

   4.  Set the 16-bit timestamp_48 to the 16 middle bits from the
       timestamp.

   5.  Set the version to 8 (1000).

   6.  Set the 12-bit time_or_seq to the 12 least significant bits
       from the timestamp.

   7.  Set the variant to binary 10.

   8.  If the state was unavailable (e.g., non-existent or corrupted)
       or the saved timestamp is greater than the current timestamp,
       set the 8-bit clock sequence value (seq_or_node) to 0.

   9.  If the state was available, but the saved timestamp is less
       than or equal to the current timestamp, increment the clock
       sequence value (seq_or_node).

   10. Generate 54 random bits and fill in the node.

   11. Format by concatenating the 128 bits as:
       timestamp_32|timestamp_48|version|time_or_seq|variant|
       seq_or_node|node

   12. Save the state (current timestamp and clock sequence) back to
       the stable store.

   *64-bit timestamp, 8-bit sequence counter, 54-bit node:*

   1.  The same steps as the 60-bit timestamp can be utilized if the
       64-bit timestamp is truncated to 60 bits.

   2.  Implementations MAY choose to truncate the most or least
       significant bits, but it is recommended to utilize the most
       significant 60 bits and lose 4 bits of precision in the
       nanoseconds or microseconds position.

   *General algorithm for generation of UUIDv8 not defined here:*

   1.
       From a system-wide shared stable store (e.g., a file) or global
       variable, read the UUID generator state: the values of the
       timestamp and clock sequence used to generate the last UUID.

   2.  Obtain the current time from the selected clock source with the
       desired number of bits.

   3.  Set the required number of timestamp bits in the most
       significant positions of the 128-bit UUID.

   4.  Care MUST be taken to ensure that the UUID Version and UUID
       Variant are in the correct bit positions.

       UUID Version: Bits 48 through 51

       UUID Variant: Bits 64 and 65

   5.  If the state was unavailable (e.g., non-existent or corrupted)
       or the saved timestamp is greater than the current timestamp,
       set the desired clock sequence value to 0.

   6.  If the state was available, but the saved timestamp is less
       than or equal to the current timestamp, increment the clock
       sequence value.

   7.  Set the remaining bits to the node as pseudo-random data.

   8.  Format by concatenating the 128 bits together.

   9.  Save the state (current timestamp and clock sequence) back to
       the stable store.

5.  Encoding and Storage

   The existing UUID hex and dash format of 8-4-4-4-12 is retained for
   both backwards compatibility and human readability.

   For many applications such as databases this format is
   unnecessarily verbose, totaling 288 bits.

   *  8 bits for each of the 32 hex characters = 256 bits

   *  8 bits for each of the 4 hyphens = 32 bits

   Where possible UUIDs SHOULD be stored within database applications
   as the underlying 128-bit binary value.

6.  Global Uniqueness

   UUIDs created by this specification offer the same guarantees for
   global uniqueness as those found in [RFC4122]. Furthermore, the
   time-based UUIDs defined in this specification are geared towards
   database applications but MAY be used for a wide variety of
   use-cases. Just as global uniqueness is guaranteed, UUIDs are
   guaranteed to be unique within an application context within the
   enterprise domain.

7.
Distributed UUID Generation

   Some implementations might desire to utilize multi-node, clustered
   applications which involve 2 or more applications independently
   generating UUIDs that will be stored in a common location. UUIDs
   already feature sufficient entropy to ensure that the chances of
   collision are low. However, implementations MAY dedicate a portion
   of the node's most significant random bits to a pseudo-random
   machineID which helps identify UUIDs created by a given node. This
   works to add an extra layer of collision avoidance.

   This machineID MUST be placed in the UUID immediately after the
   timestamp and sequence counter bits. This position is selected to
   ensure that sorting by timestamp and clock sequence is still
   possible. The machineID MUST NOT be an IEEE 802 MAC address. The
   creation and negotiation of the machineID among distributed nodes
   is out of scope for this specification.

8.  IANA Considerations

   This document has no IANA actions.

9.  Security Considerations

   MAC addresses pose inherent security risks and MUST NOT be used for
   node generation. As such they have been strictly forbidden from
   time-based UUIDs within this specification. Instead, pseudo-random
   bits SHOULD be selected from a source with sufficient entropy to
   ensure guaranteed uniqueness among UUID generation.

   Timestamps embedded in the UUID do pose a very small attack
   surface. The timestamp in conjunction with the clock sequence does
   signal the order of creation for a given UUID and its corresponding
   data but does not define anything about the data itself or the
   application as a whole. If UUIDs are required for use with any
   security operation within an application context in any shape or
   form then [RFC4122] UUIDv4 SHOULD be utilized.
The machineID portion of the node, described in Section 7, does provide a small unique identifier which could be used to determine which application is generating data, but this machineID alone is not enough to identify a node on the network without other corresponding data points. Furthermore the machineID, like the timestamp+sequence, does not provide any context about the data that corresponds to the UUID or the current state of the application as a whole.

10.  Acknowledgements

The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, and Theodore Y. Ts'o, as well as all of those in and outside the IETF community who contributed to the discussions which resulted in this document.

11.  Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC4122] Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, July 2005.

12.  Informative References

[LexicalUUID] Twitter, "A Scala client for Cassandra", Commit f6da4e0, November 2012.

[Snowflake] Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, May 2014.

[Flake] Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, February 2017.

[ShardingID] Instagram Engineering, "Sharding & IDs at Instagram", December 2012.

[KSUID] Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, July 2020.

[Elasticflake] Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, January 2015.

[FlakeID] Pawlak, T., "Flake ID Generator", Commit fcd6a2f, April 2020.
[Sonyflake] Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, August 2020.

[orderedUuid] Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", January 2020.

[COMBGUID] Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, December 2020.

[ULID] Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, May 2019.

[SID] Chilton, A., "sid : generate sortable identifiers", Commit 660e947, June 2019.

[pushID] Google, "The 2^120 Ways to Ensure Unique Identifiers", February 2015.

[XID] Poitrey, O., "Globally Unique ID Generator", Commit efa678f, October 2020.

[ObjectID] MongoDB, "ObjectId - MongoDB Manual".

[CUID] Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, October 2020.

Authors' Addresses

Brad G. Peabody
Email: brad@peabody.io

Kyzer R. Davis
Email: kydavis@cisco.com

================================================
FILE: old drafts/draft-peabody-dispatch-new-uuid-format-01.xml
================================================
New UUID Formats
brad@peabody.io
kydavis@cisco.com
ART dispatch uuid

This document presents new time-based UUID formats which are suited for use as a database key. A common case for modern applications is to create a unique identifier for use as a primary key in a database table. This identifier usually implements an embedded timestamp that is sortable using the monotonic creation time in the most significant bits. In addition the identifier is highly collision resistant, difficult to guess, and provides minimal security attack surfaces. None of the existing UUID versions, including UUIDv1, fulfill each of these requirements in the most efficient possible way. This document is a proposal to update [RFC4122] with three new UUID versions that address these concerns, each with different trade-offs.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys. The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become a burden. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments. However, some properties of UUIDs are not well suited to this task. First, most of the existing UUID versions such as UUIDv4 have poor database index locality, meaning that new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the common index structures (B-tree and its variants) can be dramatic. As such newly inserted values SHOULD be time-ordered to address this. While it is true that UUIDv1 does contain an embedded timestamp and can be time-ordered, UUIDv1 has other issues. It is possible to sort Version 1 UUIDs by time but it is a laborious task. The process requires breaking the bytes of the UUID into various pieces, re-ordering the bits, and then determining the order from the reconstructed timestamp. This is not efficient in very large systems. Implementations would be simplified with a sort order where the UUID can simply be treated as an opaque sequence of bytes and ordered as such. After the embedded timestamp, the remaining 64 bits are in essence used to provide uniqueness both on a global scale and within a given timestamp tick.
The clock sequence value ensures that multiple UUIDs generated for the same timestamp value are given a monotonic sequence value. This explicit sequencing helps further facilitate sorting. The remaining random bits ensure collisions are minimal. Furthermore, UUIDv1 utilizes a non-standard timestamp epoch derived from the Gregorian Calendar. More specifically, the timestamp is Coordinated Universal Time (UTC) expressed as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582. Implementations and many languages may find it easier to implement the widely adopted and well known Unix Epoch, a custom epoch, or another timestamp source with various levels of timestamp precision required by the application. Lastly, privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Instead "cryptographically secure" pseudo-random number generators (CSPRNGs) or pseudo-random number generators (PRNGs) SHOULD be used within an application context to provide uniqueness and unguessability. Due to the shortcomings of UUIDv1 and UUIDv4 detailed so far, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways. While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.
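  1. [LexicalUUID] by Twitter
  2. [Snowflake] by Twitter
  3. [Flake] by Boundary
  4. [ShardingID] by Instagram
  5. [KSUID] by Segment
  6. [Elasticflake] by P. Pearcy
  7. [FlakeID] by T. Pawlak
  8. [Sonyflake] by Sony
  9. [orderedUuid] by IT. Cabrera
  10. [COMBGUID] by R. Tallent
  11. [ULID] by A. Feerasta
  12. [SID] by A. Chilton
  13. [pushID] by Google
  14. [XID] by O. Poitrey
  15. [ObjectID] by MongoDB
  16. [CUID] by E. Elliott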
An inspection of these implementations details the following trends that help define this standard:
  • Timestamps MUST be k-sortable. That is, values within or close to the same timestamp are ordered properly by sorting algorithms.
  • Timestamps SHOULD be big-endian with the most-significant bits of the time embedded as-is without reordering.
  • Timestamps SHOULD utilize millisecond precision and Unix Epoch as timestamp source, although there is some variation to this among implementations depending on the application requirements.
  • The ID format SHOULD be lexicographically sortable while in the textual representation.
  • IDs MUST ensure proper embedded sequencing to facilitate sorting when multiple UUIDs are created during a given timestamp.
  • IDs MUST NOT require unique network identifiers as part of achieving uniqueness.
  • Distributed nodes MUST be able to create collision resistant Unique IDs without consulting a centralized resource.
In order to solve these challenges this specification introduces three new version identifiers assigned for time-based UUIDs. The first, UUIDv6, aims to be the easiest to implement for applications which already implement UUIDv1. The UUIDv6 specification keeps the original Gregorian timestamp source but does not reorder the timestamp bits as per the process utilized by UUIDv1. UUIDv6 also requires that pseudo-random data MUST be used in place of the MAC address. The rest of the UUIDv1 format remains unchanged in UUIDv6. See the UUIDv6 layout (Figure 1). Next, UUIDv7 introduces an entirely new time-based UUID bit layout utilizing a variable length timestamp sourced from the widely implemented and well known Unix Epoch timestamp source. The timestamp is broken into a 36-bit integer seconds part, and is followed by a field of variable length which represents the sub-second timestamp portion, encoded so that each bit from most to least significant adds more precision. See the UUIDv7 layout (Figure 2). Finally, UUIDv8 introduces a relaxed time-based UUID format that caters to application implementations that cannot utilize UUIDv1, UUIDv6, or UUIDv7. UUIDv8 also future-proofs this specification by allowing time-based UUID formats from timestamp sources that are not yet defined. The variable size timestamp offers lots of flexibility to create an implementation specific RFC compliant time-based UUID while retaining the properties that make UUIDs great. See the UUIDv8 layout (Figure 6).
The UUID length of 16 octets (128 bits) remains unchanged. The textual representation of a UUID consisting of 36 hexadecimal and dash characters in the format 8-4-4-4-12 remains unchanged for human readability. In addition, the positions of both the Version and Variant bits remain unchanged in the layout.
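Table 1 defines the 4-bit version found in Bits 48 through 51 within a given UUID.

Table 1: UUID versions defined by this specification

+------+------+------+------+---------+--------------------------------------------+
| Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                                |
+------+------+------+------+---------+--------------------------------------------+
| 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-based UUID        |
| 0    | 1    | 1    | 1    | 7       | Variable length Unix Epoch time-based UUID |
| 1    | 0    | 0    | 0    | 8       | Custom time-based UUID                     |
+------+------+------+------+---------+--------------------------------------------+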
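The variant bits utilized by UUIDs in this specification remain the same as in [RFC4122]. Table 2 lists the contents of the variant field, bits 64 and 65, where the letter "x" indicates a "don't-care" value. Common hex values of 8 (1000), 9 (1001), A (1010), and B (1011) frequent the text representation.

Table 2: UUID Variant defined by this specification

+------+------+------+-----------------------------------------+
| Msb0 | Msb1 | Msb2 | Description                             |
+------+------+------+-----------------------------------------+
| 1    | 0    | x    | The variant specified in this document. |
+------+------+------+-----------------------------------------+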
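As a rough illustration of the version and variant positions described above, both fields can be read from the 128-bit value with shifts and masks. This is a minimal Python sketch; the helper names `uuid_version` and `uuid_variant_bits` and the sample value are assumptions made for illustration and are not defined by this specification.

```python
import uuid


def uuid_version(value: int) -> int:
    """Return the 4-bit version stored in bits 48 through 51.

    Bit 0 is the most significant bit of the 128-bit value, so the
    version nibble sits 76 bits above the least significant bit.
    """
    return (value >> 76) & 0xF


def uuid_variant_bits(value: int) -> int:
    """Return the 2 variant bits stored in bits 64 and 65."""
    return (value >> 62) & 0b11


# Illustrative value only: version nibble 6, variant bits 10.
sample = uuid.UUID("1ec9414c-232a-6b00-b3c8-9e6bdeced846").int
print(uuid_version(sample))            # 6
print(bin(uuid_variant_bits(sample)))  # 0b10
```

In the text representation, the version is the first hex digit of the third group and the variant bits are the top two bits of the first hex digit of the fourth group, which is why that digit is one of 8, 9, A, or B.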
UUIDv6 aims to be the easiest to implement by reusing most of the layout of bits found in UUIDv1 but with changes to bit ordering for the timestamp. Where UUIDv1 splits the timestamp bits into three distinct parts and orders them as time_low, time_mid, and time_high_and_version, UUIDv6 instead keeps the source bits from the timestamp intact and changes the order to time_high, time_mid, and time_low. Incidentally this will match the original 60-bit Gregorian timestamp source. The clock sequence bits remain unchanged from their usage and position in [RFC4122]. The 48-bit node MUST be set to a pseudo-random value. The format for the 16-octet, 128-bit UUIDv6 is shown in Figure 1.
Figure 1: UUIDv6 Field and Bit Layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           time_high                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           time_mid            |     time_low_and_version      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res |  clk_seq_low  |          node (0-1)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          node (2-5)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
time_high:
The most significant 32 bits of the 60-bit starting timestamp. Occupies bits 0 through 31 (octets 0-3)
time_mid:
The middle 16 bits of the 60-bit starting timestamp. Occupies bits 32 through 47 (octets 4-5)
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60-bit starting timestamp. Occupies bits 48 through 63 (octets 6-7)
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8).
clk_seq_low:
The 8 bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9)
node:
48-bit pseudo-random number used as a spatially unique identifier. Occupies bits 80 through 127 (octets 10-15).
UUIDv6 reuses the 60-bit Gregorian timestamp with 100-nanosecond precision defined in [RFC4122].
UUIDv6 makes no change to the Clock Sequence usage defined by [RFC4122].
UUIDv6 node bits SHOULD be set to a 48-bit random or pseudo-random number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or the method of generating a random multicast IEEE 802 MAC address.
The following implementation algorithm is based on [RFC4122] but with changes specific to UUIDv6:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time as a 60-bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.
  3. Set the time_low field to the 12 least significant bits of the starting 60-bit timestamp.
  4. Truncate the timestamp to the 48 most significant bits in order to create time_high_and_time_mid.
  5. Set the time_high field to the 32 most significant bits of the truncated timestamp.
  6. Set the time_mid field to the 16 least significant bits of the truncated timestamp.
  7. Create the 16-bit time_low_and_version by concatenating the 4-bit UUIDv6 version with the 12-bit time_low.
  8. If the state was unavailable (e.g., non-existent or corrupted) or the saved timestamp is greater than the current timestamp, generate a random 14-bit clock sequence value.
  9. If the state was available, but the saved timestamp is less than or equal to the current timestamp, increment the clock sequence value.
  10. Complete the 16-bit clock sequence high, low and reserved creation by concatenating the clock sequence onto UUID variant bits which take the most significant position in the 16-bit value.
  11. Generate a 48-bit pseudo-random node.
  12. Format by concatenating the 128 bits from each part: time_high|time_mid|time_low_and_version|variant_clk_seq|node
  13. Save the state (current timestamp and clock sequence) back to the stable store.
The steps for splitting time_high_and_time_mid into time_high and time_mid are optional since the 48 bits of time_high and time_mid will remain in the same order as time_high_and_time_mid during the final concatenation. This extra step of splitting into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation, in which case the following mapping can be applied to reshuffle the bits with minimal modifications.

Table 3: UUIDv1 to UUIDv6 Field Mappings

+--------------+------+--------------+
| UUIDv1 Field | Bits | UUIDv6 Field |
+--------------+------+--------------+
| time_low     | 32   | time_high    |
| time_mid     | 16   | time_mid     |
| time_high    | 12   | time_low     |
+--------------+------+--------------+
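To illustrate the reshuffle, the following Python sketch rebuilds the 60-bit timestamp from the fields of an existing UUIDv1 value and re-emits it in UUIDv6 order. The function name `uuid1_to_uuid6` is hypothetical. The sketch copies the clock sequence and node unchanged purely to isolate the timestamp reordering; a generator following this document would fill the node with pseudo-random data instead.

```python
import uuid


def uuid1_to_uuid6(u1: uuid.UUID) -> uuid.UUID:
    """Reorder the timestamp bits of a UUIDv1 into the UUIDv6 layout."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    # Rebuild the 60-bit timestamp from the three UUIDv1 pieces.
    timestamp = (
        ((u1.time_hi_version & 0x0FFF) << 48)
        | (u1.time_mid << 32)
        | u1.time_low
    )
    # Split it again, most significant bits first.
    time_high = timestamp >> 28
    time_mid = (timestamp >> 12) & 0xFFFF
    time_low_and_version = (0x6 << 12) | (timestamp & 0x0FFF)
    # Variant, clock sequence, and node are carried over as-is (sketch only).
    low64 = u1.int & 0xFFFFFFFFFFFFFFFF
    value = (
        (time_high << 96)
        | (time_mid << 80)
        | (time_low_and_version << 64)
        | low64
    )
    return uuid.UUID(int=value)


v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9e6bdeced846")
print(uuid1_to_uuid6(v1))  # 1ec9414c-232a-6b00-b3c8-9e6bdeced846
```

After the conversion the most significant bits carry the most significant part of the timestamp, so the resulting values sort by creation time when compared as opaque bytes.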
The UUIDv7 format is designed to encode a Unix timestamp with arbitrary sub-second precision. The key property provided by UUIDv7 is that timestamp values generated by one system and parsed by another are guaranteed to have the sub-second precision of either the generator or the parser, whichever is less. Additionally, the system parsing the UUIDv7 value does not need to know which precision was used during encoding in order to function correctly. The format for the 16-octet, 128-bit UUIDv7 is shown in Figure 2.
Figure 2: UUIDv7 Field and Bit Layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |       subsec_a        |  ver  |       subsec_b        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                      subsec_seq_node                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        subsec_seq_node                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
unixts:
36-bit big-endian unsigned Unix Timestamp value
subsec_a:
12 bits allocated to sub-second precision values.
ver:
The 4-bit UUIDv7 version (0111).
subsec_b:
12 bits allocated to sub-second precision values.
var:
2-bit UUID variant (10)
subsec_seq_node:
The remaining 62 bits which MAY be allocated to any combination of additional sub-second precision, sequence counter, or pseudo-random data.
UUIDv7 utilizes a 36-bit big-endian unsigned Unix Timestamp value (number of seconds since the epoch of 1 Jan 1970, leap seconds excluded so each hour is exactly 3600 seconds long). Additional sub-second precision (millisecond, microsecond, nanosecond, etc.) MAY be provided for encoding and decoding in the remaining bits in the layout.
UUIDv7 SHOULD utilize a monotonic sequence counter to provide additional sequencing guarantees when multiple UUIDv7 values are created in the same UNIXTS and SUBSEC timestamp. The number of bits allocated to the sequence counter depends on the precision of the timestamp. For example, a more accurate timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source utilizing seconds for precision. For best sequencing results the sequence counter SHOULD be placed immediately after the available sub-second bits. The clock sequence MUST start at zero and increment monotonically for each new UUID created by the application on the same timestamp. When the timestamp increments the clock sequence MUST be reset to zero. The clock sequence MUST NOT rollover or reset to zero unless the timestamp has incremented. Care MUST be given to ensure that an adequately sized clock sequence is selected for a given application based on expected timestamp precision and expected UUID generation rates.
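The counter rules in the preceding paragraph could be sketched in Python as shown below. The class name `SequenceCounter` is hypothetical. Raising an error when the counter is exhausted is one possible way to honor the "MUST NOT rollover" rule and is an assumption of this sketch, as is treating a timestamp that moves backwards as if it had not incremented.

```python
class SequenceCounter:
    """Per-timestamp sequence counter that never rolls over within a tick."""

    def __init__(self, bits: int) -> None:
        self.max_value = (1 << bits) - 1  # largest value that fits the field
        self.last_timestamp = None
        self.value = 0

    def next(self, timestamp: int) -> int:
        if self.last_timestamp is None or timestamp > self.last_timestamp:
            # The timestamp has incremented: reset the counter to zero.
            self.last_timestamp = timestamp
            self.value = 0
        elif self.value == self.max_value:
            # Same tick and no room left: refuse instead of rolling over.
            raise OverflowError("sequence counter exhausted for this timestamp")
        else:
            # Same tick: increment monotonically.
            self.value += 1
        return self.value


counter = SequenceCounter(bits=12)
print(counter.next(1000), counter.next(1000), counter.next(1001))  # 0 1 0
```

A caller that hits the overflow case could wait for the next timestamp tick before retrying, which is why the counter width has to be chosen against the expected generation rate.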
UUIDv7 implementations, even with very detailed sub-second precision and the optional sequence counter, MAY have leftover bits that will be identified as the Node for this section. The UUIDv7 Node MAY contain any set of data an implementation desires; however, the node MUST NOT be set to all 0s, which does not ensure global uniqueness. In most scenarios the node SHOULD be filled with pseudo-random data.
The UUIDv7 bit layout for encoding and decoding are described separately in this document.
Since the UUIDv7 Unix timestamp is fixed at 36 bits in length the exact layout for encoding UUIDv7 depends on the precision (number of bits) used for the sub-second portion and the sizes of the optionally desired sequence counter and node bits. Three examples of UUIDv7 encoding are given below as general guidelines but implementations are not limited to just these three examples. All of these fields are only used during encoding, and during decoding the system is unaware of the bit layout used for them and considers this information opaque. As such, implementations generating these values can assign whatever lengths to each field they deem applicable, as long as this does not break decoding compatibility (i.e. Unix timestamp (unixts), version (ver) and variant (var) have to stay where they are, and clock sequence counter (seq), random (rand) or other implementation specific values must follow the sub-second encoding). In Figure 3 the UUIDv7 has been created with millisecond precision with the available sub-second precision bits. Examining Figure 3 one can observe:
  • The first 36 bits have been dedicated to the Unix Timestamp (unixts)
  • All 12 bits of subsec_a are fully dedicated to millisecond information (msec).
  • The 4 Version bits remain unchanged (ver).
  • All 12 bits of subsec_b have been dedicated to a monotonic clock sequence counter (seq).
  • The 2 Variant bits remain unchanged (var).
  • Finally the remaining 62 bits in the subsec_seq_node section are filled with random data to pad the length and provide guaranteed uniqueness (rand).
Figure 3: UUIDv7 Field and Bit Layout - Encoding Example (Millisecond Precision)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |         msec          |  ver  |          seq          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                           rand                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In Figure 4 the UUIDv7 has been created with Microsecond precision with the available sub-second precision bits. Examining Figure 4 one can observe:
  • The first 36 bits have been dedicated to the Unix Timestamp (unixts)
  • All 12 bits of subsec_a are fully dedicated to providing sub-second encoding for the Microsecond precision (usec).
  • The 4 Version bits remain unchanged (ver).
  • All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the Microsecond precision (usec).
  • The 2 Variant bits remain unchanged (var).
  • A 14-bit monotonic clock sequence counter (seq) has been embedded in the most significant position of subsec_seq_node.
  • Finally the remaining 48 bits in the subsec_seq_node section are filled with random data to pad the length and provide guaranteed uniqueness (rand).
Figure 4: UUIDv7 Field and Bit Layout - Encoding Example (Microsecond Precision)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |         usec          |  ver  |         usec          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|            seq            |             rand              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In Figure 5 the UUIDv7 has been created with Nanosecond precision with the available sub-second precision bits. Examining Figure 5 one can observe:
  • The first 36 bits have been dedicated to the Unix Timestamp (unixts)
  • All 12 bits of subsec_a are fully dedicated to providing sub-second encoding for the Nanosecond precision (nsec).
  • The 4 Version bits remain unchanged (ver).
  • All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the Nanosecond precision (nsec).
  • The 2 Variant bits remain unchanged (var).
  • The first 14 bits of subsec_seq_node are dedicated to providing sub-second encoding for the Nanosecond precision (nsec).
  • The next 8 bits of subsec_seq_node are dedicated to a monotonic clock sequence counter (seq).
  • Finally the remaining 40 bits in the subsec_seq_node section are filled with random data to pad the length and provide guaranteed uniqueness (rand).
Figure 5: UUIDv7 Field and Bit Layout - Encoding Example (Nanosecond Precision)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |         nsec          |  ver  |         nsec          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|           nsec            |      seq      |     rand      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
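The three encoding examples differ only in how the 86 bits outside of unixts, ver, and var are divided between sub-second precision, sequence counter, and random data. The Python sketch below captures that idea with one parametrized encoder. The function name `uuid7_encode` and its parameters are assumptions made for illustration, and the sub-second value is stored as a binary fraction of a second, which is this sketch's reading of the decoding rules in this document.

```python
import uuid


def uuid7_encode(unixts: int, frac_num: int, frac_den: int,
                 subsec_bits: int, seq: int, seq_bits: int, rand: int) -> uuid.UUID:
    """Encode a UUIDv7 with a configurable sub-second/sequence/random split.

    frac_num / frac_den is the sub-second part, e.g. 500 / 1000 for 500 ms.
    """
    rand_bits = 86 - subsec_bits - seq_bits  # 12 + 12 + 62 bits in total
    if not 0 <= unixts < (1 << 36):
        raise ValueError("unixts must fit in 36 bits")
    if not 0 <= frac_num < frac_den:
        raise ValueError("sub-second fraction must be in [0, 1)")
    if rand_bits < 0 or not 0 <= seq < (1 << seq_bits):
        raise ValueError("field sizes or sequence value out of range")
    # Binary fraction: floor(fraction * 2^subsec_bits).
    subsec = (frac_num << subsec_bits) // frac_den
    payload = (
        (subsec << (seq_bits + rand_bits))
        | (seq << rand_bits)
        | (rand & ((1 << rand_bits) - 1))
    )
    # Splice the 86-bit payload around the fixed ver and var positions.
    subsec_a = payload >> 74
    subsec_b = (payload >> 62) & 0xFFF
    tail = payload & ((1 << 62) - 1)
    value = (
        (unixts << 92) | (subsec_a << 80) | (0x7 << 76)
        | (subsec_b << 64) | (0b10 << 62) | tail
    )
    return uuid.UUID(int=value)


# Millisecond layout (12 sub-second bits, 12 sequence bits, 62 random bits).
print(uuid7_encode(1, 500, 1000, 12, 0, 12, 1))
# Nanosecond layout (38 sub-second bits, 8 sequence bits, 40 random bits).
print(uuid7_encode(1, 500_000_000, 10**9, 38, 5, 8, 1))
```

The microsecond layout corresponds to `subsec_bits=24, seq_bits=14`, which leaves 48 random bits. In real use `rand` would come from a pseudo-random source rather than a fixed value.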
When decoding or parsing a UUIDv7 value there are only two values to be considered:
  1. The unix timestamp defined as unixts
  2. The sub-second precision values defined as subsec_a, subsec_b, and subsec_seq_node
As detailed in Figure 2 the unix timestamp (unixts) is always the first 36 bits of the UUIDv7 layout. Similarly as per Figure 2, the sub-second precision values lie within subsec_a, subsec_b, and subsec_seq_node which are all interpreted as sub-second information after skipping over the version (ver) and variant (var) bits. These concatenated sub-second information bits are interpreted in a way where most to least significant bits represent a further division by two. This is the same normal place notation used to express fractional numbers, except in binary. For example, in decimal ".1" means one tenth, and ".01" means one hundredth. In this subsec field, a 1 means one half, 01 means one quarter, 001 is one eighth, etc. This scheme can work for any number of bits up to the maximum available, and keeps the most significant data leftmost in the bit sequence. To perform the sub-second math, simply take the first (most significant/leftmost) N bits of subsec and divide the result by 2^N. Take for example:
  1. To parse the first 16 bits, extract that value as an integer and divide it by 65536 (2 to the 16th).
  2. If these 16 bits are 0101 0101 0101 0101, then treating that as an integer gives 0x5555 or 21845 in decimal, and dividing by 65536 gives approximately 0.3333282.
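The arithmetic in the example above can be checked with a few lines of Python. The helper name `subsec_fraction` is hypothetical; `Fraction` is used only to keep the division exact.

```python
from fractions import Fraction


def subsec_fraction(bits: int, n: int) -> Fraction:
    """Interpret the n most significant sub-second bits as a fraction of a second."""
    return Fraction(bits, 1 << n)


print(subsec_fraction(0b1, 1))    # 1/2
print(subsec_fraction(0b01, 2))   # 1/4
print(subsec_fraction(0b001, 3))  # 1/8

# The 16-bit example from the text: 0x5555 / 65536.
print(round(float(subsec_fraction(0x5555, 16)), 7))  # 0.3333282
```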
This sub-second encoding scheme provides maximum interoperability across systems where different levels of time precision are required/feasible/available. The timestamp value derived from a UUIDv7 value SHOULD be "as close to the correct value as possible" when parsed, even across disparate systems. Take for example the starting point for our next two UUIDv7 parsing scenarios:
  1. System A produces a UUIDv7 with a microsecond-precise timestamp value.
  2. System B is unaware of the precision encoded in the UUIDv7 timestamp by System A.
Scenario 1:
  1. System B parses the embedded timestamp with millisecond precision. (Less precision than the encoder)
  2. System B SHOULD return the correct millisecond value encoded by system A (truncated to milliseconds).
Scenario 2:
  1. System B parses the timestamp with nanosecond precision. (More precision than the encoder)
  2. System B's value returned SHOULD have the same microsecond level of precision provided by the encoder with the additional precision down to nanosecond level being essentially random as per the encoded random value at the end of the UUIDv7.
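Both scenarios can be sketched in Python as follows. The helper names `uuid7_usec` (standing in for System A) and `uuid7_decode` (standing in for System B) are assumptions for illustration. The decoder returns the sub-second part as an exact fraction of a second and leaves conversion to milliseconds or nanoseconds to the caller.

```python
import uuid
from fractions import Fraction

SUBSEC_TOTAL_BITS = 86  # 12 (subsec_a) + 12 (subsec_b) + 62 (subsec_seq_node)


def uuid7_usec(unixts: int, micros: int, seq: int, rand48: int) -> uuid.UUID:
    """Hypothetical System A: microsecond layout (24 sub-second bits, 14-bit seq)."""
    usec24 = (micros << 24) // 1_000_000  # binary fraction of a second
    value = (
        (unixts << 92) | ((usec24 >> 12) << 80) | (0x7 << 76)
        | ((usec24 & 0xFFF) << 64) | (0b10 << 62)
        | ((seq & 0x3FFF) << 48) | (rand48 & 0xFFFFFFFFFFFF)
    )
    return uuid.UUID(int=value)


def uuid7_decode(u: uuid.UUID, precision_bits: int):
    """Hypothetical System B: read unixts plus the first precision_bits sub-second bits."""
    value = u.int
    unixts = value >> 92
    # Concatenate subsec_a, subsec_b, subsec_seq_node, skipping ver and var.
    subsec_a = (value >> 80) & 0xFFF
    subsec_b = (value >> 64) & 0xFFF
    tail = value & ((1 << 62) - 1)
    subsec = (subsec_a << 74) | (subsec_b << 62) | tail
    top = subsec >> (SUBSEC_TOTAL_BITS - precision_bits)
    return unixts, Fraction(top, 1 << precision_bits)


u = uuid7_usec(1_600_000_000, 123_456, seq=0, rand48=0xABCDEF)

# Scenario 1: parse with less precision than the encoder (12 bits, ~milliseconds).
secs, frac = uuid7_decode(u, 12)
print(secs, int(frac * 1000))  # 1600000000 123

# Scenario 2: parse with more precision than the encoder (38 bits, ~nanoseconds).
secs, frac = uuid7_decode(u, 38)
print(secs, float(frac))
```

In Scenario 2 the value agrees with the encoded time to within 2^-24 of a second; the extra bits come from whatever System A placed after its sub-second field.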
UUIDv8 offers variable-size timestamp, clock sequence, and node values which allow for a highly customizable UUID that fits a given application's needs. UUIDv8 SHOULD only be utilized if an implementation cannot utilize UUIDv1, UUIDv6, or UUIDv7. Some situations in which UUIDv8 usage could occur:
  • An implementation would like to utilize a timestamp source not defined by the current time-based UUIDs.
  • An implementation would like to utilize a timestamp bit layout not defined by the current time-based UUIDs.
  • An implementation would like a specific level of precision within the timestamp not offered by current time-based UUIDs.
  • An implementation would like to embed extra information within the UUID node other than what is defined in this document.
  • An implementation has other application/language restrictions which inhibit the usage of one of the current time-based UUIDs.
Roughly speaking a properly formatted UUIDv8 SHOULD contain the following sections adding up to a total of 128 bits.
  • Timestamp Bits (Variable Length)
  • Clock Sequence Bits (Variable Length)
  • Node Bits (Variable Length)
  • UUIDv8 Version Bits (4 bits)
  • UUID Variant Bits (2 bits)
The only explicitly defined bits are the Version and Variant leaving 122 bits for implementation specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data. UUIDv8's 128 bits (including the version and variant) SHOULD contain at the minimum a timestamp of some format in the most significant bit position followed directly by a clock sequence counter and finally a node containing either random data or implementation specific data. A sample format in Figure 6 is used to further illustrate the point for the 16-octet, 128-bit UUIDv8.
Figure 6: UUIDv8 Field and Bit Layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         timestamp_32                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         timestamp_48          |  ver  |      time_or_seq      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|  seq_or_node  |                   node                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             node                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
timestamp_32:
The most significant 32 bits of the desired timestamp source. Occupies bits 0 through 31 (octets 0-3).
timestamp_48:
The next 16 bits of the timestamp source when a timestamp source with at least 48 bits is used. When a 32-bit timestamp source is utilized, these bits are set to 0. Occupies bits 32 through 47.
ver:
The 4 bit UUIDv8 version (1000). Occupies bits 48 through 51.
time_or_seq:
If a 60-bit, or larger, timestamp is used these 12 bits are used to fill out the remaining timestamp. If a 32-bit or 48-bit timestamp is leveraged a 12-bit clock sequence MAY be used. Together ver and time_or_seq occupy bits 48 through 63 (octets 6-7).
var:
2-bit UUID variant (10)
seq_or_node:
If a 60-bit, or larger, timestamp source is leveraged these 8 bits SHOULD be allocated for an 8-bit clock sequence counter. If a 32-bit or 48-bit timestamp source is used these 8 bits SHOULD be set to random.
node:
In most implementations these bits will likely be set to pseudo-random data. However, implementations are free to utilize the node as they see fit. Together var, seq_or_node, and node occupy bits 64 through 127 (octets 8-15).
UUIDv8's usage of timestamp relaxes both the timestamp source and timestamp length. Implementations are free to utilize any monotonically stable timestamp source for UUIDv8. Some examples include:
  • - Custom Epoch
  • - NTP Timestamp
  • - ISO 8601 timestamp
The relaxed nature of UUIDv8 timestamps also works to future proof this specification and allows implementations a method to create compliant time-based UUIDs using timestamp sources that might not yet be defined. Timestamps come in many sizes and UUIDv8 defines three fields that can easily be used for the majority of timestamp lengths:
  • 32-bit timestamp: using timestamp_32 and setting timestamp_48 to 0s
  • 48-bit timestamp: using timestamp_32 and timestamp_48 entirely
  • 60-bit timestamp: using timestamp_32, timestamp_48, and time_or_seq
  • 64-bit timestamp: using timestamp_32, timestamp_48, and time_or_seq and truncating the timestamp to the 60 most significant bits.
Although it is possible to create a timestamp larger than 64 bits in size, the usage and bit layout of that timestamp format is up to the implementation. When a timestamp exceeds the 64th bit (octet 7), extra care must be taken to ensure the Variant bits are properly inserted at their respective location in the UUID. Likewise, the Version MUST always be implemented at the appropriate location. Any timestamp that does not entirely fill timestamp_32, timestamp_48 or time_or_seq MUST set all leftover bits in the least significant position of the respective field to 0. For example a 36-bit timestamp source would fully utilize timestamp_32 and 4 bits of timestamp_48. The remaining 12 bits in timestamp_48 MUST be set to 0. By using implementation-specific timestamp sources it is not guaranteed that devices outside of the application context are able to extract and parse the timestamp from UUIDv8 without some pre-existing knowledge of the source timestamp used by the UUIDv8 implementation.
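The zero-padding rule for short timestamps can be illustrated with a small Python sketch. The function name `uuid8_short_timestamp` and its parameters are hypothetical; the layout used is the sample one from Figure 6 for timestamps of 48 bits or fewer, with time_or_seq acting as a 12-bit clock sequence and the remaining 62 bits acting as the node.

```python
import uuid


def uuid8_short_timestamp(timestamp: int, ts_bits: int,
                          clock_seq: int, node: int) -> uuid.UUID:
    """Build a UUIDv8 from a timestamp of at most 48 bits (sketch only)."""
    if not 0 < ts_bits <= 48:
        raise ValueError("this sketch only handles timestamps up to 48 bits")
    if not 0 <= timestamp < (1 << ts_bits):
        raise ValueError("timestamp does not fit in ts_bits")
    if not 0 <= clock_seq < (1 << 12):
        raise ValueError("clock sequence must fit in 12 bits")
    if not 0 < node < (1 << 62):
        raise ValueError("node must fit in 62 bits and must not be all zeros")
    # Left-align the timestamp in timestamp_32 + timestamp_48 and leave the
    # unused least significant bits set to 0.
    padded = timestamp << (48 - ts_bits)
    value = (
        (padded << 80)        # timestamp_32 and timestamp_48
        | (0x8 << 76)         # ver (1000)
        | (clock_seq << 64)   # time_or_seq used as a clock sequence
        | (0b10 << 62)        # var (10)
        | node                # seq_or_node + node
    )
    return uuid.UUID(int=value)


# A 36-bit timestamp fills timestamp_32 plus 4 bits of timestamp_48;
# the remaining 12 bits of timestamp_48 stay 0.
print(uuid8_short_timestamp(0xFFFFFFFFF, 36, 0xABC, 1))
# ffffffff-f000-8abc-8000-000000000001
```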
A clock sequence MUST be used with UUIDv8 as added sequencing guarantees when multiple UUIDv8 will be created on the same clock tick. The number of bits allocated to the clock sequence depends on the precision of the timestamp source. For example, a more accurate timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source utilizing seconds for precision. The UUIDv8 layout in Figure 6 generically defines two possible clock sequence values that can be leveraged:
  • 12-bit clock sequence using time_or_seq for use when the timestamp is less than 48-bits which allows for 4096 UUIDs per clock tick (sequence values 0 through 4095).
  • 8-bit clock sequence using seq_or_node when the timestamp uses more than 48-bits which allows for 256 UUIDs per clock tick (sequence values 0 through 255).
An implementation MAY use both time_or_seq and seq_or_node for clock sequencing; however, it is highly unlikely that 20-bits of clock sequence are needed for a given clock tick. Furthermore, more bits from the node MAY be used for clock sequencing in the event that 8-bits is not sufficient. The clock sequence MUST start at zero and increment monotonically for each new UUID created by the application on the same timestamp. When the timestamp increments the clock sequence MUST be reset to zero. The clock sequence MUST NOT roll over or reset to zero unless the timestamp has incremented. Care MUST be given to ensure that an adequately sized clock sequence is selected for a given application based on expected timestamp precision and expected UUID generation rates.
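The clock sequence rules above can be sketched as a small in-memory counter. The class name `ClockSequence` is hypothetical. Raising an error when the counter is exhausted is an assumption of this sketch, since the text only states that the sequence must not roll over.

```python
from typing import Optional


class ClockSequence:
    """Hypothetical clock sequence counter of a fixed bit width."""

    def __init__(self, bits: int) -> None:
        self.max_value = (1 << bits) - 1
        self.last_timestamp: Optional[int] = None
        self.value = 0

    def next(self, timestamp: int) -> int:
        """Return the sequence value to embed for `timestamp`."""
        if self.last_timestamp is None or timestamp > self.last_timestamp:
            # The timestamp has incremented: reset the sequence to zero.
            self.last_timestamp = timestamp
            self.value = 0
        else:
            # Same tick (or a clock that did not advance): never roll over.
            if self.value == self.max_value:
                raise OverflowError("clock sequence exhausted for this tick")
            self.value += 1
        return self.value


seq = ClockSequence(bits=12)
print(seq.next(100), seq.next(100), seq.next(101))  # prints "0 1 0"
```

An implementation that expects to exhaust the counter would need a wider sequence or a more precise timestamp source, which is the sizing concern the paragraph above describes.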
The UUIDv8 Node MAY contain any set of data an implementation desires however the node MUST NOT be set to all 0s which does not ensure global uniqueness. In most scenarios the node will be filled with pseudo-random data. The UUIDv8 layout in Figure 6 defines 2 sizes of Node depending on the timestamp size:
  • 62-bit node encompassing seq_or_node and node, used when a timestamp of 48-bits or less is leveraged.
  • 54-bit node when all 60-bits of the timestamp are in use and the seq_or_node is used as clock sequencing.
An implementation MAY choose to allocate bits from the node to the timestamp, clock sequence or application-specific embedded field. It is recommended that implementations utilize a node of at least 48-bits to ensure global uniqueness can be guaranteed.
The entire usage of UUIDv8 is meant to be variable and allow as much customization as possible to meet specific application/language requirements. As such, UUIDv8 implementations will likely vary among applications. The following algorithm is a generic implementation using Figure 6 and the recommendations outlined in this specification.
32-bit timestamp, 12-bit sequence counter, 62-bit node:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time from the selected clock source as 32 bits.
  3. Set the 32-bit field timestamp_32 to the 32 bits from the timestamp
  4. Set 16-bit timestamp_48 to all 0s
  5. Set the version to 8 (1000)
  6. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the 12-bit clock sequence value (time_or_seq) to 0.
  7. If the state was available and the saved timestamp is greater than or equal to the current timestamp, increment the clock sequence value (time_or_seq).
  8. Set the variant to binary 10
  9. Generate 62 random bits and fill in 8-bits for seq_or_node and 54-bits for the node.
  10. Format by concatenating the 128-bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node
  11. Save the state (current timestamp and clock sequence) back to the stable store
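The 32-bit timestamp steps above can be sketched in Python as follows. The function names, the use of a module-level dictionary as the "global variable" state store, and the choice of whole Unix seconds as the clock source are all assumptions of this sketch. The clock sequence handling follows the rule that the sequence resets only when the timestamp increments.

```python
import secrets
import time
import uuid

# Generator state kept in a global variable (an assumption of this sketch).
_state = {"timestamp": None, "sequence": 0}


def pack_uuid8_32(timestamp_32: int, sequence_12: int, random_62: int) -> uuid.UUID:
    """Concatenate timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node."""
    value = (
        ((timestamp_32 & 0xFFFFFFFF) << 96)    # timestamp_32
        | (0x0000 << 80)                       # timestamp_48 set to all 0s
        | (0x8 << 76)                          # version 8 (1000)
        | ((sequence_12 & 0xFFF) << 64)        # time_or_seq as clock sequence
        | (0b10 << 62)                         # variant 10
        | (random_62 & ((1 << 62) - 1))        # seq_or_node (8) + node (54)
    )
    return uuid.UUID(int=value)


def generate_uuid8_32() -> uuid.UUID:
    now = int(time.time()) & 0xFFFFFFFF
    last = _state["timestamp"]
    if last is None or now > last:
        _state["timestamp"] = now
        _state["sequence"] = 0
    else:
        _state["sequence"] += 1
        if _state["sequence"] > 0xFFF:
            raise OverflowError("clock sequence exhausted for this tick")
    # The node must not be all 0s; fall back to 1 in that unlikely case.
    random_62 = secrets.randbits(62) or 1
    return pack_uuid8_32(_state["timestamp"], _state["sequence"], random_62)


print(pack_uuid8_32(0x60000000, 1, 0))  # prints 60000000-0000-8001-8000-000000000000
```

Keeping the bit packing in a separate function from the clock and state handling makes the layout easy to check against fixed inputs.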
48-bit timestamp, 12-bit sequence counter, 62-bit node:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time from the selected clock source as 48 bits.
  3. Set the 32-bit field timestamp_32 to the 32 most significant bits from the timestamp
  4. Set 16-bit timestamp_48 to the 16 least significant bits from the timestamp
  5. The rest of the steps are the same as the previous example.
60-bit timestamp, 8-bit sequence counter, 54-bit node:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time from the selected clock source as 60 bits.
  3. Set the 32-bit field timestamp_32 to the 32 most significant bits from the timestamp
  4. Set 16-bit timestamp_48 to the 16 middle bits from the timestamp
  5. Set the version to 8 (1000)
  6. Set 12-bit time_or_seq to the 12 least significant bits from the timestamp
  7. Set the variant to 10
  8. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the 8-bit clock sequence value (seq_or_node) to 0.
  9. If the state was available and the saved timestamp is greater than or equal to the current timestamp, increment the clock sequence value (seq_or_node).
  10. Generate 54 random bits and fill in the node
  11. Format by concatenating the 128-bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node
  12. Save the state (current timestamp and clock sequence) back to the stable store
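The bit layout produced by the 60-bit timestamp steps above can be sketched as a pure packing function. The name `pack_uuid8_60` is hypothetical, and the clock source and state handling are left to the caller in this sketch.

```python
import uuid


def pack_uuid8_60(timestamp_60: int, sequence_8: int, node_54: int) -> uuid.UUID:
    """Concatenate timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node."""
    timestamp_32 = (timestamp_60 >> 28) & 0xFFFFFFFF  # 32 most significant bits
    timestamp_48 = (timestamp_60 >> 12) & 0xFFFF      # 16 middle bits
    time_or_seq = timestamp_60 & 0xFFF                # 12 least significant bits
    value = (
        (timestamp_32 << 96)
        | (timestamp_48 << 80)
        | (0x8 << 76)                      # version 8 (1000)
        | (time_or_seq << 64)
        | (0b10 << 62)                     # variant 10
        | ((sequence_8 & 0xFF) << 54)      # seq_or_node as 8-bit clock sequence
        | (node_54 & ((1 << 54) - 1))      # 54-bit node
    )
    return uuid.UUID(int=value)


print(pack_uuid8_60(0x123456789ABCDEF, 1, 1))
# prints 12345678-9abc-8def-8040-000000000001
```

A 64-bit timestamp could be reduced to its 60 most significant bits with `ts64 >> 4` before being passed to this function.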
64-bit timestamp, 8-bit sequence counter, 54-bit node:
  1. The same steps as the 60-bit timestamp can be utilized if the 64-bit timestamp is truncated to 60-bits.
  2. Implementations MAY choose to truncate the most or least significant bits but it is recommended to utilize the most significant 60-bits and lose 4 bits of precision in the nanoseconds or microseconds position.
General algorithm for generation of UUIDv8 not defined here:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time from the selected clock source as desired bit total
  3. Set total amount of bits for timestamp as required in the most significant positions of the 128-bit UUID
  4. Care MUST be taken to ensure that the UUID Version and UUID Variant are in the correct bit positions: the UUID Version in Bits 48 through 51 and the UUID Variant in Bits 64 and 65.
  5. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the desired clock sequence value to 0.
  6. If the state was available and the saved timestamp is greater than or equal to the current timestamp, increment the clock sequence value.
  7. Set the remaining bits to the node as pseudo-random data
  8. Format by concatenating the 128-bits together
  9. Save the state (current timestamp and clock sequence) back to the stable store
The existing UUID hex and dash format of 8-4-4-4-12 is retained for both backwards compatibility and human readability. For many applications such as databases this format is unnecessarily verbose totaling 288 bits.
  • 8-bits for each of the 32 hex characters = 256 bits
  • 8-bits for each of the 4 hyphens = 32 bits
Where possible UUIDs SHOULD be stored within database applications as the underlying 128-bit binary value.
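The size difference can be checked with Python's standard `uuid` module. This is only an illustration of the arithmetic above; how a given database stores the 16 octets is outside the scope of this sketch.

```python
import uuid

u = uuid.uuid4()

text_bits = len(str(u).encode("ascii")) * 8  # 36 characters, 8 bits each
binary_bits = len(u.bytes) * 8               # 16 octets

print(text_bits, binary_bits)  # prints "288 128"

# The binary form round-trips back to the same UUID.
restored = uuid.UUID(bytes=u.bytes)
```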
UUIDs created by this specification offer the same guarantees for global uniqueness as those found in [RFC4122]. Furthermore, the time-based UUIDs defined in this specification are geared towards database applications but MAY be used for a wide variety of use-cases. Just as global uniqueness is guaranteed, UUIDs are guaranteed to be unique within an application context within the enterprise domain.
Some implementations might desire to utilize multi-node, clustered applications which involve 2 or more applications independently generating UUIDs that will be stored in a common location. UUIDs already feature sufficient entropy to ensure that the chances of collision are low. However, implementations MAY dedicate a portion of the node's most significant random bits to a pseudo-random machineID which helps identify UUIDs created by a given node. This works to add an extra layer of collision avoidance. This machineID MUST be placed in the UUID after the timestamp and sequence counter bits. This position is selected to ensure that sorting by timestamp and clock sequence is still possible. The machineID MUST NOT be an IEEE 802 MAC address. The creation and negotiation of the machineID among distributed nodes is out of scope for this specification.
This document has no IANA actions.
MAC addresses pose inherent security risks and MUST NOT be used for node generation. As such they have been strictly forbidden from time-based UUIDs within this specification. Instead pseudo-random bits SHOULD be selected from a source with sufficient entropy to ensure guaranteed uniqueness among UUID generation. Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with the clock sequence does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then UUIDv4 SHOULD be utilized. The machineID portion of node, described in Section 7, does provide a small unique identifier which could be used to determine which application is generating data but this machineID alone is not enough to identify a node on the network without other corresponding data points. Furthermore the machineID, like the timestamp+sequence, does not provide any context about the data that corresponds to the UUID or the current state of the application as a whole.
The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, and Theodore Y. Ts'o, as well as all of those in and outside the IETF community who contributed to the discussions which resulted in this document.
  • Key words for use in RFCs to Indicate Requirement Levels. In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
  • A Universally Unique IDentifier (UUID) URN Namespace. This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier). A UUID is 128 bits long, and can guarantee uniqueness across space and time. UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE), and then in Microsoft Windows platforms. This specification is derived from the DCE specification with the kind permission of the OSF (now known as The Open Group). Information from earlier versions of the DCE specification have been incorporated into this document. [STANDARDS-TRACK]
  • A Scala client for Cassandra, Twitter
  • Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees, Twitter
  • Flake: A decentralized, k-ordered id generation service in Erlang, Boundary
  • Sharding & IDs at Instagram, Instagram Engineering
  • K-Sortable Globally Unique IDs, Segment
  • Sequential UUID / Flake ID generator pulled out of elasticsearch common
  • Flake ID Generator
  • A distributed unique ID generator inspired by Twitter's Snowflake, Sony
  • Laravel: The mysterious "Ordered UUID"
  • Creating sequential GUIDs in C# for MSSQL or PostgreSql
  • Universally Unique Lexicographically Sortable Identifier
  • sid : generate sortable identifiers
  • The 2^120 Ways to Ensure Unique Identifiers, Google
  • Globally Unique ID Generator
  • ObjectId - MongoDB Manual, MongoDB
  • Collision-resistant ids optimized for horizontal scaling and performance.
================================================ FILE: old drafts/draft-peabody-dispatch-new-uuid-format-02.html ================================================ New UUID Formats
Internet-Draft new-uuid-format August 2021
Peabody & Davis Expires 27 February 2022
Workgroup: dispatch
Internet-Draft: draft-peabody-dispatch-new-uuid-format-01
Updates: 4122 (if approved)
Published: August 2021
Intended Status: Standards Track
Expires: 27 February 2022
Authors: BGP. Peabody, K. Davis

New UUID Formats

Abstract

This document presents new time-based UUID formats which are suited for use as a database key.

A common case for modern applications is to create a unique identifier for use as a primary key in a database table. This identifier usually implements an embedded timestamp that is sortable using the monotonic creation time in the most significant bits. In addition the identifier is highly collision resistant, difficult to guess, and provides minimal security attack surfaces. None of the existing UUID versions, including UUIDv1, fulfill each of these requirements in the most efficient possible way. This document is a proposal to update [RFC4122] with three new UUID versions that address these concerns, each with different trade-offs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 27 February 2022.

1. Introduction

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Background

A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys.

The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become a burden. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments.

However some properties of [RFC4122] UUIDs are not well suited to this task. First, most of the existing UUID versions such as UUIDv4 have poor database index locality, meaning that new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the structures commonly used for indexes (B-tree and its variants) can be dramatic. As such newly inserted values SHOULD be time-ordered to address this.

While it is true that UUIDv1 does contain an embedded timestamp and can be time-ordered; UUIDv1 has other issues. It is possible to sort Version 1 UUIDs by time but it is a laborious task. The process requires breaking the bytes of the UUID into various pieces, re-ordering the bits, and then determining the order from the reconstructed timestamp. This is not efficient in very large systems. Implementations would be simplified with a sort order where the UUID can simply be treated as an opaque sequence of bytes and ordered as such.

After the embedded timestamp, the remaining 64 bits are in essence used to provide uniqueness both on a global scale and within a given timestamp tick. The clock sequence value ensures that multiple UUIDs generated for the same timestamp value are given a monotonic sequence value. This explicit sequencing helps further facilitate sorting. The remaining random bits ensure collisions are minimal.

Furthermore, UUIDv1 utilizes a non-standard timestamp epoch derived from the Gregorian Calendar. More specifically, the timestamp is Coordinated Universal Time (UTC) expressed as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582. Implementations and many languages may find it easier to implement the widely adopted and well known Unix Epoch, a custom epoch, or another timestamp source with various levels of timestamp precision required by the application.

Lastly, privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Instead "cryptographically secure" pseudo-random number generators (CSPRNGs) or pseudo-random number generators (PRNG) SHOULD be used within an application context to provide uniqueness and unguessability.

Due to the shortcomings of UUIDv1 and UUIDv4 detailed so far, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways.

While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit Layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.

  1. [LexicalUUID] by Twitter

  2. [Snowflake] by Twitter

  3. [Flake] by Boundary

  4. [ShardingID] by Instagram

  5. [KSUID] by Segment

  6. [Elasticflake] by P. Pearcy

  7. [FlakeID] by T. Pawlak

  8. [Sonyflake] by Sony

  9. [orderedUuid] by IT. Cabrera

  10. [COMBGUID] by R. Tallent

  11. [ULID] by A. Feerasta

  12. [SID] by A. Chilton

  13. [pushID] by Google

  14. [XID] by O. Poitrey

  15. [ObjectID] by MongoDB

  16. [CUID] by E. Elliott

An inspection of these implementations details the following trends that help define this standard:

  • - Timestamps MUST be k-sortable. That is, values within or close to the same timestamp are ordered properly by sorting algorithms.

  • - Timestamps SHOULD be big-endian with the most-significant bits of the time embedded as-is without reordering.

  • - Timestamps SHOULD utilize millisecond precision and Unix Epoch as timestamp source. Although, there is some variation to this among implementations depending on the application requirements.

  • - The ID format SHOULD be Lexicographically sortable while in the textual representation.

  • - IDs MUST ensure proper embedded sequencing to facilitate sorting when multiple UUIDs are created during a given timestamp.

  • - IDs MUST NOT require unique network identifiers as part of achieving uniqueness.

  • - Distributed nodes MUST be able to create collision resistant Unique IDs without consulting a centralized resource.

3. Summary of Changes

In order to solve these challenges this specification introduces three new version identifiers assigned for time-based UUIDs.

The first, UUIDv6, aims to be the easiest to implement for applications which already implement UUIDv1. The UUIDv6 specification keeps the original Gregorian timestamp source but does not reorder the timestamp bits as per the process utilized by UUIDv1. UUIDv6 also requires that pseudo-random data MUST be used in place of the MAC address. The rest of the UUIDv1 format remains unchanged in UUIDv6. See Section 4.3

Next, UUIDv7 introduces an entirely new time-based UUID bit layout utilizing a variable length timestamp sourced from the widely implemented and well known Unix Epoch timestamp source. The timestamp is broken into a 36 bit integer seconds part, followed by a field of variable length which represents the sub-second timestamp portion, encoded so that each bit from most to least significant adds more precision. See Section 4.4.

Finally, UUIDv8 introduces a relaxed time-based UUID format that caters to application implementations that cannot utilize UUIDv1, UUIDv6, or UUIDv7. UUIDv8 also future-proofs this specification by allowing time-based UUID formats from timestamp sources that are not yet defined. The variable size timestamp offers a great deal of flexibility to create an implementation-specific, RFC-compliant time-based UUID while retaining the properties that make UUIDs great. See Section 4.5.

3.1. changelog

RFC EDITOR PLEASE DELETE THIS SECTION.

draft-02

  • - Added Changelog

  • - Fixed misc. grammatical errors

  • - Fixed section numbering issue

  • - Fixed some UUIDvX reference issues

  • - Changed all instances of "motonic" to "monotonic"

  • - Changed all instances of "#-bit" to "# bit"

  • - Changed "proceeding" verbiage to "after" in section 7

  • - Added details on how to pad 32 bit unix timestamp to 36 bits in UUIDv7

  • - Added details on how to truncate 64 bit unix timestamp to 36 bits in UUIDv7

  • - Added forward reference and bullet to UUIDv8 if truncating 64 bit Unix Epoch is not an option.

  • - Fixed bad reference to non-existent "time_or_node" in section 4.5.4

draft-01

  • - Complete rewrite of entire document.

  • - The format, flow and verbiage used in the specification has been reworked to mirror the original RFC 4122 and current IETF standards.

  • - Removed the topics of UUID length modification, alternate UUID text formats, and alternate UUID encoding techniques.

  • - Research into 16 different historical and current implementations of time-based universal identifiers was completed at the end of 2020 in attempt to identify trends which have directly influenced design decisions in this draft document (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)

  • - Prototype implementations have been completed for UUIDv6, UUIDv7, and UUIDv8 in various languages by many GitHub community members. (https://github.com/uuid6/prototypes)

4. Format

The UUID length of 16 octets (128 bits) remains unchanged. The textual representation of a UUID consisting of 36 hexadecimal and dash characters in the format 8-4-4-4-12 remains unchanged for human readability. In addition the position of both the Version and Variant bits remain unchanged in the layout.

4.1. Versions

Table 1 defines the 4 bit version found in Bits 48 through 51 within a given UUID.

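Table 1: UUID versions defined by this specification
    +------+------+------+------+---------+--------------------------------------------+
    | Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                                |
    +------+------+------+------+---------+--------------------------------------------+
    | 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-based UUID        |
    | 0    | 1    | 1    | 1    | 7       | Variable length Unix Epoch time-based UUID |
    | 1    | 0    | 0    | 0    | 8       | Custom time-based UUID                     |
    +------+------+------+------+---------+--------------------------------------------+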

4.2. Variant

The variant bits utilized by UUIDs in this specification remain the same as [RFC4122], Section 4.1.1.

Table 2 lists the contents of the variant field, bits 64 and 65, where the letter "x" indicates a "don't-care" value. As a result, the hex values 8 (1000), 9 (1001), A (1010), and B (1011) commonly appear at this position in the text representation.

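Table 2: UUID Variant defined by this specification
    +------+------+------+-----------------------------------------+
    | Msb0 | Msb1 | Msb2 | Description                             |
    +------+------+------+-----------------------------------------+
    | 1    | 0    | x    | The variant specified in this document. |
    +------+------+------+-----------------------------------------+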
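The version and variant positions described in Sections 4.1 and 4.2 can be read back from any UUID with a pair of shifts. The sketch below uses Python's standard `uuid` module only as a container for the 128 bit value. The helper names and the example UUID are arbitrary choices for illustration.

```python
import uuid


def version_bits(u: uuid.UUID) -> int:
    """Bits 48 through 51, counting from the most significant bit (bit 0)."""
    return (u.int >> 76) & 0xF


def variant_bits(u: uuid.UUID) -> int:
    """Bits 64 and 65, counting from the most significant bit (bit 0)."""
    return (u.int >> 62) & 0b11


example = uuid.UUID("1ec9414c-232a-6b00-b3c8-9e6bdeced846")
print(version_bits(example))       # prints 6
print(bin(variant_bits(example)))  # prints 0b10

# In the text form, the variant shares a hex digit with two other bits,
# which is why that digit is always one of 8, 9, a, or b.
print(str(example)[19])            # prints b
```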

4.3. UUIDv6 Layout and Bit Order

UUIDv6 aims to be the easiest to implement by reusing most of the layout of bits found in UUIDv1 but with changes to bit ordering for the timestamp. Where UUIDv1 splits the timestamp bits into three distinct parts and orders them as time_low, time_mid, time_high_and_version, UUIDv6 instead keeps the source bits from the timestamp intact and changes the order to time_high, time_mid, and time_low. Incidentally this will match the original 60 bit Gregorian timestamp source with 100-nanosecond precision defined in [RFC4122], Section 4.1.4. The clock sequence bits remain unchanged from their usage and position in [RFC4122], Section 4.1.5. The 48 bit node SHOULD be set to a pseudo-random value; however, implementations MAY choose to retain the old MAC address behavior from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5.

The format for the 16-octet, 128 bit UUIDv6 is shown in Figure 1

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           time_high                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           time_mid            |      time_low_and_version     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         node (2-5)                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: UUIDv6 Field and Bit Layout
time_high:
The most significant 32 bits of the 60 bit starting timestamp. Occupies bits 0 through 31 (octets 0-3).
time_mid:
The middle 16 bits of the 60 bit starting timestamp. Occupies bits 32 through 47 (octets 4-5).
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60 bit starting timestamp. Occupies bits 48 through 63 (octets 6-7).
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8).
clk_seq_low:
The 8 bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9).
node:
48 bit spatially unique identifier. Occupies bits 80 through 127 (octets 10-15).

4.3.1. UUIDv6 Basic Creation Algorithm

The following implementation algorithm is based on [RFC4122] but with changes specific to UUIDv6:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time as a 60 bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.

  3. Set the time_low field to the 12 least significant bits of the starting 60 bit timestamp.

  4. Truncate the timestamp to the 48 most significant bits in order to create time_high_and_time_mid.

  5. Set the time_high field to the 32 most significant bits of the truncated timestamp.

  6. Set the time_mid field to the 16 least significant bits of the truncated timestamp.

  7. Create the 16 bit time_low_and_version by concatenating the 4 bit UUIDv6 version with the 12 bit time_low.

  8. If the state was unavailable (e.g., non-existent or corrupted) or the timestamp is greater than the current timestamp generate a random 14 bit clock sequence value.

  9. If the state was available, but the saved timestamp is less than or equal to the current timestamp, increment the clock sequence value.

  10. Complete the 16 bit clock sequence high, low and reserved creation by concatenating the clock sequence onto UUID variant bits which take the most significant position in the 16 bit value.

  11. Generate a 48 bit pseudo-random node.

  12. Format by concatenating the 128 bits from each part: time_high|time_mid|time_low_and_version|variant_clk_seq|node

  13. Save the state (current timestamp and clock sequence) back to the stable store
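The field assembly in the steps above can be sketched in Python. The function names are hypothetical. For brevity this sketch omits the stable-store state handling and simply draws a random clock sequence and node on every call, which is an assumption rather than the full algorithm. The constant is the number of 100-nanosecond intervals between 15 October 1582 and 1 January 1970.

```python
import secrets
import time
import uuid

# 100-ns intervals between 1582-10-15 and the Unix Epoch (1970-01-01).
GREGORIAN_OFFSET = 0x01B21DD213814000


def pack_uuid6(timestamp_60: int, clock_seq_14: int, node_48: int) -> uuid.UUID:
    """Concatenate time_high|time_mid|time_low_and_version|variant_clk_seq|node."""
    time_high = (timestamp_60 >> 28) & 0xFFFFFFFF                 # 32 most significant bits
    time_mid = (timestamp_60 >> 12) & 0xFFFF                      # middle 16 bits
    time_low_and_version = (0x6 << 12) | (timestamp_60 & 0xFFF)   # version + low 12 bits
    variant_clk_seq = (0b10 << 14) | (clock_seq_14 & 0x3FFF)      # variant + clock sequence
    value = (
        (time_high << 96)
        | (time_mid << 80)
        | (time_low_and_version << 64)
        | (variant_clk_seq << 48)
        | (node_48 & 0xFFFFFFFFFFFF)
    )
    return uuid.UUID(int=value)


def uuid6_now() -> uuid.UUID:
    """Stateless illustration: random clock sequence and node on each call."""
    timestamp_60 = time.time_ns() // 100 + GREGORIAN_OFFSET
    return pack_uuid6(timestamp_60, secrets.randbits(14), secrets.randbits(48))


print(pack_uuid6(0x1EC9414C232AB00, 0x33C8, 0x9E6BDECED846))
# prints 1ec9414c-232a-6b00-b3c8-9e6bdeced846
```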

The steps for splitting time_high_and_time_mid into time_high and time_mid are optional since the 48 bits of time_high and time_mid will remain in the same order as time_high_and_time_mid during the final concatenation. This extra step of splitting into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation. In that case, the following field mapping can be applied to reshuffle the bits with minimal modifications.

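Table 3: UUIDv1 to UUIDv6 Field Mappings
    +--------------+------+--------------+
    | UUIDv1 Field | Bits | UUIDv6 Field |
    +--------------+------+--------------+
    | time_low     | 32   | time_high    |
    | time_mid     | 16   | time_mid     |
    | time_high    | 12   | time_low     |
    +--------------+------+--------------+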
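One way to read the mapping is as a conversion of an existing UUIDv1 value: reassemble the 60 bit timestamp from the UUIDv1 fields, then re-split it in most-significant-first order. The sketch below does this with Python's standard `uuid` module. The function name `v1_to_v6` is hypothetical, and the sketch keeps the original clock sequence and node unchanged, which is a choice of this example (the text above recommends a pseudo-random node for newly generated UUIDv6 values).

```python
import uuid


def v1_to_v6(u1: uuid.UUID) -> uuid.UUID:
    """Reorder the timestamp bits of a UUIDv1 into the UUIDv6 layout."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    # Reassemble the 60 bit timestamp from the three UUIDv1 fields.
    timestamp_60 = (
        ((u1.time_hi_version & 0x0FFF) << 48) | (u1.time_mid << 32) | u1.time_low
    )
    time_high = timestamp_60 >> 28            # 32 most significant bits
    time_mid = (timestamp_60 >> 12) & 0xFFFF  # middle 16 bits
    time_low = timestamp_60 & 0xFFF           # 12 least significant bits
    upper = (time_high << 32) | (time_mid << 16) | (0x6 << 12) | time_low
    lower = u1.int & 0xFFFFFFFFFFFFFFFF       # variant, clock sequence, node
    return uuid.UUID(int=(upper << 64) | lower)


v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9e6bdeced846")
print(v1_to_v6(v1))  # prints 1ec9414c-232a-6b00-b3c8-9e6bdeced846
```

After conversion, comparing the UUIDs as opaque byte strings orders them by their embedded timestamp.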

4.4. UUIDv7 Layout and Bit Order

The UUIDv7 format is designed to encode a Unix timestamp with arbitrary sub-second precision. The key property provided by UUIDv7 is that timestamp values generated by one system and parsed by another are guaranteed to have sub-second precision of either the generator or the parser, whichever is less. Additionally, the system parsing the UUIDv7 value does not need to know which precision was used during encoding in order to function correctly.

The format for the 16-octet, 128 bit UUIDv7 is shown in Figure 2

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            unixts                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |unixts |       subsec_a        |  ver  |       subsec_b        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|                   subsec_seq_node                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       subsec_seq_node                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: UUIDv7 Field and Bit Layout
unixts:
36 bit big-endian unsigned Unix Timestamp value
subsec_a:
12 bits allocated to sub-second precision values.
ver:
The 4 bit UUIDv7 version (0111)
subsec_b:
12 bits allocated to sub-second precision values.
var:
2 bit UUID variant (10)
subsec_seq_node:
The remaining 62 bits which MAY be allocated to any combination of additional sub-second precision, sequence counter, or pseudo-random data.

4.4.1. UUIDv7 Timestamp Usage

UUIDv7 utilizes a 36 bit big-endian unsigned Unix Timestamp value (number of seconds since the epoch of 1 Jan 1970, leap seconds excluded so each hour is exactly 3600 seconds long). The 36 bit value was selected in order to provide more available time to the unix timestamp and avoid the Year 2038 problem by extending the maximum timestamp to the year 4147.

To achieve a 36 bit UUIDv7 timestamp, the lower 36 bits of a 64 bit unix time are extracted verbatim into UUIDv7

In the event that a 32 bit Unix Timestamp is in use, four zero bits MUST be prepended in the most significant (left-most) positions of the 32 bit Unix timestamp, creating the 36 bit Unix timestamp. This ensures sorting compatibility with 64 bit Unix timestamps which have been truncated to 36 bits.

Additional sub-second precision (millisecond, nanosecond, microsecond, etc) MAY be provided for encoding and decoding in the remaining bits in the layout.

UUIDv8 SHOULD be used in place of UUIDv7 if an application or implementation does not want to truncate a 64 bit Unix Epoch to the lower 36 bits.
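The truncation and padding rules of this section amount to a single 36 bit mask, as the following Python sketch shows. The names `UNIXTS_MASK` and `unixts_36` are hypothetical. The last lines check the stated upper bound of the 36 bit seconds field using plain date arithmetic.

```python
from datetime import datetime, timedelta, timezone

UNIXTS_MASK = (1 << 36) - 1  # 36 one-bits


def unixts_36(unix_seconds: int) -> int:
    """Keep the lower 36 bits of a Unix timestamp in seconds.

    A 32 bit value passes through unchanged, which is equivalent to
    prepending four zero bits in the most significant positions.
    """
    return unix_seconds & UNIXTS_MASK


print(hex(unixts_36(0xFFFFFFFF)))   # prints 0xffffffff
print(unixts_36((1 << 36) + 5))     # prints 5

# Largest representable instant: the Unix Epoch plus 2^36 - 1 seconds.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
latest = EPOCH + timedelta(seconds=UNIXTS_MASK)
print(latest.year)                  # prints 4147
```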

4.4.2. UUIDv7 Clock Sequence Usage

UUIDv7 SHOULD utilize a monotonic sequence counter to provide additional sequencing guarantees when multiple UUIDv7 values are created in the same UNIXTS and SUBSEC timestamp. The number of bits allocated to the sequence counter depends on the precision of the timestamp. For example, a more accurate timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source utilizing seconds for precision. For best sequencing results the sequence counter SHOULD be placed immediately after available sub-second bits.

The clock sequence MUST start at zero and increment monotonically for each new UUIDv7 created on by the application on the same timestamp. When the timestamp increments the clock sequence MUST be reset to zero. The clock sequence MUST NOT rollover or reset to zero unless the timestamp has incremented. Care MUST be given to ensure that an adequate sized clock sequence is selected for a given application based on expected timestamp precision and expected UUIDv7 generation rates.
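One way to picture these counter rules is the small Python class below. It is a sketch under stated assumptions: the class name is hypothetical, raising an error when the counter is exhausted is one possible reaction (the text only forbids rolling over), and a timestamp lower than the last one seen is treated like a repeated tick because clock rollback is not covered in this section.

```python
class SequenceCounter:
    """Per-timestamp counter: starts at 0, increments on the same tick,
    resets to 0 when the timestamp increments, and never rolls over."""

    def __init__(self, bits: int) -> None:
        self.max_value = (1 << bits) - 1
        self.last_timestamp = None
        self.value = 0

    def next(self, timestamp: int) -> int:
        if self.last_timestamp is None or timestamp > self.last_timestamp:
            # The timestamp has incremented: restart the sequence at zero.
            self.last_timestamp = timestamp
            self.value = 0
        elif self.value < self.max_value:
            # Same tick (or, by assumption, a clock that moved backwards).
            self.value += 1
        else:
            # Assumption: fail loudly instead of rolling over to zero.
            raise OverflowError("sequence counter exhausted for this timestamp")
        return self.value
```

The timestamp argument is meant to be the combined seconds and sub-second value, so that the counter only advances when both are unchanged.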

4.4.3. UUIDv7 Node Usage

UUIDv7 implementations, even with very detailed sub-second precision and the optional sequence counter, MAY have leftover bits, which are identified as the Node in this section. The UUIDv7 Node MAY contain any set of data an implementation desires; however, the node MUST NOT be set to all 0s, since that does not ensure global uniqueness. In most scenarios the node SHOULD be filled with pseudo-random data.
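A minimal sketch of filling the node is shown below. The function name is hypothetical, and using Python's secrets module (a cryptographically strong source) is a choice of this example; the text only asks for pseudo-random data that is not all 0s.

```python
import secrets


def random_node(n_bits: int) -> int:
    """Return n_bits of random data, never the all-zero value."""
    if n_bits < 1:
        raise ValueError("the node needs at least one bit")
    while True:
        node = secrets.randbits(n_bits)
        if node != 0:  # the node MUST NOT be set to all 0s
            return node
```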

4.4.4. UUIDv7 Encoding and Decoding

The UUIDv7 bit layouts for encoding and decoding are described separately in this document.

4.4.4.1. UUIDv7 Encoding

Since the UUIDv7 Unix timestamp is fixed at 36 bits in length the exact layout for encoding UUIDv7 depends on the precision (number of bits) used for the sub-second portion and the sizes of the optionally desired sequence counter and node bits.

Three examples of UUIDv7 encoding are given below as general guidelines, but implementations are not limited to just these three examples.

All of these fields are only used during encoding; during decoding the system is unaware of the bit layout used for them and considers this information opaque. As such, implementations generating these values can assign whatever length to each field they deem applicable, as long as doing so does not break decoding compatibility (i.e. the Unix timestamp (unixts), version (ver) and variant (var) have to stay where they are, and the clock sequence counter (seq), random (rand) or other implementation specific values must follow the sub-second encoding).

In Figure 3 the UUIDv7 has been created with millisecond precision using the available sub-second precision bits.

Examining Figure 3 one can observe:

  • The first 36 bits have been dedicated to the Unix timestamp (unixts).

  • All 12 bits of subsec_a are dedicated to millisecond information (msec).

  • The 4 Version bits remain unchanged (ver).

  • All 12 bits of subsec_b have been dedicated to a monotonic clock sequence counter (seq).

  • The 2 Variant bits remain unchanged (var).

  • Finally, the remaining 62 bits in the subsec_seq_node section are filled with random data to pad the length and provide uniqueness (rand).

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            unixts                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |unixts |         msec          |  ver  |          seq          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|                         rand                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             rand                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: UUIDv7 Field and Bit Layout - Encoding Example (Millisecond Precision)
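The Figure 3 layout can be assembled with plain shifts and masks. The Python sketch below is illustrative only: the function name is hypothetical, and storing the millisecond count as a 12 bit binary fraction of a second is an assumption taken from the decoding rules in Section 4.4.4.2. Because the fraction is rounded down, a value read back at millisecond precision can differ by one unit.

```python
import uuid


def encode_v7_millis(unixts: int, millis: int, seq: int, rand: int) -> uuid.UUID:
    """Pack unixts | msec | ver | seq | var | rand as drawn in Figure 3."""
    if not 0 <= unixts < (1 << 36):
        raise ValueError("unixts must fit in 36 bits")
    if not 0 <= millis < 1000:
        raise ValueError("millis must be in the range 0..999")
    if not 0 <= seq < (1 << 12):
        raise ValueError("seq must fit in 12 bits")
    if not 0 <= rand < (1 << 62):
        raise ValueError("rand must fit in 62 bits")

    # Assumption: msec holds floor(millis / 1000 * 2^12), a binary fraction.
    msec = (millis << 12) // 1000

    value = (
        (unixts << 92)    # bits 0-35: Unix timestamp
        | (msec << 80)    # bits 36-47: subsec_a
        | (0x7 << 76)     # bits 48-51: version 0111
        | (seq << 64)     # bits 52-63: subsec_b used as sequence counter
        | (0b10 << 62)    # bits 64-65: variant 10
        | rand            # bits 66-127: random data
    )
    return uuid.UUID(int=value)
```

A caller would typically pass secrets.randbits(62) for rand and take seq from a per-timestamp counter.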

In Figure 4 the UUIDv7 has been created with microsecond precision using the available sub-second precision bits.

Examining Figure 4 one can observe:

  • The first 36 bits have been dedicated to the Unix timestamp (unixts).

  • All 12 bits of subsec_a are dedicated to providing sub-second encoding for the microsecond precision (usec).

  • The 4 Version bits remain unchanged (ver).

  • All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the microsecond precision (usec).

  • The 2 Variant bits remain unchanged (var).

  • A 14 bit monotonic clock sequence counter (seq) has been embedded in the most significant position of subsec_seq_node.

  • Finally, the remaining 48 bits in the subsec_seq_node section are filled with random data to pad the length and provide uniqueness (rand).

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            unixts                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |unixts |         usec          |  ver  |         usec          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|             seq           |            rand               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             rand                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: UUIDv7 Field and Bit Layout - Encoding Example (Microsecond Precision)

In Figure 5 the UUIDv7 has been created with nanosecond precision using the available sub-second precision bits.

Examining Figure 5 one can observe:

  • The first 36 bits have been dedicated to the Unix timestamp (unixts).

  • All 12 bits of subsec_a are dedicated to providing sub-second encoding for the nanosecond precision (nsec).

  • The 4 Version bits remain unchanged (ver).

  • All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the nanosecond precision (nsec).

  • The 2 Variant bits remain unchanged (var).

  • The first 14 bits of subsec_seq_node are dedicated to providing sub-second encoding for the nanosecond precision (nsec).

  • The next 8 bits of subsec_seq_node are dedicated to a monotonic clock sequence counter (seq).

  • Finally, the remaining 40 bits in the subsec_seq_node section are filled with random data to pad the length and provide uniqueness (rand).

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            unixts                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |unixts |         nsec          |  ver  |         nsec          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|             nsec          |      seq      |     rand      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             rand                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: UUIDv7 Field and Bit Layout - Encoding Example (Nanosecond Precision)
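Since the three figures differ only in how the 86 bits outside unixts, ver and var are divided, a single routine can cover all of them. The Python sketch below is one possible approach, not the draft's reference algorithm: the function and parameter names are hypothetical, and the sub-second value is stored as a binary fraction, an assumption based on the decoding rules in Section 4.4.4.2.

```python
PAYLOAD_BITS = 12 + 12 + 62  # subsec_a + subsec_b + subsec_seq_node


def encode_v7(unixts: int, subsec: int, units_per_second: int,
              subsec_bits: int, seq: int, seq_bits: int, rand: int) -> int:
    """Return the 128 bit UUIDv7 value for a chosen field split.

    Figure 4 corresponds to subsec_bits=24, seq_bits=14 (48 random bits);
    Figure 5 corresponds to subsec_bits=38, seq_bits=8 (40 random bits).
    """
    rand_bits = PAYLOAD_BITS - subsec_bits - seq_bits
    if rand_bits < 0:
        raise ValueError("sub-second and sequence fields exceed 86 bits")
    if not 0 <= unixts < (1 << 36):
        raise ValueError("unixts must fit in 36 bits")
    if not 0 <= subsec < units_per_second:
        raise ValueError("subsec must be less than one second")
    if not 0 <= seq < (1 << seq_bits):
        raise ValueError("seq does not fit in seq_bits")
    if not 0 <= rand < (1 << rand_bits):
        raise ValueError("rand does not fit in the remaining bits")

    # Binary fraction of a second, rounded down to subsec_bits of precision.
    fraction = (subsec << subsec_bits) // units_per_second

    # Lay the fields out contiguously, then cut the result around ver and var.
    payload = (fraction << (seq_bits + rand_bits)) | (seq << rand_bits) | rand
    subsec_a = payload >> 74
    subsec_b = (payload >> 62) & 0xFFF
    subsec_seq_node = payload & ((1 << 62) - 1)

    return ((unixts << 92) | (subsec_a << 80) | (0x7 << 76)
            | (subsec_b << 64) | (0b10 << 62) | subsec_seq_node)
```

With subsec_bits=12 and seq_bits=12 the same routine reproduces the millisecond layout of Figure 3.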
4.4.4.2. UUIDv7 Decoding

When decoding or parsing a UUIDv7 value there are only two values to be considered:

  1. The unix timestamp defined as unixts

  2. The sub-second precision values defined as subsec_a, subsec_b, and subsec_seq_node

As detailed in Figure 2 the unix timestamp (unixts) is always the first 36 bits of the UUIDv7 layout.

Similarly, as per Figure 2, the sub-second precision values lie within subsec_a, subsec_b, and subsec_seq_node, which are all interpreted as sub-second information after skipping over the version (ver) and variant (var) bits. These concatenated sub-second information bits are interpreted in a way where most to least significant bits represent a further division by two. This is the same place notation normally used to express fractional numbers, except in binary. For example, in decimal ".1" means one tenth, and ".01" means one hundredth. In this subsec field, a 1 means one half, 01 means one quarter, 001 is one eighth, etc. This scheme can work for any number of bits up to the maximum available, and keeps the most significant data leftmost in the bit sequence.

To perform the sub-second math, simply take the first (most significant/leftmost) N bits of subsec and divide it by 2^N. Take for example:

  1. To parse the first 16 bits, extract that value as an integer and divide it by 65536 (2 to the 16th).

  2. If these 16 bits are 0101 0101 0101 0101, then treating that as an integer gives 0x5555 or 21845 in decimal, and dividing by 65536 gives approximately 0.3333282.
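The arithmetic above fits in one line of Python. The helper name below is hypothetical and the snippet is only meant to make the worked example reproducible.

```python
def subsec_to_fraction(subsec: int, n_bits: int) -> float:
    """Interpret the first n_bits of subsec as a binary fraction of a second."""
    return subsec / (1 << n_bits)


# The worked example from the text: 0x5555 read as 16 bits.
example = subsec_to_fraction(0x5555, 16)  # 21845 / 65536
```

The same helper shows the place-value rule: subsec_to_fraction(0b1, 1) is one half, subsec_to_fraction(0b01, 2) is one quarter, and subsec_to_fraction(0b001, 3) is one eighth.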

This sub-second encoding scheme provides maximum interoperability across systems where different levels of time precision are required/feasible/available. The timestamp value derived from a UUIDv7 value SHOULD be "as close to the correct value as possible" when parsed, even across disparate systems.

Take for example the starting point for our next two UUIDv7 parsing scenarios:

  1. System A produces a UUIDv7 with a microsecond-precise timestamp value.
  2. System B is unaware of the precision encoded in the UUIDv7 timestamp by System A.

Scenario 1:

  1. System B parses the embedded timestamp with millisecond precision. (Less precision than the encoder)
  2. System B SHOULD return the correct millisecond value encoded by system A (truncated to milliseconds).

Scenario 2:

  1. System B parses the timestamp with nanosecond precision. (More precision than the encoder)
  2. System B's value returned SHOULD have the same microsecond level of precision provided by the encoder with the additional precision down to nanosecond level being essentially random as per the encoded random value at the end of the UUIDv7.
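Both scenarios can be reproduced with a decoder that reads only as many sub-second bits as it wants. The Python sketch below is a hedged illustration: the function names are hypothetical, and the bit counts passed by the caller (12 for milliseconds, 38 for nanoseconds) mirror the encoding examples rather than anything mandated for decoders.

```python
from typing import Tuple

SUBSEC_BITS = 12 + 12 + 62  # subsec_a + subsec_b + subsec_seq_node


def split_v7(value: int) -> Tuple[int, int]:
    """Return (unixts, subsec) where subsec is the 86 concatenated
    sub-second bits with the version and variant bits skipped."""
    unixts = value >> 92
    subsec_a = (value >> 80) & 0xFFF
    subsec_b = (value >> 64) & 0xFFF
    subsec_seq_node = value & ((1 << 62) - 1)
    subsec = (subsec_a << 74) | (subsec_b << 62) | subsec_seq_node
    return unixts, subsec


def parse_subsec(subsec: int, n_bits: int, units_per_second: int) -> int:
    """Read the first n_bits as a binary fraction and scale it to the
    requested unit (1000 for milliseconds, 10**9 for nanoseconds)."""
    top = subsec >> (SUBSEC_BITS - n_bits)
    return (top * units_per_second) >> n_bits
```

For a value whose encoder stored a quarter of a second, parse_subsec(subsec, 12, 1000) returns the truncated millisecond value (Scenario 1), while parse_subsec(subsec, 38, 10**9) returns a nanosecond value whose digits below the encoder's precision come from whatever sequence or random bits follow (Scenario 2).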

4.5. UUIDv8 Layout and Bit Order

UUIDv8 offers variable-size timestamp, clock sequence, and node values which allow for a highly customizable UUID that fits a given application's needs.

UUIDv8 SHOULD only be utilized if an implementation cannot utilize UUIDv1, UUIDv6, or UUIDv7. Some situations in which UUIDv8 usage could occur:

  • An implementation would like to utilize a timestamp source not defined by the current time-based UUIDs.

  • An implementation would like to utilize a timestamp bit layout not defined by the current time-based UUIDs.

  • An implementation would like to avoid truncating a 64 bit Unix timestamp to 36 bits as defined by UUIDv7.

  • An implementation would like a specific level of precision within the timestamp not offered by current time-based UUIDs.

  • An implementation would like to embed extra information within the UUID node other than what is defined in this document.

  • An implementation has other application/language restrictions which inhibit the usage of one of the current time-based UUIDs.

Roughly speaking a properly formatted UUIDv8 SHOULD contain the following sections adding up to a total of 128 bits.

  • Timestamp Bits (Variable Length)

  • Clock Sequence Bits (Variable Length)

  • Node Bits (Variable Length)

  • UUIDv8 Version Bits (4 bits)

  • UUID Variant Bits (2 bits)

The only explicitly defined bits are the Version and Variant leaving 122 bits for implementation specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data. UUIDv8's 128 bits (including the version and variant) SHOULD contain at the minimum a timestamp of some format in the most significant bit position followed directly by a clock sequence counter and finally a node containing either random data or implementation specific data.

A sample format in Figure 6 is used to further illustrate the point for the 16-octet, 128 bit UUIDv8.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          timestamp_32                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           timestamp_48        |  ver  |      time_or_seq      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|  seq_or_node  |          node                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                              node                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: UUIDv8 Field and Bit Layout
timestamp_32:
The most significant 32 bits of the desired timestamp source. Occupies bits 0 through 31 (octets 0-3).
timestamp_48:
The next 16 bits of the timestamp source when a timestamp source with at least 48 bits is used. When a 32 bit timestamp source is utilized, these bits are set to 0. Occupies bits 32 through 47 (octets 4-5).
ver:
The 4 bit UUIDv8 version (1000). Occupies bits 48 through 51.
time_or_seq:
If a 60 bit, or larger, timestamp is used these 12 bits are used to fill out the remaining timestamp. If a 32 or 48 bit timestamp is leveraged a 12 bit clock sequence MAY be used. Together ver and time_or_seq occupy bits 48 through 63 (octets 6-7).
var:
2 bit UUID variant (10)
seq_or_node:
If a 60 bit, or larger, timestamp source is leveraged these 8 bits SHOULD be allocated for an 8 bit clock sequence counter. If a 32 or 48 bit timestamp source is used these 8 bits SHOULD be set to random data.
node:
In most implementations these bits will likely be set to pseudo-random data. However, implementations MAY utilize the node as they see fit. Together var, seq_or_node, and node occupy bits 64 through 127 (octets 8-15).

4.5.1. UUIDv8 Timestamp Usage

UUIDv8's usage of timestamp relaxes both the timestamp source and timestamp length. Implementations are free to utilize any monotonically stable timestamp source for UUIDv8.

Some examples include:

  • Custom Epoch

  • NTP timestamp

  • ISO 8601 timestamp

  • Full, Non-truncated 64 bit Unix Epoch timestamp

The relaxed nature of UUIDv8 timestamps also works to future-proof this specification and gives implementations a method to create compliant time-based UUIDs using timestamp sources that might not yet be defined.

Timestamps come in many sizes and UUIDv8 defines three fields that can easily be used for the majority of timestamp lengths:

  • 32 bit timestamp: using timestamp_32 and setting timestamp_48 to 0s

  • 48 bit timestamp: using timestamp_32 and timestamp_48 entirely

  • 60 bit timestamp: using timestamp_32, timestamp_48, and time_or_seq

  • 64 bit timestamp: using timestamp_32, timestamp_48, and time_or_seq and truncating the timestamp to the 60 most significant bits.

Although it is possible to create a timestamp larger than 64 bits in size, the usage and bit layout of that timestamp format is up to the implementation. When a timestamp extends beyond the 64th bit (octet 7), extra care must be taken to ensure the Variant bits are properly inserted at their respective location in the UUID. Likewise, the Version MUST always be implemented at the appropriate location.

Any timestamp that does not entirely fill timestamp_32, timestamp_48 or time_or_seq MUST set all leftover bits in the least significant positions of the respective field to 0. For example, a 36 bit timestamp source would fully utilize timestamp_32 and 4 bits of timestamp_48. The remaining 12 bits in timestamp_48 MUST be set to 0.
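The zero-padding rule amounts to left-aligning the timestamp inside the 60 bits formed by timestamp_32, timestamp_48 and time_or_seq. The Python sketch below illustrates this; the function name is hypothetical, and returning 0 for a field the timestamp does not reach simply leaves that field free for the caller (for example, for a clock sequence in time_or_seq).

```python
from typing import Tuple


def split_v8_timestamp(timestamp: int, width: int) -> Tuple[int, int, int]:
    """Return (timestamp_32, timestamp_48, time_or_seq) for a timestamp
    of `width` bits (1..60), padding leftover low bits with zeros."""
    if not 0 < width <= 60:
        raise ValueError("width must be between 1 and 60 bits")
    if not 0 <= timestamp < (1 << width):
        raise ValueError("timestamp does not fit in the stated width")

    aligned = timestamp << (60 - width)  # zeros fill the least significant bits
    timestamp_32 = aligned >> 28
    timestamp_48 = (aligned >> 12) & 0xFFFF
    time_or_seq = aligned & 0xFFF
    return timestamp_32, timestamp_48, time_or_seq
```

For the 36 bit example in the text, split_v8_timestamp(0xABCDEF123, 36) places 0xABCDEF12 in timestamp_32 and the remaining 4 bits at the top of timestamp_48 (0x3000), with the lower 12 bits of timestamp_48 left at 0.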

By using implementation-specific timestamp sources it is not guaranteed that devices outside of the application context are able to extract and parse the timestamp from UUIDv8 without some pre-existing knowledge of the source timestamp used by the UUIDv8 implementation.

4.5.2. UUIDv8 Clock Sequence Usage

A clock sequence MUST be used with UUIDv8 to provide added sequencing guarantees when multiple UUIDv8 values are created on the same clock tick. The number of bits allocated to the clock sequence depends on the precision of the timestamp source. For example, a more precise timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source utilizing seconds for precision.

The UUIDv8 layout in Figure 6 generically defines two possible clock sequence values that can be leveraged:

  • 12 bit clock sequence using time_or_seq, for use when the timestamp is 48 bits or fewer, which allows for 4096 UUIDs per clock tick (counter values 0 through 4095).

  • 8 bit clock sequence using seq_or_node, for use when the timestamp uses more than 48 bits, which allows for 256 UUIDs per clock tick (counter values 0 through 255).

An implementation MAY use both time_or_seq and seq_or_node for clock sequencing; however, it is highly unlikely that 20 bits of clock sequence are needed for a given clock tick. Furthermore, more bits from the node MAY be used for clock sequencing in the event that 8 bits are not sufficient.

The clock sequence MUST start at zero and increment monotonically for each new UUIDv8 created by the application on the same timestamp. When the timestamp increments the clock sequence MUST be reset to zero. The clock sequence MUST NOT roll over or reset to zero unless the timestamp has incremented. Care MUST be taken to ensure that an adequately sized clock sequence is selected for a given application based on expected timestamp precision and expected UUIDv8 generation rates.

4.5.3. UUIDv8 Node Usage

The UUIDv8 Node MAY contain any set of data an implementation desires however the node MUST NOT be set to all 0s which does not ensure global uniqueness. In most scenarios the node will be filled with pseudo-random data.

The UUIDv8 layout in Figure 6 defines 2 sizes of Node depending on the timestamp size:

  • 62 bit node encompassing seq_or_node and node, used when a timestamp of 48 bits or less is leveraged.

  • 54 bit node when all 60 bits of the timestamp are in use and the seq_or_node is used as clock sequencing.

An implementation MAY choose to allocate bits from the node to the timestamp, clock sequence or application-specific embedded field. It is recommended that implementations utilize a node of at least 48 bits to ensure global uniqueness can be guaranteed.

4.5.4. UUIDv8 Basic Creation Algorithm

The entire usage of UUIDv8 is meant to be variable and allow as much customization as possible to meet specific application/language requirements. As such any UUIDv8 implementations will likely vary among applications.

The following algorithm is a generic implementation using Figure 6 and the recommendations outlined in this specification.

32 bit timestamp, 12 bit sequence counter, 62 bit node:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time from the selected clock source as 32 bits.

  3. Set the 32 bit field timestamp_32 to the 32 bits from the timestamp

  4. Set 16 bit timestamp_48 to all 0s

  5. Set the version to 8 (1000)

  6. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the 12 bit clock sequence value (time_or_seq) to 0.

  7. If the state was available and the saved timestamp is equal to the current timestamp, increment the clock sequence value (time_or_seq).

  8. Set the variant to binary 10

  9. Generate 62 random bits and fill in 8 bits for seq_or_node and 54 bits for the node.

  10. Format by concatenating the 128 bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node

  11. Save the state (current timestamp and clock sequence) back to the stable store

48 bit timestamp, 12 bit sequence counter, 62 bit node:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time from the selected clock source as 48 bits.

  3. Set the 32 bit field timestamp_32 to the 32 most significant bits from the timestamp.

  4. Set the 16 bit field timestamp_48 to the 16 least significant bits from the timestamp.

  5. The remaining steps are the same as steps 5 through 11 of the previous example.

60 bit timestamp, 8 bit sequence counter, 54 bit node:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time from the selected clock source as 60 bits.

  3. Set the 32 bit field timestamp_32 to the 32 most significant bits from the timestamp.

  4. Set 16 bit timestamp_48 to the 16 middle bits from the timestamp

  5. Set the version to 8 (1000)

  6. Set 12 bit time_or_seq to the 12 least significant bits from the timestamp

  7. Set the variant to 10

  8. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the 8 bit clock sequence value (seq_or_node) to 0.

  9. If the state was available and the saved timestamp is equal to the current timestamp, increment the clock sequence value (seq_or_node).

  10. Generate 54 random bits and fill in the node

  11. Format by concatenating the 128 bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node

  12. Save the state (current timestamp and clock sequence) back to the stable store

64 bit timestamp, 8 bit sequence counter, 54 bit node:

  1. The same steps as the 60 bit timestamp can be utilized if the 64 bit timestamp is truncated to 60 bits.

  2. Implementations MAY choose to truncate the most or least significant bits, but it is recommended to utilize the most significant 60 bits and lose 4 bits of precision in the nanoseconds or microseconds position.
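The 60 bit layout and the recommended truncation of a 64 bit timestamp can be sketched as two small Python functions. The names are hypothetical, and the sequence counter and node are passed in by the caller so the bit packing stays separate from state handling.

```python
import uuid


def truncate_to_60(ts64: int) -> int:
    """Keep the 60 most significant bits of a 64 bit timestamp,
    dropping the 4 least significant bits of precision."""
    if not 0 <= ts64 < (1 << 64):
        raise ValueError("timestamp must fit in 64 bits")
    return ts64 >> 4


def pack_v8_ts60(ts60: int, seq: int, node: int) -> uuid.UUID:
    """Pack a 60 bit timestamp, 8 bit sequence and 54 bit node per Figure 6."""
    if not 0 <= ts60 < (1 << 60):
        raise ValueError("timestamp must fit in 60 bits")
    if not 0 <= seq < (1 << 8):
        raise ValueError("seq must fit in 8 bits")
    if not 0 < node < (1 << 54):
        raise ValueError("node must be non-zero and fit in 54 bits")

    timestamp_32 = ts60 >> 28            # 32 most significant bits
    timestamp_48 = (ts60 >> 12) & 0xFFFF  # 16 middle bits
    time_or_seq = ts60 & 0xFFF            # 12 least significant bits

    value = ((timestamp_32 << 96) | (timestamp_48 << 80) | (0x8 << 76)
             | (time_or_seq << 64) | (0b10 << 62) | (seq << 54) | node)
    return uuid.UUID(int=value)
```

For example, pack_v8_ts60(truncate_to_60(ts64), seq, node) builds the 64 bit variant described in this example.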

General algorithm for generating UUIDv8 layouts not defined here:

  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

  2. Obtain the current time from the selected clock source with the desired number of bits.

  3. Place the timestamp bits, as many as required, in the most significant positions of the 128 bit UUID.

  4. Care MUST be taken to ensure that the UUID Version and UUID Variant are in the correct bit positions.

    UUID Version: Bits 48 through 51

    UUID Variant: Bits 64 and 65

  5. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the desired clock sequence value to 0.

  6. If the state was available and the saved timestamp is equal to the current timestamp, increment the clock sequence value.

  7. Set the remaining bits to the node as pseudo-random data

  8. Format by concatenating the 128 bits together

  9. Save the state (current timestamp and clock sequence) back to the stable store
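For custom layouts, the part that is easiest to get wrong is step 4. One possible approach, sketched below in Python, is to build the 122 implementation-defined bits (timestamp, clock sequence, node) as a single integer and then splice the version and variant into their fixed positions. The function name is hypothetical and this is only one way to satisfy the step.

```python
def insert_version_variant(payload: int, version: int = 8) -> int:
    """Spread a 122 bit payload around the version (bits 48-51)
    and variant (bits 64-65) of a 128 bit UUID."""
    if not 0 <= payload < (1 << 122):
        raise ValueError("payload must fit in 122 bits")
    if not 0 <= version < (1 << 4):
        raise ValueError("version must fit in 4 bits")

    high48 = payload >> 74               # lands in bits 0-47
    mid12 = (payload >> 62) & 0xFFF      # lands in bits 52-63
    low62 = payload & ((1 << 62) - 1)    # lands in bits 66-127

    return ((high48 << 80) | (version << 76) | (mid12 << 64)
            | (0b10 << 62) | low62)
```

Because the payload keeps its bit order, a timestamp placed in its most significant bits stays in the most significant bits of the UUID, even when it is long enough to straddle the version or variant.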

5. Encoding and Storage

The existing UUID hex and dash format of 8-4-4-4-12 is retained for both backwards compatibility and human readability.

For many applications such as databases this format is unnecessarily verbose totaling 288 bits.

  • 8 bits for each of the 32 hex characters = 256 bits

  • 8 bits for each of the 4 hyphens = 32 bits

Where possible UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.
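The size difference is easy to confirm with Python's standard uuid module. The UUID value below is an arbitrary example chosen for this sketch.

```python
import uuid

u = uuid.UUID("00000000-1800-7003-8000-000000000005")  # arbitrary example value

text_form = str(u)       # 8-4-4-4-12 hex and dash format
binary_form = u.bytes    # the underlying 16 octets

text_bits = len(text_form.encode("ascii")) * 8   # 36 characters at 8 bits each
binary_bits = len(binary_form) * 8               # 16 octets at 8 bits each

# The binary form converts back to the same UUID without loss.
round_trip = uuid.UUID(bytes=binary_form)
```

Here text_bits is 288 and binary_bits is 128, which is the saving the recommendation above refers to.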

6. Global Uniqueness

UUIDs created by this specification offer the same guarantees for global uniqueness as those found in [RFC4122]. Furthermore, the time-based UUIDs defined in this specification are geared towards database applications but MAY be used for a wide variety of use-cases. Just as global uniqueness is guaranteed, UUIDs are guaranteed to be unique within an application context within the enterprise domain.

7. Distributed UUID Generation

Some implementations might desire to utilize multi-node, clustered applications in which 2 or more applications independently generate UUIDs that will be stored in a common location. UUIDs already feature sufficient entropy to ensure that the chances of collision are low. However, implementations MAY dedicate a portion of the node's most significant random bits to a pseudo-random machineID which helps identify UUIDs created by a given node. This works to add an extra layer of collision avoidance.

This machineID MUST be placed in the UUID after the timestamp and sequence counter bits. This position is selected to ensure that sorting by timestamp and clock sequence is still possible. The machineID MUST NOT be an IEEE 802 MAC address. The creation and negotiation of the machineID among distributed nodes is out of scope for this specification.
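A node that carries a machineID in its most significant bits could be built as in the Python sketch below. The function name, the 8 bit machineID width used in the usage note, and the use of the secrets module for the remaining bits are all assumptions of this example; how the machineID itself is chosen is out of scope, as stated above.

```python
import secrets


def node_with_machine_id(machine_id: int, machine_bits: int, node_bits: int) -> int:
    """Place machine_id in the most significant bits of the node and
    fill the remaining node bits with random data."""
    if not 0 < machine_bits < node_bits:
        raise ValueError("machineID must be shorter than the node")
    if not 0 <= machine_id < (1 << machine_bits):
        raise ValueError("machine_id does not fit in machine_bits")

    rand_bits = node_bits - machine_bits
    return (machine_id << rand_bits) | secrets.randbits(rand_bits)
```

For instance, node_with_machine_id(0x2A, 8, 62) yields a 62 bit node whose top 8 bits are 0x2A. Because the node follows the timestamp and sequence counter in the layout, sorting by creation order is unaffected.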

8. IANA Considerations

This document has no IANA actions.

9. Security Considerations

MAC addresses pose inherent security risks and MUST NOT be used for node generation. As such they have been strictly forbidden from time-based UUIDs within this specification. Instead pseudo-random bits SHOULD be selected from a source with sufficient entropy to ensure uniqueness among generated UUIDs.

Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with the clock sequence does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then [RFC4122] UUIDv4 SHOULD be utilized.

The machineID portion of the node, described in Section 7, does provide a small unique identifier which could be used to determine which application is generating data, but this machineID alone is not enough to identify a node on the network without other corresponding data points. Furthermore the machineID, like the timestamp and sequence, does not provide any context about the data that corresponds to the UUID or the current state of the application as a whole.

10. Acknowledgements

The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, and Theodore Y. Ts'o, as well as all of those in and outside the IETF community who contributed to the discussions which resulted in this document.

11. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4122]
Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, <https://www.rfc-editor.org/info/rfc4122>.

12. Informative References

[LexicalUUID]
Twitter, "A Scala client for Cassandra", Commit f6da4e0, <https://github.com/twitter-archive/cassie>.
[Snowflake]
Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, <https://github.com/twitter-archive/snowflake/releases/tag/snowflake-2010>.
[Flake]
Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, <https://github.com/boundary/flake>.
[ShardingID]
Instagram Engineering, "Sharding & IDs at Instagram", <https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c>.
[KSUID]
Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, <https://github.com/segmentio/ksuid>.
[Elasticflake]
Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, <https://github.com/ppearcy/elasticflake>.
[FlakeID]
Pawlak, T., "Flake ID Generator", Commit fcd6a2f, <https://github.com/T-PWK/flake-idgen>.
[Sonyflake]
Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, <https://github.com/sony/sonyflake>.
[orderedUuid]
Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", <https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8>.
[COMBGUID]
Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, <https://github.com/richardtallent/RT.Comb>.
[ULID]
Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, <https://github.com/ulid/spec>.
[SID]
Chilton, A., "sid : generate sortable identifiers", Commit 660e947, <https://github.com/chilts/sid>.
[pushID]
Google, "The 2^120 Ways to Ensure Unique Identifiers", <https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html>.
[XID]
Poitrey, O., "Globally Unique ID Generator", Commit efa678f, <https://github.com/rs/xid>.
[ObjectID]
MongoDB, "ObjectId - MongoDB Manual", <https://docs.mongodb.com/manual/reference/method/ObjectId/>.
[CUID]
Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, <https://github.com/ericelliott/cuid>.

Authors' Addresses

Brad G. Peabody
Kyzer R. Davis
================================================ FILE: old drafts/draft-peabody-dispatch-new-uuid-format-02.txt ================================================ dispatch BGP. Peabody Internet-Draft Updates: 4122 (if approved) K. Davis Intended status: Standards Track 7 October 2021 Expires: 10 April 2022 New UUID Formats draft-peabody-dispatch-new-uuid-format-02 Abstract This document presents new time-based UUID formats which are suited for use as a database key. A common case for modern applications is to create a unique identifier for use as a primary key in a database table. This identifier usually implements an embedded timestamp that is sortable using the monotonic creation time in the most significant bits. In addition the identifier is highly collision resistant, difficult to guess, and provides minimal security attack surfaces. None of the existing UUID versions, including UUIDv1, fulfill each of these requirements in the most efficient possible way. This document is a proposal to update [RFC4122] with three new UUID versions that address these concerns, each with different trade-offs. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 10 April 2022. Copyright Notice Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved. 
Peabody & Davis           Expires 10 April 2022                 [Page 1]

Internet-Draft               new-uuid-format                October 2021

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
   3.  Summary of Changes
     3.1.  changelog
   4.  Format
     4.1.  Versions
     4.2.  Variant
     4.3.  UUIDv6 Layout and Bit Order
       4.3.1.  UUIDv6 Basic Creation Algorithm
     4.4.  UUIDv7 Layout and Bit Order
       4.4.1.  UUIDv7 Timestamp Usage
       4.4.2.  UUIDv7 Clock Sequence Usage
       4.4.3.  UUIDv7 Node Usage
       4.4.4.  UUIDv7 Encoding and Decoding
     4.5.  UUIDv8 Layout and Bit Order
       4.5.1.  UUIDv8 Timestamp Usage
       4.5.2.  UUIDv8 Clock Sequence Usage
       4.5.3.  UUIDv8 Node Usage
       4.5.4.  UUIDv8 Basic Creation Algorithm
   5.  Encoding and Storage
   6.  Global Uniqueness
   7.  Distributed UUID Generation
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Normative References
   12. Informative References
   Authors' Addresses

1.  Introduction

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Background

   A lot of things have changed in the time since UUIDs were originally
   created. Modern applications have a need to use (and many have
   already implemented) UUIDs as database primary keys.

   The motivation for using UUIDs as database keys stems primarily from
   the fact that applications are increasingly distributed in nature.
   Simplistic "auto increment" schemes with integers in sequence do not
   work well in a distributed system since the effort required to
   synchronize such numbers across a network can easily become a burden.
   The fact that UUIDs can be used to create unique and reasonably short
   values in distributed systems without requiring synchronization makes
   them a good candidate for use as a database key in such environments.

   However, some properties of [RFC4122] UUIDs are not well suited to
   this task. First, most of the existing UUID versions such as UUIDv4
   have poor database index locality, meaning new values created in
   succession are not close to each other in the index and thus require
   inserts to be performed at random locations. The negative
   performance effects of this on the structures commonly used for
   indexing (B-tree and its variants) can be dramatic.
   As such, newly inserted values SHOULD be time-ordered to address
   this.

   While it is true that UUIDv1 does contain an embedded timestamp and
   can be time-ordered, UUIDv1 has other issues. It is possible to sort
   Version 1 UUIDs by time but it is a laborious task. The process
   requires breaking the bytes of the UUID into various pieces,
   reordering the bits, and then determining the order from the
   reconstructed timestamp. This is not efficient in very large
   systems. Implementations would be simplified with a sort order where
   the UUID can simply be treated as an opaque sequence of bytes and
   ordered as such.

   After the embedded timestamp, the remaining 64 bits are in essence
   used to provide uniqueness both on a global scale and within a given
   timestamp tick. The clock sequence value ensures that multiple UUIDs
   generated for the same timestamp value are given a monotonic sequence
   value. This explicit sequencing helps further facilitate sorting.
   The remaining random bits ensure collisions are minimal.

   Furthermore, UUIDv1 utilizes a non-standard timestamp epoch derived
   from the Gregorian Calendar. More specifically, the timestamp is
   Coordinated Universal Time (UTC) represented as a count of
   100-nanosecond intervals since 00:00:00.00, 15 October 1582.
   Implementations and many languages may find it easier to implement
   the widely adopted and well known Unix Epoch, a custom epoch, or
   another timestamp source with various levels of timestamp precision
   required by the application.

   Lastly, privacy and network security issues arise from using a MAC
   address in the node field of Version 1 UUIDs. Exposed MAC addresses
   can be used as an attack surface to locate machines and reveal
   various other information about such machines (minimally
   manufacturer, potentially other details).
   Instead, "cryptographically secure" pseudo-random number generators
   (CSPRNGs) or pseudo-random number generators (PRNGs) SHOULD be used
   within an application context to provide uniqueness and
   unguessability.

   Due to the shortcomings of UUIDv1 and UUIDv4 detailed so far, many
   widely distributed database applications and large application
   vendors have sought to solve the problem of creating a better
   time-based, sortable unique identifier for use as a database key.
   This has led to numerous implementations over the past 10+ years
   solving the same problem in slightly different ways.

   While preparing this specification the following 16 different
   implementations were analyzed for trends in total ID length, bit
   layout, lexical formatting/encoding, timestamp type, timestamp
   format, timestamp accuracy, node format/components, collision
   handling and multi-timestamp tick generation sequencing.

   1.   [LexicalUUID] by Twitter
   2.   [Snowflake] by Twitter
   3.   [Flake] by Boundary
   4.   [ShardingID] by Instagram
   5.   [KSUID] by Segment
   6.   [Elasticflake] by P. Pearcy
   7.   [FlakeID] by T. Pawlak
   8.   [Sonyflake] by Sony
   9.   [orderedUuid] by IT. Cabrera
   10.  [COMBGUID] by R. Tallent
   11.  [ULID] by A. Feerasta
   12.  [SID] by A. Chilton
   13.  [pushID] by Google
   14.  [XID] by O. Poitrey
   15.  [ObjectID] by MongoDB
   16.  [CUID] by E. Elliott

   An inspection of these implementations details the following trends
   that help define this standard:

   -  Timestamps MUST be k-sortable. That is, values within or close to
      the same timestamp are ordered properly by sorting algorithms.

   -  Timestamps SHOULD be big-endian with the most-significant bits of
      the time embedded as-is without reordering.

   -  Timestamps SHOULD utilize millisecond precision and Unix Epoch as
      timestamp source, although there is some variation to this among
      implementations depending on the application requirements.
   -  The ID format SHOULD be lexicographically sortable while in the
      textual representation.

   -  IDs MUST ensure proper embedded sequencing to facilitate sorting
      when multiple UUIDs are created during a given timestamp.

   -  IDs MUST NOT require unique network identifiers as part of
      achieving uniqueness.

   -  Distributed nodes MUST be able to create collision resistant
      Unique IDs without consulting a centralized resource.

3.  Summary of Changes

   In order to solve these challenges this specification introduces
   three new version identifiers assigned for time-based UUIDs.

   The first, UUIDv6, aims to be the easiest to implement for
   applications which already implement UUIDv1. The UUIDv6
   specification keeps the original Gregorian timestamp source but does
   not reorder the timestamp bits as per the process utilized by UUIDv1.
   UUIDv6 also requires that pseudo-random data MUST be used in place of
   the MAC address. The rest of the UUIDv1 format remains unchanged in
   UUIDv6. See Section 4.3.

   Next, UUIDv7 introduces an entirely new time-based UUID bit layout
   utilizing a variable length timestamp sourced from the widely
   implemented and well known Unix Epoch timestamp source. The
   timestamp is broken into a 36 bit integer seconds part, and is
   followed by a field of variable length which represents the
   sub-second timestamp portion, encoded so that each bit from most to
   least significant adds more precision. See Section 4.4.

   Finally, UUIDv8 introduces a relaxed time-based UUID format that
   caters to application implementations that cannot utilize UUIDv1,
   UUIDv6, or UUIDv7. UUIDv8 also future-proofs this specification by
   allowing time-based UUID formats from timestamp sources that are not
   yet defined. The variable size timestamp offers lots of flexibility
   to create an implementation specific RFC compliant time-based UUID
   while retaining the properties that make UUIDs great. See
   Section 4.5.

3.1.  changelog

   RFC EDITOR PLEASE DELETE THIS SECTION.

   draft-02

   -  Added Changelog
   -  Fixed misc. grammatical errors
   -  Fixed section numbering issue
   -  Fixed some UUIDvX reference issues
   -  Changed all instances of "motonic" to "monotonic"
   -  Changed all instances of "#-bit" to "# bit"
   -  Changed "proceeding" verbiage to "after" in section 7
   -  Added details on how to pad 32 bit Unix timestamp to 36 bits in
      UUIDv7
   -  Added details on how to truncate 64 bit Unix timestamp to 36 bits
      in UUIDv7
   -  Added forward reference and bullet to UUIDv8 if truncating 64 bit
      Unix Epoch is not an option.
   -  Fixed bad reference to non-existent "time_or_node" in section
      4.5.4

   draft-01

   -  Complete rewrite of entire document.
   -  The format, flow and verbiage used in the specification has been
      reworked to mirror the original RFC 4122 and current IETF
      standards.
   -  Removed the topics of UUID length modification, alternate UUID
      text formats, and alternate UUID encoding techniques.
   -  Research into 16 different historical and current implementations
      of time-based universal identifiers was completed at the end of
      2020 in an attempt to identify trends which have directly
      influenced design decisions in this draft document
      (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)
   -  Prototype implementations have been completed for UUIDv6, UUIDv7,
      and UUIDv8 in various languages by many GitHub community members.
      (https://github.com/uuid6/prototypes)

4.  Format

   The UUID length of 16 octets (128 bits) remains unchanged. The
   textual representation of a UUID consisting of 36 hexadecimal and
   dash characters in the format 8-4-4-4-12 remains unchanged for human
   readability. In addition the position of both the Version and
   Variant bits remain unchanged in the layout.

4.1.  Versions

   Table 1 defines the 4 bit version found in Bits 48 through 51 within
   a given UUID.

   +------+------+------+------+---------+-----------------------+
   | Msb0 | Msb1 | Msb2 | Msb3 | Version | Description           |
   +------+------+------+------+---------+-----------------------+
   | 0    | 1    | 1    | 0    | 6       | Reordered Gregorian   |
   |      |      |      |      |         | time-based UUID       |
   +------+------+------+------+---------+-----------------------+
   | 0    | 1    | 1    | 1    | 7       | Variable length Unix  |
   |      |      |      |      |         | Epoch time-based UUID |
   +------+------+------+------+---------+-----------------------+
   | 1    | 0    | 0    | 0    | 8       | Custom time-based     |
   |      |      |      |      |         | UUID                  |
   +------+------+------+------+---------+-----------------------+

        Table 1: UUID versions defined by this specification

4.2.  Variant

   The variant bits utilized by UUIDs in this specification remain the
   same as [RFC4122], Section 4.1.1.

   Table 2 lists the contents of the variant field, bits 64 and 65,
   where the letter "x" indicates a "don't-care" value. The hex values
   8 (1000), 9 (1001), A (1010), and B (1011) are therefore the ones
   commonly seen at this position in the text representation.

   +------+------+------+-----------------------------------------+
   | Msb0 | Msb1 | Msb2 | Description                             |
   +------+------+------+-----------------------------------------+
   | 1    | 0    | x    | The variant specified in this document. |
   +------+------+------+-----------------------------------------+

         Table 2: UUID Variant defined by this specification

4.3.  UUIDv6 Layout and Bit Order

   UUIDv6 aims to be the easiest to implement by reusing most of the
   layout of bits found in UUIDv1 but with changes to bit ordering for
   the timestamp. Where UUIDv1 splits the timestamp bits into three
   distinct parts and orders them as time_low, time_mid,
   time_high_and_version, UUIDv6 instead keeps the source bits from the
   timestamp intact and changes the order to time_high, time_mid, and
   time_low.
   Incidentally this will match the original 60 bit Gregorian timestamp
   source with 100-nanosecond precision defined in [RFC4122],
   Section 4.1.4.

   The clock sequence bits remain unchanged from their usage and
   position in [RFC4122], Section 4.1.5.

   The 48 bit node SHOULD be set to a pseudo-random value; however,
   implementations MAY choose to retain the old MAC address behavior
   from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5.

   The format for the 16-octet, 128 bit UUIDv6 is shown in Figure 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_high                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |     time_low_and_version      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |          node (0-1)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          node (2-5)                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 1: UUIDv6 Field and Bit Layout

   time_high:
      The most significant 32 bits of the 60 bit starting timestamp.
      Occupies bits 0 through 31 (octets 0-3)

   time_mid:
      The middle 16 bits of the 60 bit starting timestamp. Occupies
      bits 32 through 47 (octets 4-5)

   time_low_and_version:
      The first four most significant bits MUST contain the UUIDv6
      version (0110) while the remaining 12 bits will contain the least
      significant 12 bits from the 60 bit starting timestamp. Occupies
      bits 48 through 63 (octets 6-7)

   clk_seq_hi_res:
      The first two bits MUST be set to the UUID variant (10). The
      remaining 6 bits contain the high portion of the clock sequence.
      Occupies bits 64 through 71 (octet 8)

   clk_seq_low:
      The 8 bit low portion of the clock sequence.
      Occupies bits 72 through 79 (octet 9)

   node:
      48 bit spatially unique identifier. Occupies bits 80 through 127
      (octets 10-15)

4.3.1.  UUIDv6 Basic Creation Algorithm

   The following implementation algorithm is based on [RFC4122] but with
   changes specific to UUIDv6:

   1.   From a system-wide shared stable store (e.g., a file) or global
        variable, read the UUID generator state: the values of the
        timestamp and clock sequence used to generate the last UUID.

   2.   Obtain the current time as a 60 bit count of 100-nanosecond
        intervals since 00:00:00.00, 15 October 1582.

   3.   Set the time_low field to the 12 least significant bits of the
        starting 60 bit timestamp.

   4.   Truncate the timestamp to the 48 most significant bits in order
        to create time_high_and_time_mid.

   5.   Set the time_high field to the 32 most significant bits of the
        truncated timestamp.

   6.   Set the time_mid field to the 16 least significant bits of the
        truncated timestamp.

   7.   Create the 16 bit time_low_and_version by concatenating the 4
        bit UUIDv6 version with the 12 bit time_low.

   8.   If the state was unavailable (e.g., non-existent or corrupted)
        or the saved timestamp is greater than the current timestamp,
        generate a random 14 bit clock sequence value.

   9.   If the state was available, but the saved timestamp is less than
        or equal to the current timestamp, increment the clock sequence
        value.

   10.  Complete the 16 bit clock sequence high, low and reserved
        creation by concatenating the clock sequence onto the UUID
        variant bits, which take the most significant position in the 16
        bit value.

   11.  Generate a 48 bit pseudo-random node.

   12.  Format by concatenating the 128 bits from each part:
        time_high|time_mid|time_low_and_version|variant_clk_seq|node

   13.  Save the state (current timestamp and clock sequence) back to
        the stable store.

   The steps for splitting time_high_and_time_mid into time_high and
   time_mid are optional since the 48 bits of time_high and time_mid
   will remain in the same order as time_high_and_time_mid during the
   final concatenation. This extra step of splitting into the most
   significant 32 bits and least significant 16 bits proves useful when
   reusing an existing UUIDv1 implementation. In that case the
   following logic can be applied to reshuffle the bits with minimal
   modifications.

   +--------------+------+--------------+
   | UUIDv1 Field | Bits | UUIDv6 Field |
   +--------------+------+--------------+
   | time_low     | 32   | time_high    |
   +--------------+------+--------------+
   | time_mid     | 16   | time_mid     |
   +--------------+------+--------------+
   | time_high    | 12   | time_low     |
   +--------------+------+--------------+

      Table 3: UUIDv1 to UUIDv6 Field Mappings

4.4.  UUIDv7 Layout and Bit Order

   The UUIDv7 format is designed to encode a Unix timestamp with
   arbitrary sub-second precision. The key property provided by UUIDv7
   is that timestamp values generated by one system and parsed by
   another are guaranteed to have sub-second precision of either the
   generator or the parser, whichever is less. Additionally, the system
   parsing the UUIDv7 value does not need to know which precision was
   used during encoding in order to function correctly.
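
   As a non-normative illustration of this property, the sketch below
   models the binary-fraction sub-second encoding that Section 4.4.4.2
   describes (the first N sub-second bits divided by 2^N). The function
   names, the use of integer arithmetic, and the choice to round up
   while encoding are assumptions made for this example; this
   specification does not define them.

```python
def encode_subsec(units: int, units_per_second: int, nbits: int) -> int:
    """Encode units/units_per_second as an nbits-wide binary fraction."""
    if not 0 <= units < units_per_second:
        raise ValueError("units must describe less than one second")
    if (1 << nbits) < units_per_second:
        raise ValueError("nbits is too small for this precision")
    # Assumption of this sketch: round up (ceiling division) so that a
    # decoder which truncates recovers the original value exactly.
    return (units * (1 << nbits) + units_per_second - 1) // units_per_second


def decode_subsec(value: int, nbits: int, units_per_second: int) -> int:
    """Read an nbits-wide binary fraction at the parser's own precision."""
    if not 0 <= value < (1 << nbits):
        raise ValueError("value does not fit in nbits")
    # value / 2^nbits is the fraction of a second; scale and truncate.
    return (value * units_per_second) >> nbits


# System A encodes 123456 microseconds into 24 bits (subsec_a + subsec_b).
encoded = encode_subsec(123_456, 1_000_000, 24)

# A parser working in microseconds and one working in milliseconds read
# the same 24 bits without knowing which precision the encoder used.
micros = decode_subsec(encoded, 24, 1_000_000)
millis = decode_subsec(encoded, 24, 1_000)
```

   Here micros evaluates to 123456 and millis to 123, so the
   millisecond parser obtains the encoder's value truncated to its own
   precision. Applying decode_subsec(0x5555, 16, 10_000_000) gives
   3333282, which corresponds to the 0.3333282 result worked through in
   Section 4.4.4.2.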
   The format for the 16-octet, 128 bit UUIDv7 is shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |       subsec_a        |  ver  |       subsec_b        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                      subsec_seq_node                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        subsec_seq_node                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 2: UUIDv7 Field and Bit Layout

   unixts:
      36 bit big-endian unsigned Unix Timestamp value

   subsec_a:
      12 bits allocated to sub-second precision values.

   ver:
      The 4 bit UUIDv7 version (0111)

   subsec_b:
      12 bits allocated to sub-second precision values.

   var:
      2 bit UUID variant (10)

   subsec_seq_node:
      The remaining 62 bits which MAY be allocated to any combination of
      additional sub-second precision, sequence counter, or
      pseudo-random data.

4.4.1.  UUIDv7 Timestamp Usage

   UUIDv7 utilizes a 36 bit big-endian unsigned Unix Timestamp value
   (number of seconds since the epoch of 1 Jan 1970, leap seconds
   excluded so each hour is exactly 3600 seconds long).

   The 36 bit value was selected in order to provide more available time
   to the Unix timestamp and avoid the Year 2038 problem by extending
   the maximum timestamp to the year 4147.

   To achieve a 36 bit UUIDv7 timestamp, the lower 36 bits of a 64 bit
   Unix time are extracted verbatim into UUIDv7.

   In the event that a 32 bit Unix timestamp is in use, four zero bits
   MUST be prepended in the most significant (left-most) positions of
   the 32 bit Unix timestamp, creating the 36 bit Unix timestamp. This
   ensures sorting compatibility with 64 bit Unix timestamps which have
   been truncated to 36 bits.
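
   The truncation and padding rules above can be sketched in Python as
   follows. This is a minimal, non-normative illustration; the names
   UNIXTS_BITS, UNIXTS_MASK, unixts_from_seconds and unixts_bits are
   hypothetical and are not defined by this specification.

```python
from datetime import datetime, timedelta, timezone

UNIXTS_BITS = 36
UNIXTS_MASK = (1 << UNIXTS_BITS) - 1
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unixts_from_seconds(seconds: int) -> int:
    """Return the 36 bit unixts field for a Unix time given in seconds.

    A 64 bit value keeps only its lower 36 bits. A 32 bit value is
    unchanged, which is equivalent to placing four zero bits in front.
    """
    if seconds < 0:
        raise ValueError("unixts is an unsigned value")
    return seconds & UNIXTS_MASK


def unixts_bits(seconds: int) -> str:
    """Render the field as exactly 36 binary digits, most significant first."""
    return format(unixts_from_seconds(seconds), "036b")


# The largest 32 bit timestamp gains four leading zero bits.
padded = unixts_bits(0xFFFFFFFF)

# The largest 36 bit timestamp falls in the year 4147.
last_second = UNIX_EPOCH + timedelta(seconds=UNIXTS_MASK)
```

   Because both input widths end up in the same 36 bit field, values
   derived from 32 bit and 64 bit clocks compare in the same order.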

   Additional sub-second precision (millisecond, microsecond,
   nanosecond, etc.) MAY be provided for encoding and decoding in the
   remaining bits in the layout.

   UUIDv8 SHOULD be used in place of UUIDv7 if an application or
   implementation does not want to truncate a 64 bit Unix Epoch to the
   lower 36 bits.

4.4.2.  UUIDv7 Clock Sequence Usage

   UUIDv7 SHOULD utilize a monotonic sequence counter to provide
   additional sequencing guarantees when multiple UUIDv7 values are
   created in the same UNIXTS and SUBSEC timestamp. The number of bits
   allocated to the sequence counter depends on the precision of the
   timestamp. For example, a more accurate timestamp source using
   nanosecond precision will require fewer clock sequence bits than a
   timestamp source utilizing seconds for precision. For best
   sequencing results the sequence counter SHOULD be placed immediately
   after available sub-second bits.

   The clock sequence MUST start at zero and increment monotonically for
   each new UUIDv7 created by the application on the same timestamp.
   When the timestamp increments the clock sequence MUST be reset to
   zero. The clock sequence MUST NOT roll over or reset to zero unless
   the timestamp has incremented. Care MUST be given to ensure that an
   adequately sized clock sequence is selected for a given application
   based on expected timestamp precision and expected UUIDv7 generation
   rates.

4.4.3.  UUIDv7 Node Usage

   UUIDv7 implementations, even with very detailed sub-second precision
   and the optional sequence counter, MAY have leftover bits that will
   be identified as the Node for this section. The UUIDv7 Node MAY
   contain any set of data an implementation desires; however, the node
   MUST NOT be set to all 0s, since that does not ensure global
   uniqueness. In most scenarios the node SHOULD be filled with
   pseudo-random data.

4.4.4.  UUIDv7 Encoding and Decoding

   The UUIDv7 bit layouts for encoding and decoding are described
   separately in this document.

4.4.4.1.  UUIDv7 Encoding

   Since the UUIDv7 Unix timestamp is fixed at 36 bits in length, the
   exact layout for encoding UUIDv7 depends on the precision (number of
   bits) used for the sub-second portion and the sizes of the optionally
   desired sequence counter and node bits.

   Three examples of UUIDv7 encoding are given below as general
   guidelines but implementations are not limited to just these three
   examples.

   All of these fields are only used during encoding, and during
   decoding the system is unaware of the bit layout used for them and
   considers this information opaque. As such, implementations
   generating these values can assign whatever lengths to each field
   they deem applicable, as long as this does not break decoding
   compatibility (i.e. Unix timestamp (unixts), version (ver) and
   variant (var) have to stay where they are, and clock sequence counter
   (seq), random (rand) or other implementation specific values must
   follow the sub-second encoding).

   In Figure 3 the UUIDv7 has been created with millisecond precision
   with the available sub-second precision bits.

   Examining Figure 3 one can observe:

   *  The first 36 bits have been dedicated to the Unix Timestamp
      (unixts)

   *  All 12 bits of subsec_a are fully dedicated to millisecond
      information (msec).

   *  The 4 Version bits remain unchanged (ver).

   *  All 12 bits of subsec_b have been dedicated to a monotonic clock
      sequence counter (seq).

   *  The 2 Variant bits remain unchanged (var).

   *  Finally the remaining 62 bits in the subsec_seq_node section are
      filled with random data to pad the length and provide guaranteed
      uniqueness (rand).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         msec          |  ver  |          seq          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                           rand                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 3: UUIDv7 Field and Bit Layout - Encoding Example
                       (Millisecond Precision)

   In Figure 4 the UUIDv7 has been created with Microsecond precision
   with the available sub-second precision bits.

   Examining Figure 4 one can observe:

   *  The first 36 bits have been dedicated to the Unix Timestamp
      (unixts)

   *  All 12 bits of subsec_a are fully dedicated to providing
      sub-second encoding for the Microsecond precision (usec).

   *  The 4 Version bits remain unchanged (ver).

   *  All 12 bits of subsec_b have been dedicated to providing
      sub-second encoding for the Microsecond precision (usec).

   *  The 2 Variant bits remain unchanged (var).

   *  A 14 bit monotonic clock sequence counter (seq) has been embedded
      in the most significant position of subsec_seq_node

   *  Finally the remaining 48 bits in the subsec_seq_node section are
      filled with random data to pad the length and provide guaranteed
      uniqueness (rand).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         usec          |  ver  |         usec          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|            seq            |             rand              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 4: UUIDv7 Field and Bit Layout - Encoding Example
                       (Microsecond Precision)

   In Figure 5 the UUIDv7 has been created with Nanosecond precision
   with the available sub-second precision bits.

   Examining Figure 5 one can observe:

   *  The first 36 bits have been dedicated to the Unix Timestamp
      (unixts)

   *  All 12 bits of subsec_a are fully dedicated to providing
      sub-second encoding for the Nanosecond precision (nsec).

   *  The 4 Version bits remain unchanged (ver).

   *  All 12 bits of subsec_b have been dedicated to providing
      sub-second encoding for the Nanosecond precision (nsec).

   *  The 2 Variant bits remain unchanged (var).

   *  The first 14 bits of subsec_seq_node are dedicated to providing
      sub-second encoding for the Nanosecond precision (nsec).

   *  The next 8 bits of subsec_seq_node are dedicated to a monotonic
      clock sequence counter (seq).

   *  Finally the remaining 40 bits in the subsec_seq_node section are
      filled with random data to pad the length and provide guaranteed
      uniqueness (rand).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         nsec          |  ver  |         nsec          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|           nsec            |      seq      |     rand      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 5: UUIDv7 Field and Bit Layout - Encoding Example
                       (Nanosecond Precision)

4.4.4.2.  UUIDv7 Decoding

   When decoding or parsing a UUIDv7 value there are only two values to
   be considered:

   1.  The Unix timestamp defined as unixts

   2.  The sub-second precision values defined as subsec_a, subsec_b,
       and subsec_seq_node

   As detailed in Figure 2 the Unix timestamp (unixts) is always the
   first 36 bits of the UUIDv7 layout.

   Similarly as per Figure 2, the sub-second precision values lie within
   subsec_a, subsec_b, and subsec_seq_node which are all interpreted as
   sub-second information after skipping over the version (ver) and
   variant (var) bits. These concatenated sub-second information bits
   are interpreted in a way where most to least significant bits
   represent a further division by two. This is the same normal place
   notation used to express fractional numbers, except in binary. For
   example, in decimal ".1" means one tenth, and ".01" means one
   hundredth. In this subsec field, a 1 means one half, 01 means one
   quarter, 001 is one eighth, etc. This scheme can work for any number
   of bits up to the maximum available, and keeps the most significant
   data leftmost in the bit sequence.

   To perform the sub-second math, simply take the first (most
   significant/leftmost) N bits of subsec and divide it by 2^N. Take
   for example:

   1.  To parse the first 16 bits, extract that value as an integer and
       divide it by 65536 (2 to the 16th).

   2.  If these 16 bits are 0101 0101 0101 0101, then treating that as
       an integer gives 0x5555 or 21845 in decimal, and dividing by
       65536 gives 0.3333282.

   This sub-second encoding scheme provides maximum interoperability
   across systems where different levels of time precision are
   required/feasible/available. The timestamp value derived from a
   UUIDv7 value SHOULD be "as close to the correct value as possible"
   when parsed, even across disparate systems.

   Take for example the starting point for our next two UUIDv7 parsing
   scenarios:

   1.  System A produces a UUIDv7 with a microsecond-precise timestamp
       value.

   2.  System B is unaware of the precision encoded in the UUIDv7
       timestamp by System A.

   Scenario 1:

   1.  System B parses the embedded timestamp with millisecond
       precision. (Less precision than the encoder)

   2.  System B SHOULD return the correct millisecond value encoded by
       System A (truncated to milliseconds).

   Scenario 2:

   1.  System B parses the timestamp with nanosecond precision. (More
       precision than the encoder)

   2.  The value returned by System B SHOULD have the same microsecond
       level of precision provided by the encoder, with the additional
       precision down to nanosecond level being essentially random as
       per the encoded random value at the end of the UUIDv7.

4.5.  UUIDv8 Layout and Bit Order

   UUIDv8 offers variable-size timestamp, clock sequence, and node
   values which allow for a highly customizable UUID that fits a given
   application's needs.

   UUIDv8 SHOULD only be utilized if an implementation cannot utilize
   UUIDv1, UUIDv6, or UUIDv7. Some situations in which UUIDv8 usage
   could occur:

   *  An implementation would like to utilize a timestamp source not
      defined by the current time-based UUIDs.

   *  An implementation would like to utilize a timestamp bit layout not
      defined by the current time-based UUIDs.

   *  An implementation would like to avoid truncating a 64 bit Unix
      timestamp to 36 bits as defined by UUIDv7.

   *  An implementation would like a specific level of precision within
      the timestamp not offered by current time-based UUIDs.

   *  An implementation would like to embed extra information within the
      UUID node other than what is defined in this document.

   *  An implementation has other application/language restrictions
      which inhibit the usage of one of the current time-based UUIDs.

   Roughly speaking a properly formatted UUIDv8 SHOULD contain the
   following sections adding up to a total of 128 bits.

   -  Timestamp Bits (Variable Length)
   -  Clock Sequence Bits (Variable Length)
   -  Node Bits (Variable Length)
   -  UUIDv8 Version Bits (4 bits)
   -  UUID Variant Bits (2 bits)

   The only explicitly defined bits are the Version and Variant, leaving
   122 bits for implementation specific time-based UUIDs. To be clear:
   UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are
   filled with random data. UUIDv8's 128 bits (including the version
   and variant) SHOULD contain at the minimum a timestamp of some format
   in the most significant bit position followed directly by a clock
   sequence counter and finally a node containing either random data or
   implementation specific data.

   A sample format in Figure 6 is used to further illustrate the point
   for the 16-octet, 128 bit UUIDv8.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         timestamp_32                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         timestamp_48          |  ver  |      time_or_seq      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|  seq_or_node  |                   node                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             node                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 6: UUIDv8 Field and Bit Layout

   timestamp_32:
      The most significant 32 bits of the desired timestamp source.
      Occupies bits 0 through 31 (octets 0-3).

   timestamp_48:
      The next 16 bits of the timestamp source when a timestamp source
      with at least 48 bits is used. When a 32 bit timestamp source is
      utilized, these bits are set to 0. Occupies bits 32 through 47
      (octets 4-5).

   ver:
      The 4 bit UUIDv8 version (1000). Occupies bits 48 through 51.

   time_or_seq:
      If a 60 bit, or larger, timestamp is used these 12 bits are used
      to fill out the remaining timestamp. If a 32 or 48 bit timestamp
      is leveraged a 12 bit clock sequence MAY be used. Together ver
      and time_or_seq occupy bits 48 through 63 (octets 6-7)

   var:
      2 bit UUID variant (10)

   seq_or_node:
      If a 60 bit, or larger, timestamp source is leveraged these 8 bits
      SHOULD be allocated for an 8 bit clock sequence counter. If a 32
      or 48 bit timestamp source is used these 8 bits SHOULD be set to
      random data.

   node:
      In most implementations these bits will likely be set to
      pseudo-random data. However, implementations can utilize the node
      as they see fit. Together var, seq_or_node, and node occupy bits
      64 through 127 (octets 8-15)

4.5.1.  UUIDv8 Timestamp Usage

   UUIDv8's usage of timestamp relaxes both the timestamp source and
   timestamp length. Implementations are free to utilize any
   monotonically stable timestamp source for UUIDv8.

   Some examples include:

   -  Custom Epoch
   -  NTP timestamp
   -  ISO 8601 timestamp
   -  Full, non-truncated 64 bit Unix Epoch timestamp

   The relaxed nature of UUIDv8 timestamps also works to future-proof
   this specification and allow implementations a method to create
   compliant time-based UUIDs using timestamp sources that might not yet
   be defined.
Timestamps come in many sizes and UUIDv8 defines three fields that can easily be used for the majority of timestamp lengths:

* 32 bit timestamp: using timestamp_32 and setting timestamp_48 to 0s

* 48 bit timestamp: using timestamp_32 and timestamp_48 entirely

* 60 bit timestamp: using timestamp_32, timestamp_48, and time_or_seq

* 64 bit timestamp: using timestamp_32, timestamp_48, and time_or_seq and truncating the timestamp to the 60 most significant bits.

Although it is possible to create a timestamp larger than 64 bits in size, the usage and bit layout of that timestamp format is up to the implementation. When a timestamp exceeds the 64th bit (octet 7), extra care must be taken to ensure the Variant bits are properly inserted at their respective location in the UUID. Likewise, the Version MUST always be implemented at the appropriate location.

Any timestamp that does not entirely fill timestamp_32, timestamp_48, or time_or_seq MUST set all leftover bits in the least significant position of the respective field to 0. For example, a 36 bit timestamp source would fully utilize timestamp_32 and 4 bits of timestamp_48. The remaining 12 bits in timestamp_48 MUST be set to 0.

By using implementation-specific timestamp sources, it is not guaranteed that devices outside of the application context are able to extract and parse the timestamp from UUIDv8 without some pre-existing knowledge of the source timestamp used by the UUIDv8 implementation.

4.5.2. UUIDv8 Clock Sequence Usage

A clock sequence MUST be used with UUIDv8 to provide added sequencing guarantees when multiple UUIDv8 values are created on the same clock tick. The number of bits allocated to the clock sequence depends on the precision of the timestamp source. For example, a more precise timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source utilizing seconds for precision.
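The timestamp field mapping and zero-fill rule from Section 4.5.1 can be sketched as follows. The helper name split_timestamp is hypothetical, and the sketch assumes the timestamp is left-aligned within the 60 timestamp bits of Figure 6, so that unused least significant positions become 0 and a 64 bit source keeps only its 60 most significant bits.

```python
from typing import Tuple


def split_timestamp(timestamp: int, bits: int) -> Tuple[int, int, int]:
    """Return (timestamp_32, timestamp_48, time_or_seq) for a `bits`-wide timestamp.

    Hypothetical helper. Timestamps wider than 64 bits are left to the
    implementation by the text above and are rejected here.
    """
    if not 0 < bits <= 64:
        raise ValueError("only timestamps of 1 to 64 bits are handled")
    if not 0 <= timestamp < (1 << bits):
        raise ValueError("timestamp does not fit in the stated width")

    if bits > 60:
        aligned = timestamp >> (bits - 60)  # keep the 60 most significant bits
    else:
        aligned = timestamp << (60 - bits)  # zero-fill the least significant bits

    timestamp_32 = aligned >> 28
    timestamp_48 = (aligned >> 12) & 0xFFFF
    time_or_seq = aligned & 0xFFF
    return timestamp_32, timestamp_48, time_or_seq


# The 36 bit example from the text: timestamp_32 is full, timestamp_48 holds
# 4 timestamp bits followed by 12 zero bits.
print([hex(field) for field in split_timestamp(0xABCDEF123, 36)])
```

For a 32 or 48 bit source the third value is always 0, which leaves time_or_seq free to carry a 12 bit clock sequence instead.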
The UUIDv8 layout in Figure 6 generically defines two possible clock sequence values that can be leveraged:

* 12 bit clock sequence using time_or_seq, for use when the timestamp is 48 bits or less, which allows for 4096 UUIDs per clock tick.

* 8 bit clock sequence using seq_or_node, for use when the timestamp uses more than 48 bits, which allows for 256 UUIDs per clock tick.

An implementation MAY use both time_or_seq and seq_or_node for clock sequencing; however, it is highly unlikely that 20 bits of clock sequence are needed for a given clock tick. Furthermore, more bits from the node MAY be used for clock sequencing in the event that 8 bits is not sufficient.

The clock sequence MUST start at zero and increment monotonically for each new UUIDv8 created by the application on the same timestamp. When the timestamp increments, the clock sequence MUST be reset to zero. The clock sequence MUST NOT roll over or reset to zero unless the timestamp has incremented. Care MUST be given to ensure that an adequately sized clock sequence is selected for a given application based on expected timestamp precision and expected UUIDv8 generation rates.

4.5.3. UUIDv8 Node Usage

The UUIDv8 Node MAY contain any set of data an implementation desires; however, the node MUST NOT be set to all 0s, which does not ensure global uniqueness. In most scenarios the node will be filled with pseudo-random data.

The UUIDv8 layout in Figure 6 defines 2 sizes of Node depending on the timestamp size:

* 62 bit node encompassing seq_or_node and node, used when a timestamp of 48 bits or less is leveraged.

* 54 bit node when all 60 bits of the timestamp are in use and seq_or_node is used for clock sequencing.

An implementation MAY choose to allocate bits from the node to the timestamp, clock sequence or application-specific embedded field.
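The clock sequence rules of Section 4.5.2 (start at zero, increment on the same timestamp, reset only when the timestamp increments, never roll over) can be sketched as a small counter. The class name ClockSequence is hypothetical. Raising an error when the counter is exhausted or when the clock moves backwards is an assumption of this sketch; the text above does not say how an implementation should react in those cases.

```python
class ClockSequence:
    """Per-tick counter following the UUIDv8 clock sequence rules (illustrative)."""

    def __init__(self, bits: int) -> None:
        self.limit = 1 << bits      # number of values available per clock tick
        self.last_timestamp = None  # timestamp used for the previous UUID
        self.value = 0

    def next(self, timestamp: int) -> int:
        if self.last_timestamp is None or timestamp > self.last_timestamp:
            # First UUID, or the timestamp has incremented: reset to zero.
            self.last_timestamp = timestamp
            self.value = 0
        elif timestamp == self.last_timestamp:
            # Same clock tick: increment, but never roll over.
            if self.value + 1 >= self.limit:
                raise OverflowError("clock sequence exhausted for this tick")
            self.value += 1
        else:
            # Assumption: a clock that moves backwards is treated as an error.
            raise ValueError("timestamp moved backwards")
        return self.value


seq = ClockSequence(bits=12)
print(seq.next(1000), seq.next(1000), seq.next(1001))
```

A 12 bit counter built this way yields the values 0 through 4095 within one tick, so the choice of counter width bounds how many UUIDs can be issued before the timestamp must advance.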
It is recommended that implementations utilize a node of at least 48 bits to ensure global uniqueness can be guaranteed.

4.5.4. UUIDv8 Basic Creation Algorithm

The entire usage of UUIDv8 is meant to be variable and allow as much customization as possible to meet specific application/language requirements. As such, UUIDv8 implementations will likely vary among applications.

The following algorithm is a generic implementation using Figure 6 and the recommendations outlined in this specification.

*32 bit timestamp, 12 bit sequence counter, 62 bit node:*

1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

2. Obtain the current time from the selected clock source as 32 bits.

3. Set the 32 bit field timestamp_32 to the 32 bits from the timestamp.

4. Set the 16 bit timestamp_48 to all 0s.

5. Set the version to 8 (1000).

6. If the state was unavailable (e.g., non-existent or corrupted) or the saved timestamp is greater than the current timestamp, set the 12 bit clock sequence value (time_or_seq) to 0.

7. If the state was available, but the saved timestamp is less than or equal to the current timestamp, increment the clock sequence value (time_or_seq).

8. Set the variant to binary 10.

9. Generate 62 random bits and fill in 8 bits for seq_or_node and 54 bits for the node.

10. Format by concatenating the 128 bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node

11. Save the state (current timestamp and clock sequence) back to the stable store.

*48 bit timestamp, 12 bit sequence counter, 62 bit node:*

1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

2.
Obtain the current time from the selected clock source as 48 bits.

3. Set the 32 bit field timestamp_32 to the 32 most significant bits from the timestamp.

4. Set the 16 bit timestamp_48 to the 16 least significant bits from the timestamp.

5. The rest of the steps are the same as the previous example.

*60 bit timestamp, 8 bit sequence counter, 54 bit node:*

1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

2. Obtain the current time from the selected clock source as 60 bits.

3. Set the 32 bit field timestamp_32 to the 32 most significant bits from the timestamp.

4. Set the 16 bit timestamp_48 to the 16 middle bits from the timestamp.

5. Set the version to 8 (1000).

6. Set the 12 bit time_or_seq to the 12 least significant bits from the timestamp.

7. Set the variant to binary 10.

8. If the state was unavailable (e.g., non-existent or corrupted) or the saved timestamp is greater than the current timestamp, set the 8 bit clock sequence value (seq_or_node) to 0.

9. If the state was available, but the saved timestamp is less than or equal to the current timestamp, increment the clock sequence value (seq_or_node).

10. Generate 54 random bits and fill in the node.

11. Format by concatenating the 128 bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node

12. Save the state (current timestamp and clock sequence) back to the stable store.

*64 bit timestamp, 8 bit sequence counter, 54 bit node:*

1. The same steps as the 60 bit timestamp can be utilized if the 64 bit timestamp is truncated to 60 bits.

2. Implementations MAY choose to truncate the most or least significant bits, but it is recommended to utilize the most significant 60 bits and lose 4 bits of precision in the nanoseconds or microseconds position.

*General algorithm for generation of UUIDv8 not defined here:*

1.
From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.

2. Obtain the current time from the selected clock source using the desired number of bits.

3. Place the timestamp bits, as many as required, in the most significant positions of the 128 bit UUID.

4. Care MUST be taken to ensure that the UUID Version and UUID Variant are in the correct bit positions.

   UUID Version: Bits 48 through 51

   UUID Variant: Bits 64 and 65

5. If the state was unavailable (e.g., non-existent or corrupted) or the saved timestamp is greater than the current timestamp, set the desired clock sequence value to 0.

6. If the state was available, but the saved timestamp is less than or equal to the current timestamp, increment the clock sequence value.

7. Set the remaining bits to the node as pseudo-random data.

8. Format by concatenating the 128 bits together.

9. Save the state (current timestamp and clock sequence) back to the stable store.

5. Encoding and Storage

The existing UUID hex and dash format of 8-4-4-4-12 is retained for both backwards compatibility and human readability.

For many applications such as databases this format is unnecessarily verbose, totaling 288 bits.

* 8 bits for each of the 32 hex characters = 256 bits

* 8 bits for each of the 4 hyphens = 32 bits

Where possible UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.

6. Global Uniqueness

UUIDs created by this specification offer the same guarantees for global uniqueness as those found in [RFC4122]. Furthermore, the time-based UUIDs defined in this specification are geared towards database applications but MAY be used for a wide variety of use cases.
Just as global uniqueness is guaranteed, UUIDs are guaranteed to be unique within an application context within the enterprise domain.

7. Distributed UUID Generation

Some implementations might desire to utilize multi-node, clustered applications which involve 2 or more applications independently generating UUIDs that will be stored in a common location. UUIDs already feature sufficient entropy to ensure that the chances of collision are low. However, implementations MAY dedicate a portion of the node's most significant random bits to a pseudo-random machineID which helps identify UUIDs created by a given node. This works to add an extra layer of collision avoidance.

This machineID MUST be placed in the UUID after the timestamp and sequence counter bits. This position is selected to ensure that sorting by timestamp and clock sequence is still possible. The machineID MUST NOT be an IEEE 802 MAC address. The creation and negotiation of the machineID among distributed nodes is out of scope for this specification.

8. IANA Considerations

This document has no IANA actions.

9. Security Considerations

MAC addresses pose inherent security risks and MUST NOT be used for node generation. As such they have been strictly forbidden from time-based UUIDs within this specification. Instead, pseudo-random bits SHOULD be selected from a source with sufficient entropy to ensure guaranteed uniqueness among UUID generation.

Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with the clock sequence does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then [RFC4122] UUIDv4 SHOULD be utilized.
The machineID portion of node, described in Section 7, does provide a small unique identifier which could be used to determine which application is generating data, but this machineID alone is not enough to identify a node on the network without other corresponding data points. Furthermore the machineID, like the timestamp+sequence, does not provide any context about the data that corresponds to the UUID or the current state of the application as a whole.

10. Acknowledgements

The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, and Theodore Y. Ts'o, as well as all of those in and outside the IETF community who contributed to the discussions which resulted in this document.

11. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC4122] Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, July 2005.

12. Informative References

[LexicalUUID] Twitter, "A Scala client for Cassandra", Commit f6da4e0, November 2012.

[Snowflake] Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, May 2014.

[Flake] Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, February 2017.

[ShardingID] Instagram Engineering, "Sharding & IDs at Instagram", December 2012.

[KSUID] Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, July 2020.
[Elasticflake] Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, January 2015.

[FlakeID] Pawlak, T., "Flake ID Generator", Commit fcd6a2f, April 2020.

[Sonyflake] Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, August 2020.

[orderedUuid] Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", January 2020.

[COMBGUID] Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, December 2020.

[ULID] Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, May 2019.

[SID] Chilton, A., "sid : generate sortable identifiers", Commit 660e947, June 2019.

[pushID] Google, "The 2^120 Ways to Ensure Unique Identifiers", February 2015.

[XID] Poitrey, O., "Globally Unique ID Generator", Commit efa678f, October 2020.

[ObjectID] MongoDB, "ObjectId - MongoDB Manual".

[CUID] Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, October 2020.

Authors' Addresses

Brad G. Peabody
Email: brad@peabody.io

Kyzer R. Davis
Email: kydavis@cisco.com

================================================
FILE: old drafts/draft-peabody-dispatch-new-uuid-format-02.xml
================================================
New UUID Formats
brad@peabody.io
kydavis@cisco.com
ART dispatch uuid This document presents new time-based UUID formats which are suited for use as a database key. A common case for modern applications is to create a unique identifier for use as a primary key in a database table. This identifier usually implements an embedded timestamp that is sortable using the monotonic creation time in the most significant bits. In addition the identifier is highly collision resistant, difficult to guess, and provides minimal security attack surfaces. None of the existing UUID versions, including UUIDv1, fulfill each of these requirements in the most efficient possible way. This document is a proposal to update [RFC4122] with three new UUID versions that address these concerns, each with different trade-offs.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys. The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become a burden. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments. However, some properties of UUIDs are not well suited to this task. First, most of the existing UUID versions such as UUIDv4 have poor database index locality. That is, new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the structures commonly used for indexes (B-tree and its variants) can be dramatic. As such newly inserted values SHOULD be time-ordered to address this. While it is true that UUIDv1 does contain an embedded timestamp and can be time-ordered, UUIDv1 has other issues. It is possible to sort Version 1 UUIDs by time but it is a laborious task. The process requires breaking the bytes of the UUID into various pieces, re-ordering the bits, and then determining the order from the reconstructed timestamp. This is not efficient in very large systems. Implementations would be simplified with a sort order where the UUID can simply be treated as an opaque sequence of bytes and ordered as such. After the embedded timestamp, the remaining 64 bits are in essence used to provide uniqueness both on a global scale and within a given timestamp tick.
The clock sequence value ensures that multiple UUIDs generated for the same timestamp value are given a monotonic sequence value. This explicit sequencing helps further facilitate sorting. The remaining random bits ensure collisions are minimal. Furthermore, UUIDv1 utilizes a non-standard timestamp epoch derived from the Gregorian Calendar. More specifically, the timestamp is Coordinated Universal Time (UTC) expressed as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582. Implementations and many languages may find it easier to implement the widely adopted and well known Unix Epoch, a custom epoch, or another timestamp source with various levels of timestamp precision required by the application. Lastly, privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Instead "cryptographically secure" pseudo-random number generators (CSPRNGs) or pseudo-random number generators (PRNGs) SHOULD be used within an application context to provide uniqueness and unguessability. Due to the shortcomings of UUIDv1 and UUIDv4 detailed so far, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways. While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.
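  1. [LexicalUUID] by Twitter
  2. [Snowflake] by Twitter
  3. [Flake] by Boundary
  4. [ShardingID] by Instagram
  5. [KSUID] by Segment
  6. [Elasticflake] by P. Pearcy
  7. [FlakeID] by T. Pawlak
  8. [Sonyflake] by Sony
  9. [orderedUuid] by IT. Cabrera
  10. [COMBGUID] by R. Tallent
  11. [ULID] by A. Feerasta
  12. [SID] by A. Chilton
  13. [pushID] by Google
  14. [XID] by O. Poitrey
  15. [ObjectID] by MongoDB
  16. [CUID] by E. Elliott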
An inspection of these implementations details the following trends that help define this standard:
  • - Timestamps MUST be k-sortable. That is, values within or close to the same timestamp are ordered properly by sorting algorithms.
  • - Timestamps SHOULD be big-endian with the most-significant bits of the time embedded as-is without reordering.
  • - Timestamps SHOULD utilize millisecond precision and Unix Epoch as timestamp source. Although, there is some variation to this among implementations depending on the application requirements.
  • - The ID format SHOULD be Lexicographically sortable while in the textual representation.
  • - IDs MUST ensure proper embedded sequencing to facilitate sorting when multiple UUIDs are created during a given timestamp.
  • - IDs MUST NOT require unique network identifiers as part of achieving uniqueness.
  • - Distributed nodes MUST be able to create collision resistant Unique IDs without consulting a centralized resource.
In order to solve these challenges this specification introduces three new version identifiers assigned for time-based UUIDs. The first, UUIDv6, aims to be the easiest to implement for applications which already implement UUIDv1. The UUIDv6 specification keeps the original Gregorian timestamp source but does not reorder the timestamp bits as per the process utilized by UUIDv1. UUIDv6 also requires that pseudo-random data MUST be used in place of the MAC address. The rest of the UUIDv1 format remains unchanged in UUIDv6. See the UUIDv6 layout section. Next, UUIDv7 introduces an entirely new time-based UUID bit layout utilizing a variable length timestamp sourced from the widely implemented and well known Unix Epoch timestamp source. The timestamp is broken into a 36 bit integer seconds part, followed by a field of variable length which represents the sub-second timestamp portion, encoded so that each bit from most to least significant adds more precision. See the UUIDv7 layout section. Finally, UUIDv8 introduces a relaxed time-based UUID format that caters to application implementations that cannot utilize UUIDv1, UUIDv6, or UUIDv7. UUIDv8 also future-proofs this specification by allowing time-based UUID formats from timestamp sources that are not yet defined. The variable size timestamp offers lots of flexibility to create an implementation-specific, RFC-compliant time-based UUID while retaining the properties that make UUIDs great. See the UUIDv8 layout section.
RFC EDITOR PLEASE DELETE THIS SECTION. draft-02
  • - Added Changelog
  • - Fixed misc. grammatical errors
  • - Fixed section numbering issue
  • - Fixed some UUIDvX reference issues
  • - Changed all instances of "motonic" to "monotonic"
  • - Changed all instances of "#-bit" to "# bit"
  • - Changed "proceeding" verbiage to "after" in section 7
  • - Added details on how to pad 32 bit unix timestamp to 36 bits in UUIDv7
  • - Added details on how to truncate 64 bit unix timestamp to 36 bits in UUIDv7
  • - Added forward reference and bullet to UUIDv8 if truncating 64 bit Unix Epoch is not an option.
  • - Fixed bad reference to non-existent "time_or_node" in section 4.5.4
draft-01
  • - Complete rewrite of entire document.
  • - The format, flow and verbiage used in the specification has been reworked to mirror the original RFC 4122 and current IETF standards.
  • - Removed the topics of UUID length modification, alternate UUID text formats, and alternate UUID encoding techniques.
  • - Research into 16 different historical and current implementations of time-based universal identifiers was completed at the end of 2020 in an attempt to identify trends which have directly influenced design decisions in this draft document (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)
  • - Prototype implementations have been completed for UUIDv6, UUIDv7, and UUIDv8 in various languages by many GitHub community members. (https://github.com/uuid6/prototypes)
The UUID length of 16 octets (128 bits) remains unchanged. The textual representation of a UUID consisting of 36 hexadecimal and dash characters in the format 8-4-4-4-12 remains unchanged for human readability. In addition the position of both the Version and Variant bits remain unchanged in the layout.
Table 1 defines the 4 bit version found in Bits 48 through 51 within a given UUID. UUID versions defined by this specification
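+------+------+------+------+---------+--------------------------------------------+
| Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                                |
+------+------+------+------+---------+--------------------------------------------+
| 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-based UUID        |
| 0    | 1    | 1    | 1    | 7       | Variable length Unix Epoch time-based UUID |
| 1    | 0    | 0    | 0    | 8       | Custom time-based UUID                     |
+------+------+------+------+---------+--------------------------------------------+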
The variant bits utilized by UUIDs in this specification remain the same as in [RFC4122]. Table 2 lists the contents of the variant field, bits 64 and 65, where the letter "x" indicates a "don't-care" value. As a result, the hex digit at this position in the text representation is commonly 8 (1000), 9 (1001), A (1010), or B (1011). UUID Variant defined by this specification
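+------+------+------+-----------------------------------------+
| Msb0 | Msb1 | Msb2 | Description                             |
+------+------+------+-----------------------------------------+
| 1    | 0    | x    | The variant specified in this document. |
+------+------+------+-----------------------------------------+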
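As a hedged illustration of the two tables, the version and variant can be read back from any 128 bit UUID with two shifts. The helper name version_and_variant is hypothetical; only the standard library uuid.UUID type and its int attribute are used, and bit 0 is taken as the most significant bit.

```python
import uuid


def version_and_variant(value: uuid.UUID):
    """Return (version, variant bits) read from fixed bit positions."""
    version = (value.int >> 76) & 0xF   # bits 48 through 51
    variant = (value.int >> 62) & 0b11  # bits 64 and 65
    return version, variant


# A placeholder UUIDv6-shaped value: version nibble 6, variant nibble 8 (1000).
print(version_and_variant(uuid.UUID("00000000-0000-6000-8000-000000000000")))
```

Because only the two most significant bits of the variant nibble are fixed, a text representation showing 8, 9, a, or b at that position yields the same variant bits.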
UUIDv6 aims to be the easiest to implement by reusing most of the layout of bits found in UUIDv1 but with changes to bit ordering for the timestamp. UUIDv1 splits the timestamp bits into three distinct parts and orders them as time_low, time_mid, time_high_and_version. UUIDv6 instead keeps the source bits from the timestamp intact and changes the order to time_high, time_mid, and time_low. As a result, the first 60 non-version bits match the original 60 bit Gregorian timestamp source with 100-nanosecond precision defined in [RFC4122]. The clock sequence bits remain unchanged from their usage and position in [RFC4122]. The 48 bit node SHOULD be set to a pseudo-random value; however, implementations MAY choose to retain the old MAC address behavior from [RFC4122]. The format for the 16-octet, 128 bit UUIDv6 is shown in Figure 1.
UUIDv6 Field and Bit Layout

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_high                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |      time_low_and_version     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |          node (0-1)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          node (2-5)                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
time_high:
The most significant 32 bits of the 60 bit starting timestamp. Occupies bits 0 through 31 (octets 0-3)
time_mid:
The middle 16 bits of the 60 bit starting timestamp. Occupies bits 32 through 47 (octets 4-5)
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60 bit starting timestamp. Occupies bits 48 through 63 (octets 6-7)
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8)
clk_seq_low:
The 8 bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9)
node:
48 bit spatially unique identifier. Occupies bits 80 through 127 (octets 10-15)
The following implementation algorithm is based on [RFC4122] but with changes specific to UUIDv6:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time as a 60 bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.
  3. Set the time_low field to the 12 least significant bits of the starting 60 bit timestamp.
  4. Truncate the timestamp to the 48 most significant bits in order to create time_high_and_time_mid.
  5. Set the time_high field to the 32 most significant bits of the truncated timestamp.
  6. Set the time_mid field to the 16 least significant bits of the truncated timestamp.
  7. Create the 16 bit time_low_and_version by concatenating the 4 bit UUIDv6 version with the 12 bit time_low.
  8. If the state was unavailable (e.g., non-existent or corrupted) or the saved timestamp is greater than the current timestamp, generate a random 14 bit clock sequence value.
  9. If the state was available, but the saved timestamp is less than or equal to the current timestamp, increment the clock sequence value.
  10. Complete the 16 bit clock sequence high, low and reserved creation by concatenating the clock sequence onto UUID variant bits which take the most significant position in the 16 bit value.
  11. Generate a 48 bit pseudo-random node.
  12. Format by concatenating the 128 bits from each part: time_high|time_mid|time_low_and_version|variant_clk_seq|node
  13. Save the state (current timestamp and clock sequence) back to the stable store
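The field assembly in the steps above can be sketched as follows. This is a minimal sketch, not a complete generator: the function name uuid6 is hypothetical, the stable-store handling (steps 1, 8, 9 and 13) is left to the caller, who may pass in a clock sequence, and a random clock sequence and node are drawn when none are supplied. GREGORIAN_OFFSET is the number of 100-nanosecond intervals between 15 October 1582 and the Unix Epoch.

```python
import secrets
import time
import uuid

# 100-nanosecond intervals between 1582-10-15 and 1970-01-01.
GREGORIAN_OFFSET = 0x01B21DD213814000


def uuid6(timestamp=None, clock_seq=None, node=None) -> uuid.UUID:
    """Assemble a UUIDv6 from a 60 bit timestamp, 14 bit clock sequence and 48 bit node."""
    if timestamp is None:
        timestamp = time.time_ns() // 100 + GREGORIAN_OFFSET
    if clock_seq is None:
        clock_seq = secrets.randbits(14)
    if node is None:
        node = secrets.randbits(48)

    timestamp &= (1 << 60) - 1
    time_low = timestamp & 0xFFF                  # 12 least significant bits
    time_high_and_time_mid = timestamp >> 12      # 48 most significant bits
    time_low_and_version = (0b0110 << 12) | time_low
    variant_clk_seq = (0b10 << 14) | (clock_seq & 0x3FFF)

    value = (
        (time_high_and_time_mid << 80)
        | (time_low_and_version << 64)
        | (variant_clk_seq << 48)
        | (node & 0xFFFFFFFFFFFF)
    )
    return uuid.UUID(int=value)


print(uuid6(0x1EC9414C232AB00, 0x33C8, 0x9F6BDECED846))
```

Keeping time_high and time_mid together as one 48 bit value reflects the note that splitting them is optional, since they stay adjacent and in order in the final concatenation.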
The steps for splitting time_high_and_time_mid into time_high and time_mid are optional since the 48 bits of time_high and time_mid will remain in the same order as time_high_and_time_mid during the final concatenation. This extra step of splitting into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation. In that case, the following mapping can be applied to reshuffle the bits with minimal modifications. UUIDv1 to UUIDv6 Field Mappings
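+--------------+------+--------------+
| UUIDv1 Field | Bits | UUIDv6 Field |
+--------------+------+--------------+
| time_low     | 32   | time_high    |
| time_mid     | 16   | time_mid     |
| time_high    | 12   | time_low     |
+--------------+------+--------------+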
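One possible reading of this mapping is that each row names the field slot an existing UUIDv1 routine writes to and the UUIDv6 field that takes its place, rather than a direct copy of bits. Under that assumption, an existing UUIDv1 value can be converted by rebuilding its 60 bit timestamp and slicing it again in UUIDv6 order. The function name uuid1_to_uuid6 is hypothetical; uuid.UUID.time is the standard library property that returns the 60 bit timestamp of a version 1 UUID.

```python
import uuid


def uuid1_to_uuid6(value: uuid.UUID) -> uuid.UUID:
    """Re-slice the 60 bit timestamp of a UUIDv1 into UUIDv6 field order."""
    if value.version != 1:
        raise ValueError("expected a version 1 UUID")

    timestamp = value.time                        # 60 bit Gregorian timestamp
    time_high_and_time_mid = timestamp >> 12      # becomes time_high | time_mid
    time_low = timestamp & 0xFFF                  # becomes the 12 bit time_low
    lower_64 = value.int & 0xFFFFFFFFFFFFFFFF     # variant, clock sequence, node

    return uuid.UUID(int=(
        (time_high_and_time_mid << 80)
        | (0b0110 << 76)
        | (time_low << 64)
        | lower_64
    ))


print(uuid1_to_uuid6(uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")))
```

The lower 64 bits are carried over untouched in this sketch, so a MAC-derived node in the source UUIDv1 would survive the conversion; an implementation following the pseudo-random node recommendation would replace it.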
The UUIDv7 format is designed to encode a Unix timestamp with arbitrary sub-second precision. The key property provided by UUIDv7 is that timestamp values generated by one system and parsed by another are guaranteed to have sub-second precision of either the generator or the parser, whichever is less. Additionally, the system parsing the UUIDv7 value does not need to know which precision was used during encoding in order to function correctly. The format for the 16-octet, 128 bit UUIDv7 is shown in Figure 2
UUIDv7 Field and Bit Layout

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |       subsec_a        |  ver  |       subsec_b        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                      subsec_seq_node                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        subsec_seq_node                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
unixts:
36 bit big-endian unsigned Unix Timestamp value
subsec_a:
12 bits allocated to sub-second precision values.
ver:
The 4 bit UUIDv7 version (0111)
subsec_b:
12 bits allocated to sub-second precision values.
var:
2 bit UUID variant (10)
subsec_seq_node:
The remaining 62 bits which MAY be allocated to any combination of additional sub-second precision, sequence counter, or pseudo-random data.
UUIDv7 utilizes a 36 bit big-endian unsigned Unix Timestamp value (number of seconds since the epoch of 1 Jan 1970, leap seconds excluded so each hour is exactly 3600 seconds long). The 36 bit value was selected in order to extend the range of the Unix timestamp and avoid the Year 2038 problem by extending the maximum timestamp to the year 4147. To achieve a 36 bit UUIDv7 timestamp, the lower 36 bits of a 64 bit Unix time are extracted verbatim into UUIDv7. In the event that a 32 bit Unix timestamp is in use, four zeros MUST be prepended in the most significant (left-most) bits of the 32 bit Unix timestamp, creating the 36 bit Unix timestamp. This ensures sorting compatibility with 64 bit Unix timestamps which have been truncated to 36 bits. Additional sub-second precision (millisecond, microsecond, nanosecond, etc.) MAY be provided for encoding and decoding in the remaining bits in the layout. UUIDv8 SHOULD be used in place of UUIDv7 if an application or implementation does not want to truncate a 64 bit Unix Epoch to the lower 36 bits.
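The 36 bit extraction and the resulting upper bound can be checked with a short sketch. The names unixts_36 and unixts_limit are hypothetical. A mask covers both cases described above: the lower 36 bits of a 64 bit value are kept verbatim, and a 32 bit value gains four leading zero bits.

```python
from datetime import datetime, timedelta, timezone

UNIXTS_MASK = (1 << 36) - 1


def unixts_36(unix_seconds: int) -> int:
    """Return the 36 bit unixts field for a non-negative Unix timestamp."""
    if unix_seconds < 0:
        raise ValueError("unixts is unsigned")
    return unix_seconds & UNIXTS_MASK


def unixts_limit() -> datetime:
    """Largest moment representable in 36 bits of seconds since 1 Jan 1970."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=UNIXTS_MASK)


print(hex(unixts_36(0xFFFFFFFF)), unixts_limit().year)
```

The limit is computed with timedelta arithmetic rather than a platform time conversion so that the result does not depend on the range supported by the host operating system.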
UUIDv7 SHOULD utilize a monotonic sequence counter to provide additional sequencing guarantees when multiple UUIDv7 values are created in the same UNIXTS and SUBSEC timestamp. The number of bits allocated to the sequence counter depends on the precision of the timestamp. For example, a more precise timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source utilizing seconds for precision. For best sequencing results the sequence counter SHOULD be placed immediately after the available sub-second bits. The clock sequence MUST start at zero and increment monotonically for each new UUIDv7 created by the application on the same timestamp. When the timestamp increments, the clock sequence MUST be reset to zero. The clock sequence MUST NOT roll over or reset to zero unless the timestamp has incremented. Care MUST be given to ensure that an adequately sized clock sequence is selected for a given application based on expected timestamp precision and expected UUIDv7 generation rates.
UUIDv7 implementations, even with very detailed sub-second precision and the optional sequence counter, MAY have leftover bits that will be identified as the Node for this section. The UUIDv7 Node MAY contain any set of data an implementation desires; however, the node MUST NOT be set to all 0s, as that does not ensure global uniqueness. In most scenarios the node SHOULD be filled with pseudo-random data.
The UUIDv7 bit layouts for encoding and decoding are described separately in this document.
Since the UUIDv7 Unix timestamp is fixed at 36 bits in length, the exact layout for encoding UUIDv7 depends on the precision (number of bits) used for the sub-second portion and the sizes of the optional sequence counter and node bits. Three examples of UUIDv7 encoding are given below as general guidelines, but implementations are not limited to just these three examples. All of these fields are only used during encoding; during decoding the system is unaware of the bit layout used for them and considers this information opaque. As such, implementations generating these values can assign whatever length to each field they deem applicable, as long as it does not break decoding compatibility (i.e. the Unix timestamp (unixts), version (ver) and variant (var) have to stay where they are, and the clock sequence counter (seq), random (rand) or other implementation-specific values must follow the sub-second encoding). In Figure 3 the UUIDv7 has been created with millisecond precision using the available sub-second precision bits. Examining Figure 3 one can observe:
  • The first 36 bits have been dedicated to the Unix Timestamp (unixts).
  • All 12 bits of subsec_a are fully dedicated to millisecond information (msec).
  • The 4 Version bits remain unchanged (ver).
  • All 12 bits of subsec_b have been dedicated to a monotonic clock sequence counter (seq).
  • The 2 Variant bits remain unchanged (var).
  • Finally, the remaining 62 bits in the subsec_seq_node section of the layout are filled with random data to pad the length and provide guaranteed uniqueness (rand).
Figure 3: UUIDv7 Field and Bit Layout - Encoding Example (Millisecond Precision)
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         msec          |  ver  |          seq          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                           rand                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
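The millisecond layout can be sketched in Python as follows. This is a non-normative illustration: the function name encode_v7_msec is hypothetical, and the sketch assumes that msec carries the first 12 bits of the binary fraction of a second, in line with the decoding rules of this document.

```python
import secrets


def encode_v7_msec(unix_seconds: int, nanos: int, seq: int) -> int:
    """Pack unixts(36) | msec(12) | ver(4) | seq(12) | var(2) | rand(62)."""
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be within one second")
    if not 0 <= seq < (1 << 12):
        raise ValueError("seq must fit in 12 bits")
    unixts = unix_seconds & ((1 << 36) - 1)
    # Assumption: msec is the top 12 bits of the fraction nanos / 10^9.
    msec = (nanos << 12) // 1_000_000_000
    value = unixts
    value = (value << 12) | msec
    value = (value << 4) | 0x7        # version 7
    value = (value << 12) | seq
    value = (value << 2) | 0b10       # variant 10
    value = (value << 62) | secrets.randbits(62)
    return value
```

The returned integer is 128 bits wide and can be wrapped with uuid.UUID(int=value) from the Python standard library for text formatting.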
In Figure 4 the UUIDv7 has been created with microsecond precision using the available sub-second precision bits. Examining Figure 4 one can observe:
  • The first 36 bits have been dedicated to the Unix Timestamp (unixts).
  • All 12 bits of subsec_a are fully dedicated to providing sub-second encoding for the microsecond precision (usec).
  • The 4 Version bits remain unchanged (ver).
  • All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the microsecond precision (usec).
  • The 2 Variant bits remain unchanged (var).
  • A 14 bit monotonic clock sequence counter (seq) has been embedded in the most significant position of subsec_seq_node.
  • Finally, the remaining 48 bits in the subsec_seq_node section of the layout are filled with random data to pad the length and provide guaranteed uniqueness (rand).
Figure 4: UUIDv7 Field and Bit Layout - Encoding Example (Microsecond Precision)
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         usec          |  ver  |         usec          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|            seq            |             rand              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In Figure 5 the UUIDv7 has been created with nanosecond precision using the available sub-second precision bits. Examining Figure 5 one can observe:
  • The first 36 bits have been dedicated to the Unix Timestamp (unixts).
  • All 12 bits of subsec_a are fully dedicated to providing sub-second encoding for the nanosecond precision (nsec).
  • The 4 Version bits remain unchanged (ver).
  • All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the nanosecond precision (nsec).
  • The 2 Variant bits remain unchanged (var).
  • The first 14 bits of subsec_seq_node are dedicated to providing sub-second encoding for the nanosecond precision (nsec).
  • The next 8 bits of subsec_seq_node are dedicated to a monotonic clock sequence counter (seq).
  • Finally, the remaining 40 bits in the subsec_seq_node section of the layout are filled with random data to pad the length and provide guaranteed uniqueness (rand).
Figure 5: UUIDv7 Field and Bit Layout - Encoding Example (Nanosecond Precision)
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            unixts                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |unixts |         nsec          |  ver  |         nsec          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|           nsec            |      seq      |     rand      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             rand                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
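In the nanosecond layout the 38 sub-second bits are split around the version and variant fields. The following non-normative Python sketch shows one way to do that split. The function name encode_v7_nsec is hypothetical, and treating the 38 bits as a binary fraction of a second is an assumption based on the decoding rules of this document.

```python
import secrets


def encode_v7_nsec(unix_seconds: int, nanos: int, seq: int) -> int:
    """Pack unixts(36) | nsec(12) | ver(4) | nsec(12) | var(2) | nsec(14) | seq(8) | rand(40)."""
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be within one second")
    if not 0 <= seq < (1 << 8):
        raise ValueError("seq must fit in 8 bits")
    # Assumption: 38 bit binary fraction of a second (12 + 12 + 14 bits).
    frac = (nanos << 38) // 1_000_000_000
    subsec_a = frac >> 26              # most significant 12 bits
    subsec_b = (frac >> 14) & 0xFFF    # next 12 bits
    subsec_c = frac & 0x3FFF           # least significant 14 bits
    value = unix_seconds & ((1 << 36) - 1)
    value = (value << 12) | subsec_a
    value = (value << 4) | 0x7         # version 7
    value = (value << 12) | subsec_b
    value = (value << 2) | 0b10        # variant 10
    value = (value << 14) | subsec_c
    value = (value << 8) | seq
    value = (value << 40) | secrets.randbits(40)
    return value
```

The microsecond layout of Figure 4 follows the same pattern with a 24 bit fraction, a 14 bit counter and 48 random bits.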
When decoding or parsing a UUIDv7 value there are only two values to be considered:
  1. The unix timestamp defined as unixts
  2. The sub-second precision values defined as subsec_a, subsec_b, and subsec_seq_node
As detailed in Figure 2 the Unix timestamp (unixts) is always the first 36 bits of the UUIDv7 layout. Similarly, as per Figure 2, the sub-second precision values lie within subsec_a, subsec_b, and subsec_seq_node, which are all interpreted as sub-second information after skipping over the version (ver) and variant (var) bits. These concatenated sub-second bits are interpreted so that each bit, from most to least significant, represents a further division by two. This is the same positional notation used to express fractional numbers, except in binary. For example, in decimal ".1" means one tenth and ".01" means one hundredth. In this subsec field, 1 means one half, 01 means one quarter, 001 means one eighth, etc. This scheme works for any number of bits up to the maximum available, and keeps the most significant data leftmost in the bit sequence. To perform the sub-second math, take the first (most significant/leftmost) N bits of subsec as an integer and divide it by 2^N. Take for example:
  1. To parse the first 16 bits, extract that value as an integer and divide it by 65536 (2 to the 16th).
  2. If these 16 bits are 0101 0101 0101 0101, then treating that as an integer gives 0x5555 or 21845 in decimal, and dividing by 65536 gives approximately 0.3333282.
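The worked example above can be reproduced with a few lines of Python. This is a non-normative sketch and the helper name subsec_fraction is hypothetical.

```python
from fractions import Fraction


def subsec_fraction(bits: int, n: int) -> Fraction:
    """Interpret the first n sub-second bits, given as an integer, as a fraction of a second."""
    if not 0 <= bits < (1 << n):
        raise ValueError("bits must fit in n bits")
    return Fraction(bits, 1 << n)


# 0101 0101 0101 0101 read as a 16 bit binary fraction.
value = subsec_fraction(0b0101010101010101, 16)
seconds = float(value)  # roughly one third of a second
```

Using Fraction keeps the result exact; converting to float is only needed for display.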
This sub-second encoding scheme provides maximum interoperability across systems where different levels of time precision are required/feasible/available. The timestamp value derived from a UUIDv7 value SHOULD be "as close to the correct value as possible" when parsed, even across disparate systems. Take for example the starting point for our next two UUIDv7 parsing scenarios:
  1. System A produces a UUIDv7 with a microsecond-precise timestamp value.
  2. System B is unaware of the precision encoded in the UUIDv7 timestamp by System A.
Scenario 1:
  1. System B parses the embedded timestamp with millisecond precision. (Less precision than the encoder)
  2. System B SHOULD return the correct millisecond value encoded by System A (truncated to milliseconds).
Scenario 2:
  1. System B parses the timestamp with nanosecond precision. (More precision than the encoder)
  2. System B's value returned SHOULD have the same microsecond level of precision provided by the encoder with the additional precision down to nanosecond level being essentially random as per the encoded random value at the end of the UUIDv7.
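Both scenarios can be sketched with one reader that does not know the encoder's precision. The Python below is non-normative: the function names are hypothetical, and the sample value assumes System A encoded 0.25 seconds as a 24 bit fraction followed by opaque bits. Whether an arbitrary decimal value round-trips exactly depends on how the encoder rounded, which this sketch does not address.

```python
from typing import Tuple


def v7_subsec_bits(value: int) -> int:
    """Concatenate subsec_a(12), subsec_b(12), subsec_seq_node(62), skipping ver and var."""
    subsec_a = (value >> 80) & 0xFFF
    subsec_b = (value >> 64) & 0xFFF
    subsec_seq_node = value & ((1 << 62) - 1)
    return (subsec_a << 74) | (subsec_b << 62) | subsec_seq_node  # 86 bits


def parse_v7(value: int, units_per_second: int) -> Tuple[int, int]:
    """Return (Unix seconds, sub-second count) at the reader's chosen precision."""
    unixts = value >> 92
    frac = v7_subsec_bits(value)
    return unixts, (frac * units_per_second) >> 86


# System A: 0.25 s as a 24 bit fraction (0x400000); the trailing 62 bits
# stand in for seq and rand and are opaque to the reader.
sample = (
    (1_600_000_000 << 92) | (0x400 << 80) | (0x7 << 76)
    | (0x000 << 64) | (0b10 << 62) | ((1 << 62) - 1)
)

seconds, millis = parse_v7(sample, 1_000)          # Scenario 1
_, nanos = parse_v7(sample, 1_000_000_000)         # Scenario 2
```

In Scenario 2 the digits below microsecond precision come from the opaque trailing bits, so only the microsecond part of nanos is meaningful.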
UUIDv8 offers variable-size timestamp, clock sequence, and node values which allow for a highly customizable UUID that fits a given application's needs. UUIDv8 SHOULD only be utilized if an implementation cannot utilize UUIDv1, UUIDv6, or UUIDv7. Some situations in which UUIDv8 usage could occur:
  • An implementation would like to utilize a timestamp source not defined by the current time-based UUIDs.
  • An implementation would like to utilize a timestamp bit layout not defined by the current time-based UUIDs.
  • An implementation would like to avoid truncating a 64 bit Unix timestamp to 36 bits as defined by UUIDv7.
  • An implementation would like a specific level of precision within the timestamp not offered by current time-based UUIDs.
  • An implementation would like to embed extra information within the UUID node other than what is defined in this document.
  • An implementation has other application/language restrictions which inhibit the usage of one of the current time-based UUIDs.
Roughly speaking a properly formatted UUIDv8 SHOULD contain the following sections adding up to a total of 128 bits.
  • Timestamp Bits (Variable Length)
  • Clock Sequence Bits (Variable Length)
  • Node Bits (Variable Length)
  • UUIDv8 Version Bits (4 bits)
  • UUID Variant Bits (2 bits)
The only explicitly defined bits are the Version and Variant leaving 122 bits for implementation specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data. UUIDv8's 128 bits (including the version and variant) SHOULD contain at the minimum a timestamp of some format in the most significant bit position followed directly by a clock sequence counter and finally a node containing either random data or implementation specific data. A sample format in Figure 6 is used to further illustrate the point for the 16-octet, 128 bit UUIDv8.
Figure 6: UUIDv8 Field and Bit Layout
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         timestamp_32                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         timestamp_48          |  ver  |      time_or_seq      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|  seq_or_node  |                   node                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             node                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
timestamp_32:
The most significant 32 bits of the desired timestamp source. Occupies bits 0 through 31 (octets 0-3).
timestamp_48:
The next 16 bits of the timestamp source when a timestamp source with at least 48 bits is used. When a 32 bit timestamp source is utilized, these bits are set to 0. Occupies bits 32 through 47 (octets 4-5).
ver:
The 4 bit UUIDv8 version (1000). Occupies bits 48 through 51.
time_or_seq:
If a 60 bit, or larger, timestamp is used these 12 bits are used to fill out the remaining timestamp. If a 32 or 48 bit timestamp is leveraged a 12 bit clock sequence MAY be used. Together ver and time_or_seq occupy bits 48 through 63 (octets 6-7).
var:
The 2 bit UUID variant (10). Occupies bits 64 and 65.
seq_or_node:
If a 60 bit, or larger, timestamp source is leveraged these 8 bits SHOULD be allocated for an 8 bit clock sequence counter. If a 32 or 48 bit timestamp source is used these 8 bits SHOULD be set to random.
node:
In most implementations these bits will likely be set to pseudo-random data. However, implementations can utilize the node as they see fit. Together var, seq_or_node, and node occupy bits 64 through 127 (octets 8-15).
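The field positions listed above can be checked with a small packing routine. This Python sketch is non-normative; the function name pack_v8 and its argument order are hypothetical, and the 54 bit node width corresponds to the layout in which seq_or_node is kept as a separate field.

```python
import uuid


def pack_v8(timestamp_32: int, timestamp_48: int, time_or_seq: int,
            seq_or_node: int, node: int) -> uuid.UUID:
    """Pack timestamp_32(32) | timestamp_48(16) | ver(4) | time_or_seq(12) | var(2) | seq_or_node(8) | node(54)."""
    fields = (
        (timestamp_32, 32), (timestamp_48, 16), (0x8, 4), (time_or_seq, 12),
        (0b10, 2), (seq_or_node, 8), (node, 54),
    )
    value = 0
    for field, width in fields:
        if not 0 <= field < (1 << width):
            raise ValueError("field does not fit in %d bits" % width)
        value = (value << width) | field  # append each field, most significant first
    return uuid.UUID(int=value)
```

Wrapping the result in uuid.UUID from the Python standard library gives the usual 8-4-4-4-12 text form and exposes the version and variant for inspection.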
UUIDv8's usage of timestamp relaxes both the timestamp source and timestamp length. Implementations are free to utilize any monotonically stable timestamp source for UUIDv8. Some examples include:
  • Custom Epoch
  • NTP timestamp
  • ISO 8601 timestamp
  • Full, non-truncated 64 bit Unix Epoch timestamp
The relaxed nature of UUIDv8 timestamps also works to future-proof this specification and gives implementations a method to create compliant time-based UUIDs using timestamp sources that might not yet be defined. Timestamps come in many sizes, and UUIDv8 defines three fields that can easily be used for the majority of timestamp lengths:
  • 32 bit timestamp: using timestamp_32 and setting timestamp_48 to 0s
  • 48 bit timestamp: using timestamp_32 and timestamp_48 entirely
  • 60 bit timestamp: using timestamp_32, timestamp_48, and time_or_seq
  • 64 bit timestamp: using timestamp_32, timestamp_48, and time_or_seq, and truncating the timestamp to the 60 most significant bits.
Although it is possible to create a timestamp larger than 64 bits in size, the usage and bit layout of that timestamp format is up to the implementation. When a timestamp exceeds the 64th bit (octet 7), extra care must be taken to ensure the Variant bits are properly inserted at their respective location in the UUID. Likewise, the Version MUST always be implemented at the appropriate location. Any timestamp that does not entirely fill timestamp_32, timestamp_48 or time_or_seq MUST set all leftover bits in the least significant positions of the respective field to 0. For example, a 36 bit timestamp source would fully utilize timestamp_32 and 4 bits of timestamp_48. The remaining 12 bits in timestamp_48 MUST be set to 0. Because timestamp sources are implementation-specific, it is not guaranteed that devices outside of the application context are able to extract and parse the timestamp from UUIDv8 without some pre-existing knowledge of the source timestamp used by the UUIDv8 implementation.
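The zero-padding rule can be sketched as a left alignment of the timestamp across the three timestamp fields. The Python below is non-normative and the function name split_timestamp is hypothetical. It returns only the timestamp portion of each field; when the timestamp is 48 bits or less, time_or_seq is returned as 0 and remains available for a clock sequence.

```python
from typing import Tuple


def split_timestamp(ts: int, ts_bits: int) -> Tuple[int, int, int]:
    """Left-align a ts_bits wide timestamp over timestamp_32(32), timestamp_48(16), time_or_seq(12)."""
    if not 1 <= ts_bits <= 60:
        raise ValueError("this sketch handles timestamps of up to 60 bits")
    if not 0 <= ts < (1 << ts_bits):
        raise ValueError("timestamp does not fit in ts_bits")
    # Shifting left pads the least significant positions with zero bits.
    aligned = ts << (60 - ts_bits)
    timestamp_32 = aligned >> 28
    timestamp_48 = (aligned >> 12) & 0xFFFF
    time_or_seq = aligned & 0xFFF
    return timestamp_32, timestamp_48, time_or_seq
```

For the 36 bit example in the text, an all-ones timestamp yields timestamp_48 = 0xF000: four timestamp bits followed by twelve zero bits.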
A clock sequence MUST be used with UUIDv8 to provide added sequencing guarantees when multiple UUIDv8 values are created on the same clock tick. The number of bits allocated to the clock sequence depends on the precision of the timestamp source. For example, a more precise timestamp source using nanosecond precision will require fewer clock sequence bits than a timestamp source utilizing seconds for precision. The UUIDv8 layout in Figure 6 generically defines two possible clock sequence values that can be leveraged:
  • 12 bit clock sequence using time_or_seq, for use when the timestamp is 48 bits or less, which allows for 4096 UUIDs (counter values 0 through 4095) per clock tick.
  • 8 bit clock sequence using seq_or_node, for use when the timestamp uses more than 48 bits, which allows for 256 UUIDs (counter values 0 through 255) per clock tick.
An implementation MAY use both time_or_seq and seq_or_node for clock sequencing; however, it is highly unlikely that 20 bits of clock sequence are needed for a given clock tick. Furthermore, more bits from the node MAY be used for clock sequencing in the event that 8 bits are not sufficient. The clock sequence MUST start at zero and increment monotonically for each new UUIDv8 created by the application on the same timestamp. When the timestamp increments the clock sequence MUST be reset to zero. The clock sequence MUST NOT roll over or reset to zero unless the timestamp has incremented. Care MUST be given to ensure that an adequately sized clock sequence is selected for a given application based on expected timestamp precision and expected UUIDv8 generation rates.
The UUIDv8 Node MAY contain any set of data an implementation desires; however, the node MUST NOT be set to all 0s, as that does not ensure global uniqueness. In most scenarios the node will be filled with pseudo-random data. The UUIDv8 layout in Figure 6 defines 2 sizes of Node depending on the timestamp size:
  • 62 bit node encompassing seq_or_node and node, used when a timestamp of 48 bits or less is leveraged.
  • 54 bit node, used when all 60 bits of the timestamp are in use and seq_or_node is used for clock sequencing.
An implementation MAY choose to allocate bits from the node to the timestamp, clock sequence or an application-specific embedded field. It is recommended that implementations utilize a node of at least 48 bits to ensure global uniqueness can be guaranteed.
The entire usage of UUIDv8 is meant to be variable and allow as much customization as possible to meet specific application/language requirements. As such, UUIDv8 implementations will likely vary among applications. The following algorithms are generic implementations using Figure 6 and the recommendations outlined in this specification.
32 bit timestamp, 12 bit sequence counter, 62 bit node:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time from the selected clock source as 32 bits.
  3. Set the 32 bit field timestamp_32 to the 32 bits from the timestamp
  4. Set 16 bit timestamp_48 to all 0s
  5. Set the version to 8 (1000)
  6. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the 12 bit clock sequence value (time_or_seq) to 0.
  7. If the state was available and the saved timestamp is equal to the current timestamp, increment the clock sequence value (time_or_seq).
  8. Set the variant to binary 10
  9. Generate 62 random bits and fill in 8 bits for seq_or_node and 54 bits for the node.
  10. Format by concatenating the 128 bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node
  11. Save the state (current timestamp and clock sequence) back to the stable store
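The steps above can be sketched in Python as follows. This is a non-normative illustration, not a reference implementation: the class name V8Generator32 is hypothetical, state is kept in memory rather than in a stable store, and the counter follows the clock sequence rules stated earlier (reset when the timestamp advances, increment on the same timestamp). Treating a backwards-moving clock like the same tick is an assumption of this sketch.

```python
import secrets
import time
import uuid


class V8Generator32:
    """32 bit timestamp, 12 bit sequence counter, 62 bit random node."""

    def __init__(self, clock=lambda: int(time.time())):
        self.clock = clock
        self.state = None  # (timestamp, sequence) of the last UUID, if any

    def generate(self) -> uuid.UUID:
        now = self.clock() & 0xFFFFFFFF          # current time as 32 bits
        if self.state is None or now > self.state[0]:
            ts, seq = now, 0                     # no state, or the clock ticked
        else:
            # Same tick. Assumption: a clock that steps backwards is
            # handled like the same tick so output stays ordered.
            ts, seq = self.state[0], self.state[1] + 1
            if seq > 0xFFF:
                raise OverflowError("12 bit clock sequence exhausted")
        value = ts                               # timestamp_32
        value = (value << 16) | 0                # timestamp_48, all 0s
        value = (value << 4) | 0x8               # version 8
        value = (value << 12) | seq              # time_or_seq
        value = (value << 2) | 0b10              # variant 10
        value = (value << 62) | secrets.randbits(62)  # seq_or_node + node
        self.state = (ts, seq)                   # save state for the next call
        return uuid.UUID(int=value)
```

Passing a custom clock callable makes the generator testable and allows a different timestamp source, as UUIDv8 permits.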
48 bit timestamp, 12 bit sequence counter, 62 bit node:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time from the selected clock source as 48 bits.
  3. Set the 32 bit field timestamp_32 to the 32 most significant bits from the timestamp
  4. Set 16 bit timestamp_48 to the 16 least significant bits from the timestamp
  5. The rest of the steps are the same as the previous example.
60 bit timestamp, 8 bit sequence counter, 54 bit node:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time from the selected clock source as 60 bits.
  3. Set the 32 bit field timestamp_32 to the 32 most significant bits from the timestamp
  4. Set 16 bit timestamp_48 to the 16 middle bits from the timestamp
  5. Set the version to 8 (1000)
  6. Set 12 bit time_or_seq to the 12 least significant bits from the timestamp
  7. Set the variant to 10
  8. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the 8 bit clock sequence value (seq_or_node) to 0.
  9. If the state was available and the saved timestamp is equal to the current timestamp, increment the clock sequence value (seq_or_node).
  10. Generate 54 random bits and fill in the node
  11. Format by concatenating the 128 bits as: timestamp_32|timestamp_48|version|time_or_seq|variant|seq_or_node|node
  12. Save the state (current timestamp and clock sequence) back to the stable store
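A non-normative Python sketch of the 60 bit variant follows. The function name generate_v8_60 is hypothetical; the caller supplies the timestamp and the previous state, and receives the UUID together with the state to save. Rejecting a backwards-moving clock is an assumption of this sketch.

```python
import secrets
import uuid
from typing import Optional, Tuple

State = Tuple[int, int]  # (timestamp, sequence) of the last UUID


def generate_v8_60(ts60: int, state: Optional[State]) -> Tuple[uuid.UUID, State]:
    """60 bit timestamp, 8 bit sequence counter, 54 bit random node."""
    if not 0 <= ts60 < (1 << 60):
        raise ValueError("timestamp must fit in 60 bits")
    if state is None or ts60 > state[0]:
        seq = 0                                   # no state, or the clock ticked
    elif ts60 == state[0]:
        seq = state[1] + 1                        # same tick: increment
        if seq > 0xFF:
            raise OverflowError("8 bit clock sequence exhausted")
    else:
        raise ValueError("clock moved backwards")  # assumption of this sketch
    value = ts60 >> 28                            # timestamp_32: 32 most significant bits
    value = (value << 16) | ((ts60 >> 12) & 0xFFFF)  # timestamp_48: 16 middle bits
    value = (value << 4) | 0x8                    # version 8
    value = (value << 12) | (ts60 & 0xFFF)        # time_or_seq: 12 least significant bits
    value = (value << 2) | 0b10                   # variant 10
    value = (value << 8) | seq                    # seq_or_node
    value = (value << 54) | secrets.randbits(54)  # node
    return uuid.UUID(int=value), (ts60, seq)
```

A 64 bit timestamp can be fed to the same routine after dropping its 4 least significant bits, for example with ts64 >> 4.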
64 bit timestamp, 8 bit sequence counter, 54 bit node:
  1. The same steps as the 60 bit timestamp can be utilized if the 64 bit timestamp is truncated to 60 bits.
  2. Implementations MAY choose to truncate the most or least significant bits, but it is recommended to utilize the most significant 60 bits and lose 4 bits of precision in the nanoseconds or microseconds position.
General algorithm for generation of UUIDv8 not defined here:
  1. From a system-wide shared stable store (e.g., a file) or global variable, read the UUID generator state: the values of the timestamp and clock sequence used to generate the last UUID.
  2. Obtain the current time from the selected clock source using the desired number of bits.
  3. Set the desired number of timestamp bits in the most significant positions of the 128 bit UUID.
  4. Care MUST be taken to ensure that the UUID Version and UUID Variant are in the correct bit positions. UUID Version: bits 48 through 51. UUID Variant: bits 64 and 65.
  5. If the state was unavailable (e.g., non-existent or corrupted) or the current timestamp is greater than the saved timestamp, set the desired clock sequence value to 0.
  6. If the state was available and the saved timestamp is equal to the current timestamp, increment the clock sequence value.
  7. Set the remaining bits to the node as pseudo-random data
  8. Format by concatenating the 128 bits together
  9. Save the state (current timestamp and clock sequence) back to the stable store
The existing UUID hex and dash format of 8-4-4-4-12 is retained for both backwards compatibility and human readability. For many applications such as databases this format is unnecessarily verbose totaling 288 bits.
  • 8 bits for each of the 32 hex characters = 256 bits
  • 8 bits for each of the 4 hyphens = 32 bits
Where possible UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.
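The size difference can be checked with the Python standard library. The sketch below is illustrative only; the UUID value is an arbitrary example.

```python
import uuid

# Arbitrary example value in the 8-4-4-4-12 text format.
u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

text_bits = len(str(u)) * 8      # 32 hex characters + 4 hyphens, 8 bits each
binary_bits = len(u.bytes) * 8   # 16 octets

# The binary form converts back to the same UUID without loss.
restored = uuid.UUID(bytes=u.bytes)
```

Storing u.bytes in a 16 octet binary column therefore keeps all of the information in less than half the space of the text form.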
UUIDs created by this specification offer the same guarantees for global uniqueness as those found in . Furthermore, the time-based UUIDs defined in this specification are geared towards database applications but MAY be used for a wide variety of use-cases. Just as global uniqueness is guaranteed, UUIDs are guaranteed to be unique within an application context within the enterprise domain.
Some implementations might desire to utilize multi-node, clustered applications which involve 2 or more applications independently generating UUIDs that will be stored in a common location. UUIDs already feature sufficient entropy to ensure that the chances of collision are low. However, implementations MAY dedicate a portion of the node's most significant random bits to a pseudo-random machineID which helps identify UUIDs created by a given node. This works to add an extra layer of collision avoidance. This machineID MUST be placed in the UUID after the timestamp and sequence counter bits. This position is selected to ensure that sorting by timestamp and clock sequence is still possible. The machineID MUST NOT be an IEEE 802 MAC address. The creation and negotiation of the machineID among distributed nodes is out of scope for this specification.
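A node value with an embedded machineID might be built as in the following non-normative Python sketch. The function name build_node and the chosen bit widths are hypothetical; how the machineID itself is assigned is out of scope, as noted above.

```python
import secrets


def build_node(machine_id: int, machine_bits: int, node_bits: int) -> int:
    """Place machine_id in the most significant node bits, followed by random bits."""
    if not 0 < machine_bits < node_bits:
        raise ValueError("machine_bits must leave room for random bits")
    if not 0 <= machine_id < (1 << machine_bits):
        raise ValueError("machine_id does not fit in machine_bits")
    random_bits = node_bits - machine_bits
    return (machine_id << random_bits) | secrets.randbits(random_bits)
```

For example, build_node(0x2A, 8, 62) yields a 62 bit node whose top 8 bits identify the generating application and whose remaining 54 bits are random. Because the node follows the timestamp and sequence counter, sort order by time is unaffected.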
This document has no IANA actions.
MAC addresses pose inherent security risks and MUST NOT be used for node generation. As such they have been strictly forbidden from time-based UUIDs within this specification. Instead, pseudo-random bits SHOULD be selected from a source with sufficient entropy to ensure guaranteed uniqueness among UUID generation. Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with the clock sequence does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then UUIDv4 SHOULD be utilized. The machineID portion of the node, described in , does provide a small unique identifier which could be used to determine which application is generating data, but this machineID alone is not enough to identify a node on the network without other corresponding data points. Furthermore the machineID, like the timestamp+sequence, does not provide any context about the data that corresponds to the UUID or the current state of the application as a whole.
The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, and Theodore Y. Ts'o, as well as all of those in and outside the IETF community who contributed to the discussions which resulted in this document.
Key words for use in RFCs to Indicate Requirement Levels. In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
A Universally Unique IDentifier (UUID) URN Namespace. This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier). A UUID is 128 bits long, and can guarantee uniqueness across space and time. UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE), and then in Microsoft Windows platforms. This specification is derived from the DCE specification with the kind permission of the OSF (now known as The Open Group). Information from earlier versions of the DCE specification have been incorporated into this document. [STANDARDS-TRACK]
A Scala client for Cassandra. Twitter.
Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees. Twitter.
Flake: A decentralized, k-ordered id generation service in Erlang. Boundary.
Sharding & IDs at Instagram. Instagram Engineering.
K-Sortable Globally Unique IDs. Segment.
Sequential UUID / Flake ID generator pulled out of elasticsearch common.
Flake ID Generator.
A distributed unique ID generator inspired by Twitter's Snowflake. Sony.
Laravel: The mysterious "Ordered UUID".
Creating sequential GUIDs in C# for MSSQL or PostgreSql.
Universally Unique Lexicographically Sortable Identifier.
sid : generate sortable identifiers.
The 2^120 Ways to Ensure Unique Identifiers. Google.
Globally Unique ID Generator.
ObjectId - MongoDB Manual. MongoDB.
Collision-resistant ids optimized for horizontal scaling and performance.
================================================
FILE: old drafts/draft-peabody-dispatch-new-uuid-format-03.html
================================================
New UUID Formats
Internet-Draft new-uuid-format March 2022
Peabody & Davis Expires 2 October 2022 [Page]
Workgroup: dispatch
Internet-Draft: draft-peabody-dispatch-new-uuid-format-03
Updates: 4122 (if approved)
Published: March 2022
Intended Status: Standards Track
Expires: 2 October 2022
Authors: BGP. Peabody, K. Davis

New UUID Formats

Abstract

This document presents new Universally Unique Identifier (UUID) formats for use in modern applications and databases.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 2 October 2022.

1. Introduction

Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions.

One area in which UUIDs have gained popularity is database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto increment" schemes often used by databases do not work well, as the effort required to coordinate unique numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring synchronization makes them a good alternative, but UUID versions 1-5 lack certain other desirable characteristics:

  1. Non-time-ordered UUID versions such as UUIDv4 have poor database index locality, meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on common index structures (B-tree and its variants) can be dramatic.

  2. The 100-nanosecond, Gregorian epoch used in UUIDv1 timestamps is uncommon and difficult to represent accurately using a standard number format such as [IEEE754].

  3. Introspection/parsing is required to order by time sequence, as opposed to being able to perform a simple byte-by-byte comparison.

  4. Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Additionally, with the advent of virtual machines and containers, MAC address uniqueness is no longer guaranteed.

  5. Many of the implementation details specified in [RFC4122] involve trade-offs that are neither possible to specify for all applications nor necessary to produce interoperable implementations.

  6. [RFC4122] does not distinguish between the requirements for generation of a UUID versus an application which simply stores one, which are often different.

Due to the aforementioned issues, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways.

While preparing this specification the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.

  1. [ULID] by A. Feerasta

  2. [LexicalUUID] by Twitter

  3. [Snowflake] by Twitter

  4. [Flake] by Boundary

  5. [ShardingID] by Instagram

  6. [KSUID] by Segment

  7. [Elasticflake] by P. Pearcy

  8. [FlakeID] by T. Pawlak

  9. [Sonyflake] by Sony

  10. [orderedUuid] by IT. Cabrera

  11. [COMBGUID] by R. Tallent

  12. [SID] by A. Chilton

  13. [pushID] by Google

  14. [XID] by O. Poitrey

  15. [ObjectID] by MongoDB

  16. [CUID] by E. Elliott

An inspection of these implementations and the issues described above has led to this document which attempts to adapt UUIDs to address these issues.

2. Terminology

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.2. Abbreviations

The following abbreviations are used in this document:

UUID
Universally Unique Identifier [RFC4122]
CSPRNG
Cryptographically Secure Pseudo-Random Number Generator
MAC
Media Access Control
MSB
Most Significant Bit
DBMS
Database Management System

3. Summary of Changes

The following UUIDs are hereby introduced:

UUID version 6 (UUIDv6)
A re-ordering of UUID version 1 so it is sortable as an opaque sequence of bytes. Easy to implement given an existing UUIDv1 implementation. See Section 5.1.
UUID version 7 (UUIDv7)
An entirely new time-based UUID bit layout sourced from the widely implemented and well known Unix Epoch timestamp source. See Section 5.2.
UUID version 8 (UUIDv8)
A free-form UUID format which has no explicit requirements except maintaining backward compatibility. See Section 5.3.
Max UUID
A specialized UUID which is the inverse of [RFC4122], Section 4.1.7. See Section 5.4.

3.1. Changelog

RFC EDITOR PLEASE DELETE THIS SECTION.

draft-03

  • Reworked the draft body to make the content more concise

  • UUIDv6 section reworked to just the reorder of the timestamp

  • UUIDv7 changed to simplify timestamp mechanism to just millisecond Unix timestamp

  • UUIDv8 relaxed to be custom in all elements except version and variant

  • Introduced Max UUID.

  • Added C code samples in Appendix.

  • Added test vectors in Appendix.

  • Version and Variant section combined into one section.

  • Changed from pseudo-random number generators to cryptographically secure pseudo-random number generator (CSPRNG).

  • Combined redundant topics from all UUIDs into sections such as Timestamp granularity, Monotonicity and Counters, Collision Resistance, Sorting, and Unguessability, etc.

  • Split Encoding and Storage into Opacity and DBMS and Database Considerations

  • Reworked Global Uniqueness under new section Global and Local Uniqueness

  • Node verbiage only used in UUIDv6 all others reference random/rand instead

  • Clock sequence verbiage changed simply to counter in any section other than UUIDv6

  • Added Abbreviations section

  • Updated IETF Draft XML Layout

  • Added information about little-endian UUIDs

draft-02

  • Added Changelog

  • Fixed misc. grammatical errors

  • Fixed section numbering issue

  • Fixed some UUIDvX reference issues

  • Changed all instances of "motonic" to "monotonic"

  • Changed all instances of "#-bit" to "# bit"

  • Changed "proceeding" verbiage to "after" in section 7

  • Added details on how to pad 32 bit Unix timestamp to 36 bits in UUIDv7

  • Added details on how to truncate 64 bit Unix timestamp to 36 bits in UUIDv7

  • Added forward reference and bullet to UUIDv8 if truncating 64 bit Unix Epoch is not an option.

  • Fixed bad reference to non-existent "time_or_node" in section 4.5.4

draft-01

  • Complete rewrite of entire document.

  • The format, flow and verbiage used in the specification have been reworked to mirror the original RFC 4122 and current IETF standards.

  • Removed the topics of UUID length modification, alternate UUID text formats, and alternate UUID encoding techniques.

  • Research into 16 different historical and current implementations of time-based universal identifiers was completed at the end of 2020 in an attempt to identify trends which have directly influenced design decisions in this draft document (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)

  • Prototype implementations have been completed for UUIDv6, UUIDv7, and UUIDv8 in various languages by many GitHub community members. (https://github.com/uuid6/prototypes)

4. Variant and Version Fields

The variant bits utilized by UUIDs in this specification remain in the same octet as originally defined by [RFC4122], Section 4.1.1.

The next table details Variant 10xx (8/9/A/B) and the new versions defined by this specification. A complete guide to all versions within this variant has been included in Appendix C.1.

Table 1: New UUID variant 10xx (8/9/A/B) versions defined by this specification
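+------+------+------+------+---------+--------------------------------------+
| Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                          |
+------+------+------+------+---------+--------------------------------------+
| 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-based UUID  |
|      |      |      |      |         | specified in this document.          |
+------+------+------+------+---------+--------------------------------------+
| 0    | 1    | 1    | 1    | 7       | Unix Epoch time-based UUID specified |
|      |      |      |      |         | in this document.                    |
+------+------+------+------+---------+--------------------------------------+
| 1    | 0    | 0    | 0    | 8       | Reserved for custom UUID formats     |
|      |      |      |      |         | specified in this document.          |
+------+------+------+------+---------+--------------------------------------+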

For UUID versions 6, 7, and 8, the variant field placement from [RFC4122] is unchanged. An example version/variant layout for UUIDv6 follows the table, where M is the version and N is the variant.

00000000-0000-6000-8000-000000000000
00000000-0000-6000-9000-000000000000
00000000-0000-6000-A000-000000000000
00000000-0000-6000-B000-000000000000
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
Figure 1: UUIDv6 Variant Examples
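
As a minimal sketch of the layout above, and assuming the UUID is held as 16 octets in network byte order, the version is the high nibble of octet 6 and the variant is carried in the two most significant bits of octet 8. The function names uuid_version and uuid_is_variant_10xx are hypothetical and are not defined by this document.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 4 bit version field, which is the
   high nibble of octet 6 (the "M" position in the text form). */
static unsigned uuid_version(const uint8_t u[16]) {
    return (unsigned)(u[6] >> 4);
}

/* Hypothetical helper: returns 1 when the two most significant bits of
   octet 8 (the "N" position) are 1, 0, otherwise 0. */
static int uuid_is_variant_10xx(const uint8_t u[16]) {
    return (u[8] & 0xC0) == 0x80;
}
```

With this check, the N digits 8, 9, A and B are all accepted because only the top two bits of that nibble are fixed.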

5. New Formats

The UUID format is 16 octets; the variant bits, in conjunction with the version bits described in the next section, determine the finer structure.

5.1. UUID Version 6

UUID version 6 is a field-compatible version of UUIDv1, reordered for improved DB locality. It is expected that UUIDv6 will primarily be used in contexts where there are existing v1 UUIDs. Systems that do not involve legacy UUIDv1 SHOULD consider using UUIDv7 instead.

Instead of splitting the timestamp into the low, mid and high sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes are stored from most to least significant. That is, given a 60 bit timestamp value as specified for UUIDv1 in [RFC4122], Section 4.1.4, for UUIDv6, the first 48 most significant bits are stored first, followed by the 4 bit version (same position), followed by the remaining 12 bits of the original 60 bit timestamp.

The clock sequence bits remain unchanged from their usage and position in [RFC4122], Section 4.1.5.

The 48 bit node SHOULD be set to a pseudo-random value; however, implementations MAY choose to retain the old MAC address behavior from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5. For more information on MAC address usage within UUIDs, see Section 8.

The format for the 16-byte, 128 bit UUIDv6 is shown in Figure 2.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           time_high                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           time_mid            |      time_low_and_version     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         node (2-5)                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: UUIDv6 Field and Bit Layout
time_high:
The most significant 32 bits of the 60 bit starting timestamp. Occupies bits 0 through 31 (octets 0-3).
time_mid:
The middle 16 bits of the 60 bit starting timestamp. Occupies bits 32 through 47 (octets 4-5).
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60 bit starting timestamp. Occupies bits 48 through 63 (octets 6-7).
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8).
clk_seq_low:
The 8 bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9).
node:
48 bit spatially unique identifier. Occupies bits 80 through 127 (octets 10-15).

With UUIDv6 the steps for splitting the timestamp into time_high and time_mid are OPTIONAL since the 48 bits of time_high and time_mid will remain in the same order. An extra step of splitting the first 48 bits of the timestamp into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation.
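
The timestamp placement described in this section can be sketched as follows. This is an illustration only: it assumes the caller already holds the 60 bit Gregorian timestamp in a uint64_t, and the helper name uuidv6_put_timestamp is hypothetical. It writes the upper 48 bits as one run, without the optional time_high/time_mid split, and leaves the clock sequence and node octets untouched.

```c
#include <stdint.h>

/* Hypothetical helper: fills octets 0-7 of a UUIDv6 from a 60 bit
   Gregorian timestamp (100-nanosecond ticks). */
static void uuidv6_put_timestamp(uint8_t u[16], uint64_t ts60) {
    /* time_high and time_mid: the 48 most significant timestamp bits */
    uint64_t hi48 = (ts60 >> 12) & 0xFFFFFFFFFFFFULL;
    /* the 12 least significant timestamp bits */
    uint16_t lo12 = (uint16_t)(ts60 & 0x0FFF);

    for (int i = 0; i < 6; i++) {
        u[i] = (uint8_t)(hi48 >> (40 - 8 * i)); /* big-endian store */
    }
    u[6] = (uint8_t)(0x60 | (lo12 >> 8)); /* version 0110 + top 4 bits */
    u[7] = (uint8_t)(lo12 & 0xFF);        /* remaining 8 bits */
}
```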

5.2. UUID Version 7

UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source: the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded. It also has improved entropy characteristics over versions 1 or 6.

Implementations SHOULD utilize UUID version 7 over UUID versions 1 and 6 if possible.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: UUIDv7 Field and Bit Layout
unix_ts_ms:
48 bit big-endian unsigned number of milliseconds since the Unix Epoch, as per Section 6.1.
ver:
4 bit UUIDv7 version set as per Section 4.
rand_a:
12 bits of pseudo-random data to provide uniqueness as per Section 6.2 and Section 6.6.
var:
The 2 bit variant defined by Section 4.
rand_b:
The final 62 bits of pseudo-random data to provide uniqueness as per Section 6.2 and Section 6.6.
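
A minimal sketch of how the fields above map onto 16 octets is given here. It assumes the caller supplies the millisecond timestamp and 10 octets already drawn from a CSPRNG; the helper name uuidv7_pack is hypothetical. Storing the timestamp one octet at a time keeps the result big-endian regardless of host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assembles a UUIDv7 from a millisecond Unix
   timestamp and 10 octets the caller obtained from a CSPRNG. */
static void uuidv7_pack(uint8_t u[16], uint64_t unix_ts_ms,
                        const uint8_t rnd[10]) {
    /* unix_ts_ms: 48 bits, big-endian, octets 0-5 */
    for (int i = 0; i < 6; i++) {
        u[i] = (uint8_t)(unix_ts_ms >> (40 - 8 * i));
    }
    memcpy(&u[6], rnd, 10);                 /* rand_a and rand_b */
    u[6] = (uint8_t)(0x70 | (u[6] & 0x0F)); /* ver = 0111 */
    u[8] = (uint8_t)(0x80 | (u[8] & 0x3F)); /* var = 10 */
}
```

Overwriting the version nibble and the variant bits after copying the random octets leaves 12 random bits in rand_a and 62 in rand_b.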

5.3. UUID Version 8

UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. The only requirement is that the variant and version bits MUST be set as defined in Section 4. UUIDv8's uniqueness will be implementation-specific and SHOULD NOT be assumed.

The only explicitly defined bits are the Version and Variant, leaving 122 bits for implementation-specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data.

Some example situations in which UUIDv8 usage could occur:

  • An implementation would like to embed extra information within the UUID other than what is defined in this document.

  • An implementation has other application/language restrictions which inhibit the use of one of the current UUIDs.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           custom_a                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          custom_a             |  ver  |       custom_b        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                       custom_c                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           custom_c                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: UUIDv8 Field and Bit Layout
custom_a:
The first 48 bits of the layout that can be filled as an implementation sees fit.
ver:
The 4 bit version field as defined by Section 4.
custom_b:
12 more bits of the layout that can be filled as an implementation sees fit.
var:
The 2 bit variant field as defined by Section 4.
custom_c:
The final 62 bits of the layout immediately following the var field to be filled as an implementation sees fit.
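
One way to picture the single UUIDv8 requirement is to take any 16 octets of implementation-defined data and overwrite only the version and variant bits. The sketch below assumes the custom data is already laid out by the caller; the helper name uuidv8_from_custom is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies 16 octets of implementation-defined data
   and stamps the UUIDv8 version and the variant on top of it. The 4
   version bits and 2 variant bits of the input are discarded. */
static void uuidv8_from_custom(uint8_t u[16], const uint8_t custom[16]) {
    memcpy(u, custom, 16);
    u[6] = (uint8_t)(0x80 | (u[6] & 0x0F)); /* ver = 1000 */
    u[8] = (uint8_t)(0x80 | (u[8] & 0x3F)); /* var = 10 */
}
```

The remaining 48 + 12 + 62 = 122 bits (custom_a, custom_b, custom_c) pass through unchanged.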

5.4. Max UUID

The Max UUID is a special form of UUID that is specified to have all 128 bits set to 1. This UUID can be thought of as the inverse of the Nil UUID defined in [RFC4122], Section 4.1.7.

FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF
Figure 5: Max UUID Format
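
A check for the Max UUID could look like the following sketch, which assumes the UUID is held as 16 raw octets. The function name uuid_is_max is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if all 128 bits are set, else 0. */
static int uuid_is_max(const uint8_t u[16]) {
    for (int i = 0; i < 16; i++) {
        if (u[i] != 0xFF) {
            return 0;
        }
    }
    return 1;
}
```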

6. UUID Best Practices

The minimum requirements for generating UUIDs are described in this document for each version. Everything else is an implementation detail and up to the implementer to decide what is appropriate for a given implementation. That being said, various relevant factors are covered below to help guide an implementer through the different trade-offs among differing UUID implementations.

6.1. Timestamp Granularity

UUID timestamp source, precision, and length were the topic of great debate while creating this specification. As such, choosing the right timestamp for your application is very important. This section details some of the most common points on this topic.

Reliability:
Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)
Source:
UUID versions 1 and 6 both utilize a Gregorian epoch timestamp while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or a custom timestamp epoch are required, UUIDv8 SHOULD be leveraged.
Sub-second Precision and Accuracy:
Many levels of precision exist for timestamps: milliseconds, microseconds, nanoseconds, and beyond. Additionally fractional representations of sub-second precision may be desired to mix various levels of precision in a time-ordered manner. Furthermore, system clocks themselves have an underlying granularity and it is frequently less than the precision offered by the operating system. With UUID version 1 and 6, 100-nanoseconds of precision are present while UUIDv7 features fixed millisecond level of precision within the Unix epoch that does not exceed the granularity capable in most modern systems. For other levels of precision UUIDv8 SHOULD be utilized.
Length:
The length of a given timestamp directly impacts how long a given UUID will be valid. That is, how many timestamp ticks can be contained in a UUID before the maximum value for the timestamp field is reached. Care should be given to ensure that the proper length is selected for a given timestamp. UUID version 1 and 6 utilize a 60 bit timestamp and UUIDv7 features a 48 bit timestamp.
Altering, Fuzzing, or Smearing:
Implementations MAY alter the actual timestamp. Examples include addressing security considerations around providing a real clock value within a UUID, correcting inaccurate clocks, and handling leap seconds. This specification makes no requirement or guarantee about how close the clock value needs to be to actual time.
Padding:
When timestamp padding is required, implementations MUST pad the most significant (left-most) bits with zeros. An example is padding the most significant, left-most bits of a 32 bit Unix timestamp with zeros to fill out the 48 bit timestamp in UUIDv7.
Truncating:
Similarly, when timestamps need to be truncated: the lower, least significant bits MUST be used. An example would be truncating a 64 bit Unix timestamp to the least significant, right-most 48 bits for UUIDv7.
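
The padding and truncating rules above amount to simple integer operations. The sketch below is illustrative only and assumes the timestamp is handled as an unsigned integer before being stored big-endian in the UUID; the names ts_truncate_48 and ts_pad_48 are hypothetical.

```c
#include <stdint.h>

/* Truncating: keep the least significant (right-most) 48 bits of a
   64 bit timestamp and drop the rest. */
static uint64_t ts_truncate_48(uint64_t ts64) {
    return ts64 & 0xFFFFFFFFFFFFULL;
}

/* Padding: widen a 32 bit timestamp to 48 bits. The 16 most
   significant (left-most) bits of the result are zero. */
static uint64_t ts_pad_48(uint32_t ts32) {
    return (uint64_t)ts32;
}
```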

6.2. Monotonicity and Counters

Monotonicity is the backbone of time-based sortable UUIDs. Naturally, time-based UUIDs from this document will be monotonic due to an embedded timestamp; however, implementations can guarantee additional monotonicity via the concepts covered in this section.

Additionally, care MUST be taken to ensure UUIDs generated in batches are also monotonic. That is, if one thousand UUIDs are generated for the same timestamp, there needs to be sufficient logic for organizing the creation order of those one thousand UUIDs. For batch UUID creation, implementations MAY utilize a monotonic counter which SHOULD increment for each UUID created during a given timestamp.

For single-node UUID implementations that do not need to create batches of UUIDs, the embedded timestamp within UUID version 1, 6, and 7 can provide sufficient monotonicity guarantees by simply ensuring that the timestamp increments before creating a new UUID. For the topic of Distributed Nodes, please refer to Section 6.3.

Implementations SHOULD choose one of the following methods for single-node UUID implementations that require batch UUID creation.

Fixed-Length Dedicated Counter Bits (Method 1):
This references the practice of allocating a specific number of bits in the UUID layout to the sole purpose of tallying the total number of UUIDs created during a given UUID timestamp tick. Positioning of a fixed bit-length counter SHOULD be immediately after the embedded timestamp. This promotes sortability and allows random data generation for each counter increment. With this method, the rand_a section of UUIDv7 MAY be utilized as fixed-length dedicated counter bits. In the event more counter bits are required, the most significant, left-most, bits of rand_b MAY be leveraged as additional counter bits.
Monotonic Random (Method 2):
With this method the random data is extended to also double as a counter. This monotonic random can be thought of as a "randomly seeded counter" which MUST be incremented in the least significant position for each UUID created on a given timestamp tick. UUIDv7's rand_b section SHOULD be utilized with this method to handle batch UUID generation during a single timestamp tick.

The following sub-topics cover methods behind incrementing either type of counter method:

Plus One Increment (Type A):
With this increment logic the counter method is incremented by one for every UUID generation. When this increment method is utilized with Fixed-Length Dedicated Counter the trailing random generated for each new UUID can help produce unguessable UUIDs. When this increment method is utilized with Monotonic Random Counters the resulting values are easily guessable. Implementations that favor unguessability SHOULD NOT utilize this method with the monotonic random method.
Random Increment (Type B):
With this increment the actual increment of the counter MAY be a random integer of any desired length larger than zero. When this increment method is utilized with Fixed-Length Dedicated Counters the random increments MAY deplete the counter bit space (including any rollover guards) faster than desired if a counter of adequate length is not selected. When this increment method is utilized with Monotonic Random Counters the counter ensures the UUIDs retain the required level of unguessability characteristics provided by the underlying entropy.
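
As an illustration of Monotonic Random (Method 2) combined with Random Increment (Type B), the sketch below treats the 62 bits of rand_b as an integer and advances it by a random, non-zero step. It is a sketch under stated assumptions: the caller holds rand_b in a uint64_t, supplies a fresh 32 bit CSPRNG value per call, and the name rand_b_increment is hypothetical. Limiting the step to 32 bits leaves the upper bits of rand_b available as a rollover guard.

```c
#include <stdint.h>

/* Hypothetical helper: adds a random step in the range 1 .. 2^32 to
   the 62 bit rand_b value. Returns 0 on success, or -1 if the addition
   would overflow 62 bits (a counter rollover), in which case rand_b is
   left unchanged. */
static int rand_b_increment(uint64_t *rand_b, uint32_t rnd) {
    const uint64_t max62 = (1ULL << 62) - 1;
    uint64_t step = (uint64_t)rnd + 1; /* never zero */

    if (*rand_b > max62 - step) {
        return -1;
    }
    *rand_b += step;
    return 0;
}
```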

The following sub-topics cover topics related solely to creating reliable fixed-length dedicated counters:

Fixed-Length Dedicated Counter Seeding:
Implementations utilizing the fixed-length counter method SHOULD randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented, the counter SHOULD be frozen and incremented via the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing the fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized. The remaining most significant, left-most counter bits are initialized as zero for the sole purpose of guarding against counter rollovers.
Fixed-Length Dedicated Counter Length:
Care MUST be taken to select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision SHOULD require a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch.

The following sub-topics cover rollover handling with either type of counter method:

Counter Rollover Guards:
The technique from Fixed-Length Dedicated Counter Seeding which describes allocating a segment of the fixed-length counter as a rollover guard is also recommended and SHOULD be employed to help mitigate counter rollover issues. This same technique can be leveraged with Monotonic Random counter methods by ensuring the total length of a possible increment in the least significant, right-most position is less than the total length of the random being incremented. As such, the most significant, left-most, bits can be incremented as rollover guarding.
Counter Rollover Handling:
Counter rollovers SHOULD be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken.

Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:

  1. Compare the current timestamp against the previously stored timestamp.

  2. If the current timestamp is equal to the previous timestamp; increment the counter according to the desired method and type.

  3. If the current timestamp is greater than the previous timestamp; re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).

Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID. If this is not the case, then any number of things could have occurred, such as, but not limited to, clock rollbacks, leap second handling, or counter rollovers. Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem, ensuring the next UUID generated is greater than the previous.
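
The compare, increment, and re-initialize steps above can be sketched for UUIDv7 with a 12 bit fixed-length counter in rand_a (Method 1) and plus-one increments (Type A). Everything named here (struct v7_state, v7_next, and the parameters) is hypothetical. The sketch assumes the caller supplies the current time and fresh CSPRNG output, and it assumes one particular policy for a clock that moves backward: the previous timestamp is reused. Other policies are equally possible.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical generator state; not defined by this document. */
struct v7_state {
    uint64_t last_ms;  /* timestamp used by the previous UUID */
    uint16_t counter;  /* 12 bit counter carried in rand_a */
};

/*
 * Builds the next UUIDv7 into u.
 *   now_ms : current Unix time in milliseconds (caller-supplied)
 *   seed12 : fresh CSPRNG value used to seed the counter on a new tick
 *   rnd8   : fresh CSPRNG octets for rand_b
 * Returns 0 on success, or -1 when the counter is exhausted and the
 * caller should wait for the timestamp to advance.
 */
static int v7_next(struct v7_state *st, uint64_t now_ms, uint16_t seed12,
                   const uint8_t rnd8[8], uint8_t u[16]) {
    if (now_ms > st->last_ms) {
        /* Step 3: new tick. Re-seed the counter; clearing its top bit
           leaves a one-bit rollover guard. */
        st->last_ms = now_ms;
        st->counter = (uint16_t)(seed12 & 0x07FF);
    } else {
        /* Step 2: same tick. A clock that moved backward is treated
           the same way (an assumption of this sketch), so last_ms is
           reused rather than the smaller now_ms. */
        if (st->counter == 0x0FFF) {
            return -1;
        }
        st->counter++;
    }

    for (int i = 0; i < 6; i++) {
        u[i] = (uint8_t)(st->last_ms >> (40 - 8 * i));
    }
    u[6] = (uint8_t)(0x70 | (st->counter >> 8)); /* ver + counter high */
    u[7] = (uint8_t)(st->counter & 0xFF);        /* counter low */
    memcpy(&u[8], rnd8, 8);                      /* rand_b */
    u[8] = (uint8_t)(0x80 | (u[8] & 0x3F));      /* var = 10 */
    return 0;
}
```

Because the counter sits directly after the timestamp, regenerating rand_b on every call does not disturb the byte-wise ordering of UUIDs produced within one tick.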

6.3. Distributed UUID Generation

Some implementations MAY desire to utilize multi-node, clustered applications which involve two or more nodes independently generating UUIDs that will be stored in a common location. While UUIDs already feature sufficient entropy to ensure that the chances of collision are low, as the total number of nodes increases, so does the likelihood of a collision. This section details the approaches that MAY be utilized by multi-node UUID implementations in distributed environments.

Centralized Registry:
With this method all nodes tasked with creating UUIDs consult a central registry and confirm the generated value is unique. As applications scale the communication with the central registry could become a bottleneck and impact UUID generation in a negative way. Utilization of shared knowledge schemes with central/global registries is outside the scope of this specification.
Node IDs:
With this method, a pseudo-random Node ID value is placed within the UUID layout. This identifier helps ensure the bit-space for a given node is unique, resulting in UUIDs that do not conflict with any other UUID created by another node with a different node id. Implementations that choose to leverage an embedded node id SHOULD utilize UUIDv8. The node id SHOULD NOT be an IEEE 802 MAC address as per Section 8. The location and bit length are left to implementations and are outside the scope of this specification. Furthermore, the creation and negotiation of unique node ids among nodes is also out of scope for this specification.

Utilization of either a Centralized Registry or Node ID is not required for implementing UUIDs in this specification. However, implementations SHOULD utilize one of the two aforementioned methods if distributed UUID generation is a requirement.

6.4. Collision Resistance

Implementations SHOULD weigh the consequences of UUID collisions within their application when deciding between UUID versions that use entropy (random) versus the other components such as Section 6.1 and Section 6.2. This is especially true for distributed node collision resistance as defined by Section 6.3.

There are two example scenarios below which help illustrate the varying seriousness of a collision within an application.

Low Impact:
A UUID collision generates a duplicate log entry which results in incorrect statistics derived from the data. Implementations that are not negatively affected by collisions may continue with the entropy and uniqueness provided by the traditional UUID format.
High Impact:
A duplicate key causes an airplane to receive the wrong course which puts people's lives at risk. In this scenario there is no margin for error. Collisions MUST be avoided and failure is unacceptable. Applications dealing with this type of scenario MUST employ as much collision resistance as possible within the given application context.

6.5. Global and Local Uniqueness

UUIDs created by this specification MAY be used to provide local uniqueness guarantees. For example, ensuring UUIDs created within a local application context are unique within a database MAY be sufficient for some implementations where global uniqueness outside of the application context, in other applications, or around the world is not required.

Although true global uniqueness is impossible to guarantee without a shared knowledge scheme, a shared knowledge scheme is not required by UUID to provide uniqueness guarantees. Implementations MAY implement a shared knowledge scheme introduced in Section 6.3 as they see fit to extend the uniqueness guaranteed by this specification and [RFC4122].

6.6. Unguessability

Implementations SHOULD utilize a cryptographically secure pseudo-random number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique"). A CSPRNG ensures the best of Section 6.4 and Section 8 are present in modern UUIDs.

Advice on generating cryptographic-quality random numbers can be found in [RFC4086].

6.7. Sorting

UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g. database indexes) SHOULD sort as opaque raw bytes, without need for parsing or introspection.

Time ordered monotonic UUIDs benefit from greater database index locality because the new values are near each other in the index. As a result objects are more easily clustered together for better performance. The real-world differences in this approach of index locality vs random data inserts can be quite large.

UUID formats created by this specification SHOULD be lexicographically sortable while in the textual representation.

UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind. If little-endian style is required, a custom UUID format SHOULD be created using UUIDv8.
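
Sorting as opaque raw bytes needs nothing more than a byte-wise comparison. The sketch below assumes UUIDs are stored as contiguous 16-octet values in network byte order; the comparator name uuid_cmp is hypothetical, while memcmp and qsort are standard C library functions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator for qsort() over an array of 16-octet
   UUIDs. No field of the UUID is parsed or inspected. */
static int uuid_cmp(const void *a, const void *b) {
    return memcmp(a, b, 16);
}
```

An array declared as uint8_t ids[n][16] could then be ordered with qsort(ids, n, 16, uuid_cmp).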

6.8. Opacity

UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.

6.9. DBMS and Database Considerations

For many applications, such as databases, storing UUIDs as text is unnecessarily verbose, requiring 288 bits to represent 128 bit UUID values. Thus, where feasible, UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.

For other systems, UUIDs MAY be stored in binary form or as text, as appropriate. The trade-offs to both approaches are as such:

  • Storing as binary requires less space and may result in faster data access.

  • Storing as text requires more space but may require less translation if the resulting text form is to be used after retrieval and thus may be simpler to implement.
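
To make the size difference concrete, the sketch below converts the 16 octet binary form into the 36 character text form (36 characters of 8 bits each is the 288 bits mentioned above). It assumes lowercase hexadecimal output; the function name uuid_to_text is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the 36 character text form of u plus a
   terminating NUL into out. */
static void uuid_to_text(const uint8_t u[16], char out[37]) {
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x-%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```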

DBMS vendors are encouraged to provide functionality to generate and store UUID formats defined by this specification for use as identifiers or left parts of identifiers such as, but not limited to, primary keys, surrogate keys for temporal databases, foreign keys included in polymorphic relationships, and keys for key-value pairs in JSON columns and key-value databases. Applications using a monolithic database may find using database-generated UUIDs (as opposed to client-generated UUIDs) provides the best UUID monotonicity. In addition to UUIDs, additional identifiers MAY be used to ensure integrity and feedback.

7. IANA Considerations

This document has no IANA actions.

8. Security Considerations

MAC addresses pose inherent security risks and SHOULD NOT be used within a UUID. Instead, CSPRNG data SHOULD be selected from a source with sufficient entropy to ensure guaranteed uniqueness among UUID generation. See Section 6.6 for more information.

Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with an embedded counter does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then [RFC4122] UUIDv4 SHOULD be utilized.

9. Acknowledgements

The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, Theodore Y. Ts'o, Robert Kieffer, sergeyprokhorenko, and LiosK, as well as all of those in the IETF community and on GitHub who contributed to the discussions which resulted in this document.

10. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC4122]
Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, <https://www.rfc-editor.org/info/rfc4122>.
[RFC4086]
Eastlake 3rd, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", RFC 4086, DOI 10.17487/RFC4086, <https://www.rfc-editor.org/info/rfc4086>.

11. Informative References

[LexicalUUID]
Twitter, "A Scala client for Cassandra", commit f6da4e0, <https://github.com/twitter-archive/cassie>.
[Snowflake]
Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, <https://github.com/twitter-archive/snowflake/releases/tag/snowflake-2010>.
[Flake]
Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, <https://github.com/boundary/flake>.
[ShardingID]
Instagram Engineering, "Sharding & IDs at Instagram", <https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c>.
[KSUID]
Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, <https://github.com/segmentio/ksuid>.
[Elasticflake]
Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, <https://github.com/ppearcy/elasticflake>.
[FlakeID]
Pawlak, T., "Flake ID Generator", Commit fcd6a2f, <https://github.com/T-PWK/flake-idgen>.
[Sonyflake]
Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, <https://github.com/sony/sonyflake>.
[orderedUuid]
Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", <https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8>.
[COMBGUID]
Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, <https://github.com/richardtallent/RT.Comb>.
[ULID]
Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, <https://github.com/ulid/spec>.
[SID]
Chilton, A., "sid : generate sortable identifiers", Commit 660e947, <https://github.com/chilts/sid>.
[pushID]
Google, "The 2^120 Ways to Ensure Unique Identifiers", <https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html>.
[XID]
Poitrey, O., "Globally Unique ID Generator", Commit efa678f, <https://github.com/rs/xid>.
[ObjectID]
MongoDB, "ObjectId - MongoDB Manual", <https://docs.mongodb.com/manual/reference/method/ObjectId/>.
[CUID]
Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, <https://github.com/ericelliott/cuid>.
[IEEE754]
IEEE, "IEEE Standard for Floating-Point Arithmetic", Series 754-2019, <https://standards.ieee.org/ieee/754/6210/>.

Appendix A. Example Code

A.1. Creating a UUIDv6 Value

This section details a function in C which converts from a UUID version 1 to version 6:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>
#include <uuid/uuid.h>

/* Converts UUID version 1 to version 6 in place. */
void uuidv1tov6(uuid_t u) {

  uint64_t ut;
  unsigned char *up = (unsigned char *)u;

  // load ut with the first 64 bits of the UUID
  ut = ((uint64_t)ntohl(*((uint32_t*)up))) << 32;
  ut |= ((uint64_t)ntohl(*((uint32_t*)&up[4])));

  // dance the bit-shift...
  ut =
    ((ut >> 32) & 0x0FFF) | // 12 least significant bits
    (0x6000) | // version number
    ((ut >> 28) & 0x0000000FFFFF0000) | // next 20 bits
    ((ut << 20) & 0x000FFFF000000000) | // next 16 bits
    (ut << 52); // 12 most significant bits

  // store back in UUID
  *((uint32_t*)up) = htonl((uint32_t)(ut >> 32));
  *((uint32_t*)&up[4]) = htonl((uint32_t)(ut));

}
Figure 6: UUIDv6 Function in C

A.2. Creating a UUIDv7 Value

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

// ...

// csprng data source
FILE *rndf;
rndf = fopen("/dev/urandom", "r");
if (rndf == 0) {
    printf("fopen /dev/urandom error\n");
    return 1;
}

// ...

// generate one UUIDv7
uint8_t u[16];
struct timespec ts;
int ret;

ret = clock_gettime(CLOCK_REALTIME, &ts);
if (ret != 0) {
    printf("clock_gettime error: %d\n", ret);
    return 1;
}

uint64_t tms;

tms = ((uint64_t)ts.tv_sec) * 1000;
tms += ((uint64_t)ts.tv_nsec) / 1000000;

memset(u, 0, 16);

// fill everything after the timestamp with random bytes
if (fread(&u[6], 10, 1, rndf) != 1) {
    printf("fread /dev/urandom error\n");
    return 1;
}

// write the 48 bit timestamp into octets 0-5, most significant octet first
for (int i = 0; i < 6; i++) {
    u[i] = (uint8_t)(tms >> (8 * (5 - i)));
}

u[8] = 0x80 | (u[8] & 0x3F); // set variant field, top two bits are 1, 0
u[6] = 0x70 | (u[6] & 0x0F); // set version field, top four bits are 0, 1, 1, 1
Figure 7: UUIDv7 Function in C

A.3. Creating a UUIDv8 Value

UUIDv8 will vary greatly from implementation to implementation. A good candidate use case for UUIDv8 is to embed exotic timestamps like the one found in this example, which packs the Unix seconds and a 12 bit binary fraction of a second (approximately 0.25 milliseconds per timestamp tick) into a 48 bit value.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main() {
  struct timespec tp;
  clock_gettime(CLOCK_REALTIME, &tp);
  uint64_t timestamp = (uint64_t)tp.tv_sec << 12;

  // compute 12 bit (~0.25 msec precision) fraction from nsecs
  timestamp |= ((uint64_t)tp.tv_nsec << 12) / 1000000000;

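  // print the 48 bit value; the casts match the %llx conversions
  printf("%08llx-%04llx\n", (unsigned long long)(timestamp >> 16),
         (unsigned long long)(timestamp & 0xFFFF));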
  return 0;
}
Figure 8: UUIDv8 Function in C

Appendix B. Test Vectors

Both UUIDv1 and UUIDv6 test vectors utilize the same 60 bit timestamp: 0x1EC9414C232AB00 (138648505420000000) Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00

Both UUIDv1 and UUIDv6 utilize the same values in clk_seq_hi_res, clock_seq_low, and node, all of which have been generated with random data.

# Unix Nanosecond precision to Gregorian 100-nanosecond intervals
gregorian_100_ns = (Unix_64_bit_nanoseconds / 100) + gregorian_Unix_offset

# Gregorian to Unix Offset:
# The number of 100-ns intervals between the
# UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
# gregorian_Unix_offset = 0x01b21dd213814000 or 122192928000000000

# Unix 64 bit Nanosecond Timestamp:
# Unix NS: Tuesday, February 22, 2022 2:22:22 PM GMT-05:00
# Unix_64_bit_nanoseconds = 0x16D6320C3D4DCC00 or 1645557742000000000

# Work:
# gregorian_100_ns = (1645557742000000000 / 100) + 122192928000000000
# (138648505420000000 - 122192928000000000) * 100 = Unix_64_bit_nanoseconds

# Final:
# gregorian_100_ns = 0x1EC9414C232AB00 or 138648505420000000

# Original: 000111101100100101000001010011000010001100101010101100000000
# UUIDv1:   11000010001100101010101100000000|1001010000010100|0001|000111101100
# UUIDv6:   00011110110010010100000101001100|0010001100101010|0110|101100000000
Figure 9: Test Vector Timestamp Pseudo-code
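
The conversion in Figure 9 can be sketched in C as follows. This is a minimal illustration rather than part of the specification: the function name unix_ns_to_gregorian_100ns and the macro name GREGORIAN_UNIX_OFFSET are assumptions made for this example, and only the offset value itself comes from Figure 9.

```c
#include <stdint.h>

/* Number of 100-ns intervals between the UUID epoch
 * (1582-10-15 00:00:00) and the Unix epoch (1970-01-01 00:00:00),
 * as given in Figure 9. The macro name is hypothetical. */
#define GREGORIAN_UNIX_OFFSET UINT64_C(0x01B21DD213814000)

/* Hypothetical helper: converts a Unix timestamp in nanoseconds into
 * a count of 100-ns intervals since the UUID epoch. Only the low
 * 60 bits are kept, which is the width of the UUIDv1 and UUIDv6
 * timestamp. */
uint64_t unix_ns_to_gregorian_100ns(uint64_t unix_ns) {
  uint64_t ticks = (unix_ns / 100) + GREGORIAN_UNIX_OFFSET;
  return ticks & UINT64_C(0x0FFFFFFFFFFFFFFF);
}
```

Passing the Unix_64_bit_nanoseconds value from Figure 9 to this function is expected to yield the gregorian_100_ns value listed under "Final".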

B.1. Example of a UUIDv6 Value

----------------------------------------------
field                 bits    value_hex
----------------------------------------------
time_low              32      0xC232AB00
time_mid              16      0x9414
time_hi_and_version   16      0x11EC
clk_seq_hi_res         8      0xB3
clock_seq_low          8      0xC8
node                  48      0x9E6BDECED846
----------------------------------------------
total                128
----------------------------------------------
final_hex: C232AB00-9414-11EC-B3C8-9E6BDECED846
Figure 10: UUIDv1 Example Test Vector

-----------------------------------------------
field                 bits    value_hex
-----------------------------------------------
time_high              32      0x1EC9414C
time_mid               16      0x232A
time_low_and_version   16      0x6B00
clk_seq_hi_res          8      0xB3
clock_seq_low           8      0xC8
node                   48      0x9E6BDECED846
-----------------------------------------------
total                 128
-----------------------------------------------
final_hex: 1EC9414C-232A-6B00-B3C8-9E6BDECED846
Figure 11: UUIDv6 Example Test Vector
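
As a cross-check of Figures 10 and 11, the following is a minimal byte-wise sketch of the UUIDv1 to UUIDv6 reordering. It assumes the UUID is held as 16 octets in network byte order, avoids pointer casts, and needs no UUID library. The function name uuid_v1_to_v6_bytes is hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: rewrites the first 8 octets of a UUIDv1
 * (network byte order) into the UUIDv6 layout, in place. Octets 8-15
 * (clock sequence and node) are left untouched. */
void uuid_v1_to_v6_bytes(uint8_t u[16]) {
  /* Reassemble the 60 bit timestamp from the UUIDv1 fields. */
  uint64_t time_low = ((uint64_t)u[0] << 24) | ((uint64_t)u[1] << 16) |
                      ((uint64_t)u[2] << 8) | (uint64_t)u[3];
  uint64_t time_mid = ((uint64_t)u[4] << 8) | (uint64_t)u[5];
  uint64_t time_hi = ((uint64_t)(u[6] & 0x0F) << 8) | (uint64_t)u[7];
  uint64_t ts = (time_hi << 48) | (time_mid << 32) | time_low;

  /* Most significant 48 bits first, then version 6, then the
   * remaining 12 least significant bits of the timestamp. */
  uint64_t head = ((ts >> 12) << 16) | UINT64_C(0x6000) |
                  (ts & UINT64_C(0x0FFF));

  /* Store the result back, most significant octet first. */
  for (int i = 0; i < 8; i++) {
    u[i] = (uint8_t)(head >> (8 * (7 - i)));
  }
}
```

Applied to the 16 octets of the UUIDv1 value in Figure 10, this sketch is expected to produce the octets of the UUIDv6 value in Figure 11.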

B.2. Example of a UUIDv7 Value

This example UUIDv7 test vector utilizes a well-known 32 bit Unix epoch with additional millisecond precision to fill the first 48 bits.

rand_a and rand_b are filled with random data.

The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00 represented as 0x17F22E279B0 or 1645557742000

-------------------------------
field      bits    value
-------------------------------
unix_ts_ms   48    0x017F22E279B0
ver           4    0x7
rand_a       12    0xCC3
var           2    b10
rand_b       62    0x18C4DC0C0C07398F
-------------------------------
total       128
-------------------------------
final: 017F22E2-79B0-7CC3-98C4-DC0C0C07398F
Figure 12: UUIDv7 Example Test Vector
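
Because unix_ts_ms occupies the first 48 bits in network byte order, the timestamp can be read back from a UUIDv7 without any other parsing. The following is a minimal sketch of that extraction, assuming the UUID is held as 16 octets. The function name uuid_v7_unix_ts_ms is hypothetical and not defined by this document.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 48 bit unix_ts_ms field of a
 * UUIDv7 stored as 16 octets in network byte order. The caller is
 * assumed to have checked the version and variant already. */
uint64_t uuid_v7_unix_ts_ms(const uint8_t u[16]) {
  uint64_t ms = 0;
  /* Octets 0-5 hold the timestamp, most significant octet first. */
  for (int i = 0; i < 6; i++) {
    ms = (ms << 8) | u[i];
  }
  return ms;
}
```

Dividing the result by 1000 gives whole Unix seconds, and the remainder gives the millisecond part.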

B.3. Example of a UUIDv8 Value

This example UUIDv8 test vector utilizes a well-known 64 bit Unix epoch with nanosecond precision, truncated to the least-significant, right-most, bits to fill the first 48 bits through version.

The next two segments, custom_b and custom_c, are filled with random data.

Timestamp is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00 represented as 0x16D6320C3D4DCC00 or 1645557742000000000

It should be noted that this example is just to illustrate one scenario for UUIDv8. Test vectors will likely be implementation specific and vary greatly from this simple example.

-------------------------------
field      bits    value
-------------------------------
custom_a     48    0x320C3D4DCC00
ver           4    0x8
custom_b     12    0x75B
var           2    b10
custom_c     62    0xEC932D5F69181C0
-------------------------------
total       128
-------------------------------
final: 320C3D4D-CC00-875B-8EC9-32D5F69181C0
Figure 13: UUIDv8 Example Test Vector
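
The layout in Figure 13 can be reproduced with a few shifts and masks. The sketch below is one possible way to do it, not a required implementation: the function name uuid_v8_pack and its parameter order are assumptions made for this example. Oversized inputs are truncated to their least significant bits, which is how the 64 bit nanosecond timestamp above is reduced to the 48 bit custom_a field.

```c
#include <stdint.h>

/* Hypothetical helper: lays out a UUIDv8 from three custom fields.
 * custom_a uses its low 48 bits, custom_b its low 12 bits and
 * custom_c its low 62 bits; any higher bits are discarded. */
void uuid_v8_pack(uint8_t u[16], uint64_t custom_a, uint16_t custom_b,
                  uint64_t custom_c) {
  uint64_t hi = ((custom_a & UINT64_C(0xFFFFFFFFFFFF)) << 16) |
                UINT64_C(0x8000) |               /* ver = 0b1000 */
                (uint64_t)(custom_b & 0x0FFF);
  uint64_t lo = (UINT64_C(2) << 62) |            /* var = 0b10 */
                (custom_c & UINT64_C(0x3FFFFFFFFFFFFFFF));

  /* Store both halves in network byte order. */
  for (int i = 0; i < 8; i++) {
    u[i] = (uint8_t)(hi >> (8 * (7 - i)));
    u[8 + i] = (uint8_t)(lo >> (8 * (7 - i)));
  }
}
```

Called with the full 64 bit timestamp 0x16D6320C3D4DCC00 and the custom_b and custom_c values from Figure 13, this sketch is expected to yield the octets of the "final" value shown there.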

Appendix C. Version and Variant Tables

C.1. Variant 10xx Versions

Table 2: All UUID variant 10xx (8/9/A/B) version definitions.
------------------------------------------------------------------------
Msb0  Msb1  Msb2  Msb3  Version  Description
------------------------------------------------------------------------
0     0     0     0     0        Unused
0     0     0     1     1        The Gregorian time-based UUID from
                                 [RFC4122], Section 4.1.3
0     0     1     0     2        DCE Security version, with embedded
                                 POSIX UIDs from [RFC4122],
                                 Section 4.1.3
0     0     1     1     3        The name-based version specified in
                                 [RFC4122], Section 4.1.3 that uses
                                 MD5 hashing.
0     1     0     0     4        The randomly or pseudo-randomly
                                 generated version specified in
                                 [RFC4122], Section 4.1.3.
0     1     0     1     5        The name-based version specified in
                                 [RFC4122], Section 4.1.3 that uses
                                 SHA-1 hashing.
0     1     1     0     6        Reordered Gregorian time-based UUID
                                 specified in this document.
0     1     1     1     7        Unix Epoch time-based UUID specified
                                 in this document.
1     0     0     0     8        Reserved for custom UUID formats
                                 specified in this document.
1     0     0     1     9        Reserved for future definition.
1     0     1     0     10       Reserved for future definition.
1     0     1     1     11       Reserved for future definition.
1     1     0     0     12       Reserved for future definition.
1     1     0     1     13       Reserved for future definition.
1     1     1     0     14       Reserved for future definition.
1     1     1     1     15       Reserved for future definition.
------------------------------------------------------------------------
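
The Msb0 through Msb3 columns of the table above are the four most significant bits of octet 6, and the variant is carried in the two most significant bits of octet 8. A minimal sketch of reading both from a UUID held as 16 octets in network byte order follows. The function names uuid_version and uuid_is_variant_10xx are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: returns the version number (0-15), taken from
 * the most significant 4 bits of octet 6 (Msb0..Msb3). */
unsigned uuid_version(const uint8_t u[16]) {
  return (unsigned)(u[6] >> 4);
}

/* Hypothetical helper: returns 1 when the two most significant bits
 * of octet 8 are 1, 0 (variant 10xx, hex digit 8/9/A/B), else 0. */
int uuid_is_variant_10xx(const uint8_t u[16]) {
  return (u[8] & 0xC0) == 0x80;
}
```

The version number is only meaningful as an index into the table above when uuid_is_variant_10xx returns 1.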

Authors' Addresses

Brad G. Peabody
Kyzer R. Davis
================================================
FILE: old drafts/draft-peabody-dispatch-new-uuid-format-03.txt
================================================
dispatch                                                    BGP. Peabody
Internet-Draft
Updates: 4122 (if approved)                                     K. Davis
Intended status: Standards Track                           31 March 2022
Expires: 2 October 2022


                            New UUID Formats
               draft-peabody-dispatch-new-uuid-format-03

Abstract

   This document presents new Universally Unique Identifier (UUID)
   formats for use in modern applications and databases.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 2 October 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Peabody & Davis          Expires 2 October 2022                 [Page 1]

Internet-Draft               new-uuid-format                  March 2022

Table of Contents

   1.  Introduction
   2.  Terminology
     2.1.  Requirements Language
     2.2.  Abbreviations
   3.  Summary of Changes
     3.1.  changelog
   4.  Variant and Version Fields
   5.  New Formats
     5.1.  UUID Version 6
     5.2.  UUID Version 7
     5.3.  UUID Version 8
     5.4.  Max UUID
   6.  UUID Best Practices
     6.1.  Timestamp Granularity
     6.2.  Monotonicity and Counters
     6.3.  Distributed UUID Generation
     6.4.  Collision Resistance
     6.5.  Global and Local Uniqueness
     6.6.  Unguessability
     6.7.  Sorting
     6.8.  Opacity
     6.9.  DBMS and Database Considerations
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Normative References
   11. Informative References
   Appendix A.  Example Code
     A.1.  Creating a UUIDv6 Value
     A.2.  Creating a UUIDv7 Value
     A.3.  Creating a UUIDv8 Value
   Appendix B.  Test Vectors
     B.1.  Example of a UUIDv6 Value
     B.2.  Example of a UUIDv7 Value
     B.3.  Example of a UUIDv8 Value
   Appendix C.  Version and Variant Tables
     C.1.  Variant 10xx Versions
   Authors' Addresses

1.  Introduction

   Many things have changed in the time since UUIDs were originally
   created. Modern applications have a need to create and utilize UUIDs
   as the primary identifier for a variety of different items in complex
   computational systems, including but not limited to database keys,
   file names, machine or system names, and identifiers for event-driven
   transactions.

   One area UUIDs have gained popularity is as database keys. This
   stems from the increasingly distributed nature of modern
   applications. In such cases, "auto increment" schemes often used by
   databases do not work well, as the effort required to coordinate
   unique numeric identifiers across a network can easily become a
   burden. The fact that UUIDs can be used to create unique, reasonably
   short values in distributed systems without requiring synchronization
   makes them a good alternative, but UUID versions 1-5 lack certain
   other desirable characteristics:

   1.  Non-time-ordered UUID versions such as UUIDv4 have poor database
       index locality: new values created in succession are not close
       to each other in the index and thus require inserts to be
       performed at random locations. The negative performance effects
       of this on the common structures used for indexes (B-tree and its
       variants) can be dramatic.

   2.  The 100-nanosecond, Gregorian epoch used in UUIDv1 timestamps is
       uncommon and difficult to represent accurately using a standard
       number format such as [IEEE754].

   3.  Introspection/parsing is required to order by time sequence, as
       opposed to being able to perform a simple byte-by-byte
       comparison.

   4.  Privacy and network security issues arise from using a MAC
       address in the node field of Version 1 UUIDs. Exposed MAC
       addresses can be used as an attack surface to locate machines
       and reveal various other information about such machines
       (minimally manufacturer, potentially other details).
       Additionally, with the advent of virtual machines and
       containers, MAC address uniqueness is no longer guaranteed.

   5.  Many of the implementation details specified in [RFC4122] involve
       trade offs that are neither possible to specify for all
       applications nor necessary to produce interoperable
       implementations.

   6.  [RFC4122] does not distinguish between the requirements for
       generation of a UUID versus an application which simply stores
       one, which are often different.

   Due to the aforementioned issues, many widely distributed database
   applications and large application vendors have sought to solve the
   problem of creating a better time-based, sortable unique identifier
   for use as a database key. This has led to numerous implementations
   over the past 10+ years solving the same problem in slightly
   different ways.

   While preparing this specification the following 16 different
   implementations were analyzed for trends in total ID length, bit
   layout, lexical formatting/encoding, timestamp type, timestamp
   format, timestamp accuracy, node format/components, collision
   handling and multi-timestamp tick generation sequencing.

   1.   [ULID] by A. Feerasta
   2.   [LexicalUUID] by Twitter
   3.   [Snowflake] by Twitter
   4.   [Flake] by Boundary
   5.   [ShardingID] by Instagram
   6.   [KSUID] by Segment
   7.   [Elasticflake] by P. Pearcy
   8.   [FlakeID] by T. Pawlak
   9.   [Sonyflake] by Sony
   10.  [orderedUuid] by IT. Cabrera
   11.  [COMBGUID] by R. Tallent
   12.  [SID] by A. Chilton
   13.  [pushID] by Google
   14.  [XID] by O. Poitrey
   15.  [ObjectID] by MongoDB
   16.  [CUID] by E. Elliott

   An inspection of these implementations and the issues described above
   has led to this document which attempts to adapt UUIDs to address
   these issues.

2.  Terminology

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.2.  Abbreviations

   The following abbreviations are used in this document:

   UUID    Universally Unique Identifier [RFC4122]

   CSPRNG  Cryptographically Secure Pseudo-Random Number Generator

   MAC     Media Access Control

   MSB     Most Significant Bit

   DBMS    Database Management System

3.  Summary of Changes

   The following UUIDs are hereby introduced:

   UUID version 6 (UUIDv6)
      A re-ordering of UUID version 1 so it is sortable as an opaque
      sequence of bytes. Easy to implement given an existing UUIDv1
      implementation. See Section 5.1.

   UUID version 7 (UUIDv7)
      An entirely new time-based UUID bit layout sourced from the widely
      implemented and well known Unix Epoch timestamp source. See
      Section 5.2.

   UUID version 8 (UUIDv8)
      A free-form UUID format which has no explicit requirements except
      maintaining backward compatibility. See Section 5.3.

   Max UUID
      A specialized UUID which is the inverse of the Nil UUID defined in
      [RFC4122], Section 4.1.7. See Section 5.4.

3.1.  changelog

   RFC EDITOR PLEASE DELETE THIS SECTION.

   draft-03

      - Reworked the draft body to make the content more concise
      - UUIDv6 section reworked to just the reorder of the timestamp
      - UUIDv7 changed to simplify timestamp mechanism to just
        millisecond Unix timestamp
      - UUIDv8 relaxed to be custom in all elements except version and
        variant
      - Introduced Max UUID.
      - Added C code samples in Appendix.
      - Added test vectors in Appendix.
      - Version and Variant section combined into one section.
      - Changed from pseudo-random number generators to
        cryptographically secure pseudo-random number generator
        (CSPRNG).
      - Combined redundant topics from all UUIDs into sections such as
        Timestamp granularity, Monotonicity and Counters, Collision
        Resistance, Sorting, and Unguessability, etc.
      - Split Encoding and Storage into Opacity and DBMS and Database
        Considerations
      - Reworked Global Uniqueness under new section Global and Local
        Uniqueness
      - Node verbiage only used in UUIDv6 all others reference
        random/rand instead
      - Clock sequence verbiage changed simply to counter in any section
        other than UUIDv6
      - Added Abbreviations section
      - Updated IETF Draft XML Layout
      - Added information about little-endian UUIDs

   draft-02

      - Added Changelog
      - Fixed misc. grammatical errors
      - Fixed section numbering issue
      - Fixed some UUIDvX reference issues
      - Changed all instances of "motonic" to "monotonic"
      - Changed all instances of "#-bit" to "# bit"
      - Changed "proceeding" verbiage to "after" in section 7
      - Added details on how to pad 32 bit Unix timestamp to 36 bits in
        UUIDv7
      - Added details on how to truncate 64 bit Unix timestamp to 36
        bits in UUIDv7
      - Added forward reference and bullet to UUIDv8 if truncating 64
        bit Unix Epoch is not an option.
      - Fixed bad reference to non-existent "time_or_node" in section
        4.5.4

   draft-01

      - Complete rewrite of entire document.
      - The format, flow and verbiage used in the specification has been
        reworked to mirror the original RFC 4122 and current IETF
        standards.
      - Removed the topics of UUID length modification, alternate UUID
        text formats, and alternate UUID encoding techniques.
      - Research into 16 different historical and current
        implementations of time-based universal identifiers was
        completed at the end of 2020 in an attempt to identify trends
        which have directly influenced design decisions in this draft
        document
        (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)
      - Prototype implementations have been completed for UUIDv6,
        UUIDv7, and UUIDv8 in various languages by many GitHub community
        members. (https://github.com/uuid6/prototypes)

4.  Variant and Version Fields

   The variant bits utilized by UUIDs in this specification remain in
   the same octet as originally defined by [RFC4122], Section 4.1.1.

   The next table details Variant 10xx (8/9/A/B) and the new versions
   defined by this specification. A complete guide to all versions
   within this variant has been included in Appendix C.1.

   +------+------+------+------+---------+---------------------------+
   | Msb0 | Msb1 | Msb2 | Msb3 | Version | Description               |
   +------+------+------+------+---------+---------------------------+
   | 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time- |
   |      |      |      |      |         | based UUID specified in   |
   |      |      |      |      |         | this document.            |
   +------+------+------+------+---------+---------------------------+
   | 0    | 1    | 1    | 1    | 7       | Unix Epoch time-based     |
   |      |      |      |      |         | UUID specified in this    |
   |      |      |      |      |         | document.                 |
   +------+------+------+------+---------+---------------------------+
   | 1    | 0    | 0    | 0    | 8       | Reserved for custom UUID  |
   |      |      |      |      |         | formats specified in this |
   |      |      |      |      |         | document.                 |
   +------+------+------+------+---------+---------------------------+

     Table 1: New UUID variant 10xx (8/9/A/B) versions defined by this
                               specification

   For UUID versions 6, 7 and 8 the variant field placement from
   [RFC4122] is unchanged. An example version/variant layout for UUIDv6
   follows the table where M is the version and N is the variant.

   00000000-0000-6000-8000-000000000000
   00000000-0000-6000-9000-000000000000
   00000000-0000-6000-A000-000000000000
   00000000-0000-6000-B000-000000000000
   xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

                    Figure 1: UUIDv6 Variant Examples

5.  New Formats

   The UUID format is 16 octets; the variant bits in conjunction with
   the version bits described in Section 4 determine the finer
   structure.

5.1.  UUID Version 6

   UUID version 6 is a field-compatible version of UUIDv1, reordered for
   improved DB locality. It is expected that UUIDv6 will primarily be
   used in contexts where there are existing v1 UUIDs. Systems that do
   not involve legacy UUIDv1 SHOULD consider using UUIDv7 instead.

   Instead of splitting the timestamp into the low, mid and high
   sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes
   are stored from most to least significant. That is, given a 60 bit
   timestamp value as specified for UUIDv1 in [RFC4122], Section 4.1.4,
   for UUIDv6, the first 48 most significant bits are stored first,
   followed by the 4 bit version (same position), followed by the
   remaining 12 bits of the original 60 bit timestamp.

   The clock sequence bits remain unchanged from their usage and
   position in [RFC4122], Section 4.1.5.

   The 48 bit node SHOULD be set to a pseudo-random value; however,
   implementations MAY choose to retain the old MAC address behavior
   from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5. For more
   information on MAC address usage within UUIDs see Section 8.

   The format for the 16-byte, 128 bit UUIDv6 is shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_high                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |      time_low_and_version     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |          node (0-1)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          node (2-5)                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: UUIDv6 Field and Bit Layout

   time_high:
      The most significant 32 bits of the 60 bit starting timestamp.
      Occupies bits 0 through 31 (octets 0-3)

   time_mid:
      The middle 16 bits of the 60 bit starting timestamp. Occupies
      bits 32 through 47 (octets 4-5)

   time_low_and_version:
      The first four most significant bits MUST contain the UUIDv6
      version (0110) while the remaining 12 bits will contain the least
      significant 12 bits from the 60 bit starting timestamp. Occupies
      bits 48 through 63 (octets 6-7)

   clk_seq_hi_res:
      The first two bits MUST be set to the UUID variant (10). The
      remaining 6 bits contain the high portion of the clock sequence.
      Occupies bits 64 through 71 (octet 8)

   clock_seq_low:
      The 8 bit low portion of the clock sequence. Occupies bits 72
      through 79 (octet 9)

   node:
      48 bit spatially unique identifier. Occupies bits 80 through 127
      (octets 10-15)

   With UUIDv6 the steps for splitting the timestamp into time_high and
   time_mid are OPTIONAL since the 48 bits of time_high and time_mid
   will remain in the same order. An extra step of splitting the first
   48 bits of the timestamp into the most significant 32 bits and least
   significant 16 bits proves useful when reusing an existing UUIDv1
   implementation.

5.2.  UUID Version 7

   UUID version 7 features a time-ordered value field derived from the
   widely implemented and well known Unix Epoch timestamp source, the
   number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds
   excluded. It also has improved entropy characteristics over versions
   1 or 6.

   Implementations SHOULD utilize UUID version 7 over UUID version 1 and
   6 if possible.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          unix_ts_ms                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          unix_ts_ms           |  ver  |        rand_a         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                          rand_b                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3: UUIDv7 Field and Bit Layout

   unix_ts_ms:
      48 bit big-endian unsigned number of Unix epoch timestamp as per
      Section 6.1.

   ver:
      4 bit UUIDv7 version set as per Section 4.

   rand_a:
      12 bits pseudo-random data to provide uniqueness as per
      Section 6.2 and Section 6.6.

   var:
      The 2 bit variant defined by Section 4.

   rand_b:
      The final 62 bits of pseudo-random data to provide uniqueness as
      per Section 6.2 and Section 6.6.

5.3.  UUID Version 8

   UUID version 8 provides an RFC-compatible format for experimental or
   vendor-specific use cases. The only requirement is that the variant
   and version bits MUST be set as defined in Section 4. UUIDv8's
   uniqueness will be implementation-specific and SHOULD NOT be assumed.

   The only explicitly defined bits are the Version and Variant, leaving
   122 bits for implementation specific time-based UUIDs. To be clear:
   UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are
   filled with random data.

   Some example situations in which UUIDv8 usage could occur:

   *  An implementation would like to embed extra information within the
      UUID other than what is defined in this document.

   *  An implementation has other application/language restrictions
      which inhibit the use of one of the current UUIDs.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           custom_a                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           custom_a            |  ver  |       custom_b        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                         custom_c                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           custom_c                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 4: UUIDv8 Field and Bit Layout

   custom_a:
      The first 48 bits of the layout that can be filled as an
      implementation sees fit.

   ver:
      The 4 bit version field as defined by Section 4.

   custom_b:
      12 more bits of the layout that can be filled as an implementation
      sees fit.

   var:
      The 2 bit variant field as defined by Section 4.

   custom_c:
      The final 62 bits of the layout immediately following the var
      field to be filled as an implementation sees fit.

5.4.  Max UUID

   The Max UUID is a special form of UUID that is specified to have all
   128 bits set to 1. This UUID can be thought of as the inverse of the
   Nil UUID defined in [RFC4122], Section 4.1.7.

   FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF

                       Figure 5: Max UUID Format

6.  UUID Best Practices

   The minimum requirements for generating UUIDs are described in this
   document for each version. Everything else is an implementation
   detail and up to the implementer to decide what is appropriate for a
   given implementation. That being said, various relevant factors are
   covered below to help guide an implementer through the different
   trade-offs among differing UUID implementations.

6.1.  Timestamp Granularity

   UUID timestamp source, precision and length were the topic of great
   debate while creating this specification. As such, choosing the
   right timestamp for an application is very important. This section
   details some of the most common points on this topic.

   Reliability:
      Implementations SHOULD use the current timestamp from a reliable
      source to provide values that are time-ordered and continually
      increasing. Care SHOULD be taken to ensure that timestamp changes
      from the environment or operating system are handled in a way that
      is consistent with implementation requirements. For example, if
      it is possible for the system clock to move backward due to either
      manual adjustment or corrections from a time synchronization
      protocol, implementations must decide how to handle such cases.
      (See Altering, Fuzzing, or Smearing bullet below.)

   Source:
      UUID version 1 and 6 both utilize a Gregorian epoch timestamp
      while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp
      sources or a custom timestamp epoch are required, UUIDv8 SHOULD be
      leveraged.

   Sub-second Precision and Accuracy:
      Many levels of precision exist for timestamps: milliseconds,
      microseconds, nanoseconds, and beyond. Additionally, fractional
      representations of sub-second precision may be desired to mix
      various levels of precision in a time-ordered manner.
      Furthermore, system clocks themselves have an underlying
      granularity and it is frequently less than the precision offered
      by the operating system. With UUID version 1 and 6,
      100-nanoseconds of precision are present while UUIDv7 features a
      fixed millisecond level of precision within the Unix epoch that
      does not exceed the granularity capable in most modern systems.
      For other levels of precision UUIDv8 SHOULD be utilized.

   Length:
      The length of a given timestamp directly impacts how long a given
      UUID will be valid. That is, how many timestamp ticks can be
      contained in a UUID before the maximum value for the timestamp
      field is reached. Care should be given to ensure that the proper
      length is selected for a given timestamp. UUID version 1 and 6
      utilize a 60 bit timestamp and UUIDv7 features a 48 bit timestamp.

   Altering, Fuzzing, or Smearing:
      Implementations MAY alter the actual timestamp. Some examples
      include security considerations around providing a real clock
      value within a UUID, correcting inaccurate clocks, or handling
      leap seconds. This specification makes no requirement or
      guarantee about how close the clock value needs to be to actual
      time.

   Padding:
      When timestamp padding is required, implementations MUST pad the
      most significant (left-most) bits with zeros. An example is
      padding the most significant, left-most bits of a 32 bit Unix
      timestamp with zeros to fill out the 48 bit timestamp in UUIDv7.

   Truncating:
      Similarly, when timestamps need to be truncated, the lower, least
      significant bits MUST be used. An example would be truncating a
      64 bit Unix timestamp to the least significant, right-most 48 bits
      for UUIDv7.

6.2.  Monotonicity and Counters

   Monotonicity is the backbone of time-based sortable UUIDs. Naturally
   time-based UUIDs from this document will be monotonic due to an
   embedded timestamp; however, implementations can guarantee additional
   monotonicity via the concepts covered in this section.

   Additionally, care MUST be taken to ensure UUIDs generated in batches
   are also monotonic. That is, if one thousand UUIDs are generated for
   the same timestamp, there needs to be sufficient logic for organizing
   the creation order of those one thousand UUIDs. For batch UUID
   creation implementations MAY utilize a monotonic counter which SHOULD
   increment for each UUID created during a given timestamp.
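
   The batch requirement above can be sketched as a small piece of
   generator state. The following is a minimal, single-threaded
   illustration and not a prescribed algorithm: the struct and function
   names, the 12 bit counter width, and the choice to restart the
   counter at zero on each new tick are all assumptions made for this
   example.

```c
#include <stdint.h>

/* Hypothetical generator state for one node. */
struct uuid_batch_state {
  uint64_t last_ms;  /* timestamp used for the most recent UUID */
  uint16_t counter;  /* counter value used for the most recent UUID */
};

#define UUID_COUNTER_MAX 0x0FFF /* assumed 12 bit counter width */

/* Picks the (timestamp, counter) pair for the next UUID.
 * Returns 0 on success and -1 when the counter is exhausted for the
 * current tick, in which case the caller has to wait for the clock
 * to advance before trying again. */
int uuid_batch_next(struct uuid_batch_state *st, uint64_t now_ms,
                    uint64_t *ts_out, uint16_t *counter_out) {
  if (now_ms > st->last_ms) {
    /* New tick: restart the counter. Starting at zero is a
     * simplification made for this sketch. */
    st->last_ms = now_ms;
    st->counter = 0;
  } else if (st->counter < UUID_COUNTER_MAX) {
    /* Same tick, or a clock that moved backward: keep the last
     * timestamp and increment the counter so order is preserved. */
    st->counter++;
  } else {
    return -1;
  }
  *ts_out = st->last_ms;
  *counter_out = st->counter;
  return 0;
}
```

   Each successful call yields a pair that sorts after the previous
   one, so placing the counter directly after the timestamp in the UUID
   keeps a batch in creation order.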

   For single-node UUID implementations that do not need to create batches of UUIDs, the embedded timestamp within UUID version 1, 6, and 7 can provide sufficient monotonicity guarantees by simply ensuring that the timestamp increments before creating a new UUID.  For the topic of Distributed Nodes, please refer to Section 6.3.

   Implementations that require batch UUID creation on a single node SHOULD choose one of the following methods.

   Fixed-Length Dedicated Counter Bits (Method 1):
      This references the practice of allocating a specific number of bits in the UUID layout to the sole purpose of tallying the total number of UUIDs created during a given UUID timestamp tick.  Positioning of a fixed bit-length counter SHOULD be immediately after the embedded timestamp.  This promotes sortability and allows random data generation for each counter increment.  With this method, the rand_a section of UUIDv7 MAY be utilized as fixed-length dedicated counter bits.  In the event more counter bits are required, the most significant, left-most bits of rand_b MAY be leveraged as additional counter bits.

   Monotonic Random (Method 2):
      With this method the random data is extended to also double as a counter.  This monotonic random can be thought of as a "randomly seeded counter" which MUST be incremented in the least significant position for each UUID created on a given timestamp tick.  UUIDv7's rand_b section SHOULD be utilized with this method to handle batch UUID generation during a single timestamp tick.

   The following sub-topics cover methods behind incrementing either type of counter method:

   Plus One Increment (Type A):
      With this increment logic the counter method is incremented by one for every UUID generation.  When this increment method is utilized with Fixed-Length Dedicated Counters, the trailing random data generated for each new UUID can help produce unguessable UUIDs.
      When this increment method is utilized with Monotonic Random Counters, the resulting values are easily guessable.  Implementations that favor unguessability SHOULD NOT utilize this method with the monotonic random method.

   Random Increment (Type B):
      With this increment, the actual increment of the counter MAY be a random integer of any desired length larger than zero.  When this increment method is utilized with Fixed-Length Dedicated Counters, the random increments MAY deplete the counter bit space (including any rollover guards) faster than desired if a counter of adequate length is not selected.  When this increment method is utilized with Monotonic Random Counters, the counter ensures the UUIDs retain the required level of unguessability characteristics provided by the underlying entropy.

   The following sub-topics cover topics related solely to creating reliable fixed-length dedicated counters:

   Fixed-Length Dedicated Counter Seeding:
      Implementations utilizing the fixed-length counter method SHOULD randomly initialize the counter with each new timestamp tick.  However, when the timestamp has not incremented, the counter SHOULD be frozen and incremented via the desired increment logic.  When utilizing a randomly seeded counter alongside Method 1, the random data MAY be regenerated with each counter increment without impacting sortability.  The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments.  Implementations utilizing the fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter.  For example, a 24 bit counter could have the 23 bits in the least significant, right-most position randomly initialized.
      The remaining most significant, left-most counter bits are initialized as zero for the sole purpose of guarding against counter rollovers.

   Fixed-Length Dedicated Counter Length:
      Care MUST be taken to select a counter bit-length that can properly handle the level of timestamp precision in use.  For example, a timestamp with millisecond precision SHOULD use a larger counter than a timestamp with nanosecond precision.  General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits.  Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter.  This entropy helps improve the unguessability characteristics of UUIDs created within the batch.

   The following sub-topics cover rollover handling with either type of counter method:

   Counter Rollover Guards:
      The technique from Fixed-Length Dedicated Counter Seeding which describes allocating a segment of the fixed-length counter as a rollover guard is also recommended and SHOULD be employed to help mitigate counter rollover issues.  This same technique can be leveraged with Monotonic Random counter methods by ensuring the total length of a possible increment in the least significant, right-most position is less than the total length of the random data being incremented.  As such, the most significant, left-most bits can be incremented as rollover guarding.

   Counter Rollover Handling:
      Counter rollovers SHOULD be handled by the application to avoid sorting issues.  The general guidance is that applications that care about absolute monotonicity and sortability SHOULD freeze the counter and wait for the timestamp to advance, which ensures monotonicity is not broken.

   Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:

   1.  Compare the current timestamp against the previously stored timestamp.

   2.  If the current timestamp is equal to the previous timestamp, increment the counter according to the desired method and type.

   3.  If the current timestamp is greater than the previous timestamp, re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).

   Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID.  If this is not the case, then any number of things could have occurred, such as, but not limited to, clock rollbacks, leap second handling, or counter rollovers.  Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem, ensuring the next UUID generated is greater than the previous.

6.3.  Distributed UUID Generation

   Some implementations MAY desire to utilize multi-node, clustered applications which involve two or more nodes independently generating UUIDs that will be stored in a common location.  While UUIDs already feature sufficient entropy to ensure that the chances of collision are low, the likelihood of a collision increases as the total number of nodes increases.  This section details the approaches that MAY be utilized by multi-node UUID implementations in distributed environments.

   Centralized Registry:
      With this method all nodes tasked with creating UUIDs consult a central registry and confirm the generated value is unique.  As applications scale, the communication with the central registry could become a bottleneck and impact UUID generation in a negative way.  Utilization of shared knowledge schemes with central/global registries is outside the scope of this specification.

   Node IDs:
      With this method, a pseudo-random Node ID value is placed within the UUID layout.
      This identifier helps ensure the bit-space for a given node is unique, resulting in UUIDs that do not conflict with any other UUID created by another node with a different node id.  Implementations that choose to leverage an embedded node id SHOULD utilize UUIDv8.  The node id SHOULD NOT be an IEEE 802 MAC address as per Section 8.  The location and bit length are left to implementations and are outside the scope of this specification.  Furthermore, the creation and negotiation of unique node ids among nodes is also out of scope for this specification.

   Utilization of either a Centralized Registry or Node ID is not required for implementing UUIDs in this specification.  However, implementations SHOULD utilize one of the two aforementioned methods if distributed UUID generation is a requirement.

6.4.  Collision Resistance

   Implementations SHOULD weigh the consequences of UUID collisions within their application when deciding between UUID versions that use entropy (random) versus the other components such as Section 6.1 and Section 6.2.  This is especially true for distributed node collision resistance as defined by Section 6.3.

   There are two example scenarios below which help illustrate the varying seriousness of a collision within an application.

   Low Impact:
      A UUID collision generates a duplicate log entry, which results in incorrect statistics derived from the data.  Implementations that are not negatively affected by collisions may continue with the entropy and uniqueness provided by the traditional UUID format.

   High Impact:
      A duplicate key causes an airplane to receive the wrong course, which puts people's lives at risk.  In this scenario there is no margin for error.  Collisions MUST be avoided and failure is unacceptable.  Applications dealing with this type of scenario MUST employ as much collision resistance as possible within the given application context.

6.5.  Global and Local Uniqueness

   UUIDs created by this specification MAY be used to provide local uniqueness guarantees.  For example, ensuring UUIDs created within a local application context are unique within a database MAY be sufficient for some implementations where global uniqueness outside of the application context, in other applications, or around the world is not required.

   Although true global uniqueness is impossible to guarantee without a shared knowledge scheme, a shared knowledge scheme is not required by UUID to provide uniqueness guarantees.  Implementations MAY implement a shared knowledge scheme introduced in Section 6.3 as they see fit to extend the uniqueness guaranteed by this specification and [RFC4122].

6.6.  Unguessability

   Implementations SHOULD utilize a cryptographically secure pseudo-random number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique").  A CSPRNG ensures the best of Section 6.4 and Section 8 are present in modern UUIDs.

   Advice on generating cryptographic-quality random numbers can be found in [RFC4086].

6.7.  Sorting

   UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g. database indexes) SHOULD sort as opaque raw bytes, without need for parsing or introspection.

   Time-ordered monotonic UUIDs benefit from greater database index locality because the new values are near each other in the index.  As a result, objects are more easily clustered together for better performance.  The real-world differences in this approach of index locality vs random data inserts can be quite large.

   UUID formats created by this specification SHOULD be lexicographically sortable while in the textual representation.

   UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind.  If little-endian style is required, a custom UUID format SHOULD be created using UUIDv8.
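
   Sorting as opaque raw bytes can be sketched in C with the standard memcmp and qsort functions.  This is only an illustration under stated assumptions: the UUIDs are held in their 16 byte binary form in network byte order, and the names uuid_cmp and uuid_sort are hypothetical helpers, not part of this specification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compares two 16 byte UUIDs byte by byte, with no
   parsing of the timestamp, version, or variant fields. */
static int uuid_cmp(const void *a, const void *b)
{
    return memcmp(a, b, 16);
}

/* Sorts an array of n binary UUIDs in place, in ascending byte order. */
void uuid_sort(unsigned char (*uuids)[16], size_t n)
{
    assert(uuids != NULL);
    qsort(uuids, n, 16, uuid_cmp);
}
```

   Because the timestamp occupies the most significant bytes of UUIDv6 and UUIDv7, ordering by memcmp also orders these UUIDs by their embedded timestamp.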

6.8.  Opacity

   UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible.  However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.

6.9.  DBMS and Database Considerations

   For many applications, such as databases, storing UUIDs as text is unnecessarily verbose, requiring 288 bits (36 characters of 8 bits each) to represent 128 bit UUID values.  Thus, where feasible, UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.

   For other systems, UUIDs MAY be stored in binary form or as text, as appropriate.  The trade-offs of the two approaches are as follows:

   *  Storing as binary requires less space and may result in faster data access.

   *  Storing as text requires more space but may require less translation if the resulting text form is to be used after retrieval, and thus may be simpler to implement.

   DBMS vendors are encouraged to provide functionality to generate and store UUID formats defined by this specification for use as identifiers or left parts of identifiers such as, but not limited to, primary keys, surrogate keys for temporal databases, foreign keys included in polymorphic relationships, and keys for key-value pairs in JSON columns and key-value databases.  Applications using a monolithic database may find that using database-generated UUIDs (as opposed to client-generated UUIDs) provides the best UUID monotonicity.  In addition to UUIDs, additional identifiers MAY be used to ensure integrity and feedback.

7.  IANA Considerations

   This document has no IANA actions.

8.  Security Considerations

   MAC addresses pose inherent security risks and SHOULD NOT be used within a UUID.  Instead, CSPRNG data SHOULD be selected from a source with sufficient entropy to ensure guaranteed uniqueness among UUID generation.  See Section 6.6 for more information.
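
   One way to draw the random bits from an operating system CSPRNG instead of a MAC address is sketched below.  It assumes a POSIX-like system that exposes /dev/urandom, and the helper names csprng_fill and uuid_random_generate are hypothetical.  The version (4) and variant (10xx) bits are set as for a random UUID in [RFC4122].

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fills buf with len bytes read from /dev/urandom (assumed available).
   Returns 0 on success, -1 on failure. */
int csprng_fill(uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL) {
        return -1;
    }
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}

/* Builds a fully random UUID: 122 random bits plus version and variant.
   Returns 0 on success, -1 if no random data could be read. */
int uuid_random_generate(uint8_t u[16])
{
    if (csprng_fill(u, 16) != 0) {
        return -1;
    }
    u[6] = 0x40 | (u[6] & 0x0F); /* version field: 0, 1, 0, 0 */
    u[8] = 0x80 | (u[8] & 0x3F); /* variant field: 1, 0 */
    return 0;
}
```

   Opening the device once and reusing the handle would be more efficient for bulk generation; the per-call open here only keeps the sketch short.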

   Timestamps embedded in the UUID do pose a very small attack surface.  The timestamp in conjunction with an embedded counter does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole.  If UUIDs are required for use with any security operation within an application context in any shape or form then [RFC4122] UUIDv4 SHOULD be utilized.

9.  Acknowledgements

   The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, Theodore Y. Ts'o, Robert Kieffer, sergeyprokhorenko, and LiosK, as well as all of those in the IETF community and on GitHub who contributed to the discussions which resulted in this document.

10.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [RFC4122]  Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, July 2005.

   [RFC4086]  Eastlake 3rd, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", RFC 4086, DOI 10.17487/RFC4086, June 2005.

11.  Informative References

   [LexicalUUID]  Twitter, "A Scala client for Cassandra", Commit f6da4e0, November 2012.

   [Snowflake]  Twitter, "Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.", Commit b3f6a3c, May 2014.

   [Flake]  Boundary, "Flake: A decentralized, k-ordered id generation service in Erlang", Commit 15c933a, February 2017.

   [ShardingID]  Instagram Engineering, "Sharding & IDs at Instagram", December 2012.

   [KSUID]  Segment, "K-Sortable Globally Unique IDs", Commit bf376a7, July 2020.

   [Elasticflake]  Pearcy, P., "Sequential UUID / Flake ID generator pulled out of elasticsearch common", Commit dd71c21, January 2015.

   [FlakeID]  Pawlak, T., "Flake ID Generator", Commit fcd6a2f, April 2020.

   [Sonyflake]  Sony, "A distributed unique ID generator inspired by Twitter's Snowflake", Commit 848d664, August 2020.

   [orderedUuid]  Cabrera, IT., "Laravel: The mysterious "Ordered UUID"", January 2020.

   [COMBGUID]  Tallent, R., "Creating sequential GUIDs in C# for MSSQL or PostgreSql", Commit 2759820, December 2020.

   [ULID]  Feerasta, A., "Universally Unique Lexicographically Sortable Identifier", Commit d0c7170, May 2019.

   [SID]  Chilton, A., "sid : generate sortable identifiers", Commit 660e947, June 2019.

   [pushID]  Google, "The 2^120 Ways to Ensure Unique Identifiers", February 2015.

   [XID]  Poitrey, O., "Globally Unique ID Generator", Commit efa678f, October 2020.

   [ObjectID]  MongoDB, "ObjectId - MongoDB Manual".

   [CUID]  Elliott, E., "Collision-resistant ids optimized for horizontal scaling and performance.", Commit 215b27b, October 2020.

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", Series 754-2019, July 2019.

Appendix A.  Example Code

A.1.  Creating a UUIDv6 Value

   This section details a function in C which converts from a UUID version 1 to version 6:

      #include <stdint.h>
      #include <arpa/inet.h>
      #include <uuid/uuid.h>

      /* Converts UUID version 1 to version 6 in place. */
      void uuidv1tov6(uuid_t u) {

        uint64_t ut;
        unsigned char *up = (unsigned char *)u;

        // load ut with the first 64 bits of the UUID
        ut = ((uint64_t)ntohl(*((uint32_t*)up))) << 32;
        ut |= ((uint64_t)ntohl(*((uint32_t*)&up[4])));

        // dance the bit-shift...
        ut =
          ((ut >> 32) & 0x0FFF) | // 12 least significant bits
          (0x6000) | // version number
          ((ut >> 28) & 0x0000000FFFFF0000) | // next 20 bits
          ((ut << 20) & 0x000FFFF000000000) | // next 16 bits
          (ut << 52); // 12 most significant bits

        // store back in UUID
        *((uint32_t*)up) = htonl((uint32_t)(ut >> 32));
        *((uint32_t*)&up[4]) = htonl((uint32_t)(ut));

      }

                     Figure 6: UUIDv6 Function in C

A.2.  Creating a UUIDv7 Value

      #include <stdio.h>
      #include <stdint.h>
      #include <string.h>
      #include <time.h>

      // ...

      // csprng data source
      FILE *rndf;
      rndf = fopen("/dev/urandom", "r");
      if (rndf == 0) {
          printf("fopen /dev/urandom error\n");
          return 1;
      }

      // ...

      // generate one UUIDv7
      uint8_t u[16];
      struct timespec ts;
      int ret;

      ret = clock_gettime(CLOCK_REALTIME, &ts);
      if (ret != 0) {
          printf("clock_gettime error: %d\n", ret);
          return 1;
      }

      // Unix timestamp in milliseconds
      uint64_t tms;
      tms = ((uint64_t)ts.tv_sec) * 1000;
      tms += ((uint64_t)ts.tv_nsec) / 1000000;

      memset(u, 0, 16);

      // fill everything after the timestamp with random bytes
      if (fread(&u[6], 10, 1, rndf) != 1) {
          printf("fread /dev/urandom error\n");
          return 1;
      }

      // store the 48 bit timestamp in the first 6 bytes, most
      // significant byte first (written byte by byte because htonll
      // is not available on every platform)
      for (int i = 0; i < 6; i++) {
          u[i] = (uint8_t)(tms >> (40 - 8 * i));
      }

      u[8] = 0x80 | (u[8] & 0x3F); // set variant field, top two bits are 1, 0
      u[6] = 0x70 | (u[6] & 0x0F); // set version field, top four bits are 0, 1, 1, 1

                     Figure 7: UUIDv7 Function in C

A.3.  Creating a UUIDv8 Value

   UUIDv8 will vary greatly from implementation to implementation.

   A good candidate use case for UUIDv8 is to embed exotic timestamps like the one found in this example which employs approximately 0.25 milliseconds and approximately 5 microseconds per timestamp tick as a 48 bit value.

      #include <stdio.h>
      #include <stdint.h>
      #include <time.h>

      int main() {
          struct timespec tp;
          clock_gettime(CLOCK_REALTIME, &tp);
          uint64_t timestamp = (uint64_t)tp.tv_sec << 12;

          // compute 12 bit (~0.25 msec precision) fraction from nsecs
          timestamp |= ((uint64_t)tp.tv_nsec << 12) / 1000000000;

          // print the 48 bit value as 32 bits, a dash, then 16 bits
          printf("%08llx-%04llx\n",
                 (unsigned long long)(timestamp >> 16),
                 (unsigned long long)(timestamp & 0xFFFF));
          return 0;
      }

                     Figure 8: UUIDv8 Function in C

Appendix B.  Test Vectors

   Both UUIDv1 and UUIDv6 test vectors utilize the same 60 bit timestamp: 0x1EC9414C232AB00 (138648505420000000) Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00

   Both UUIDv1 and UUIDv6 utilize the same values in clk_seq_hi_res, clock_seq_low, and node, all of which have been generated with random data.

      # Unix Nanosecond precision to Gregorian 100-nanosecond intervals
      gregorian_100_ns = (Unix_64_bit_nanoseconds / 100) + gregorian_Unix_offset

      # Gregorian to Unix Offset:
      # The number of 100-ns intervals between the
      # UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
      # gregorian_Unix_offset = 0x01b21dd213814000 or 122192928000000000

      # Unix 64 bit Nanosecond Timestamp:
      # Unix NS: Tuesday, February 22, 2022 2:22:22 PM GMT-05:00
      # Unix_64_bit_nanoseconds = 0x16D6320C3D4DCC00 or 1645557742000000000

      # Work:
      # gregorian_100_ns = (1645557742000000000 / 100) + 122192928000000000
      # (138648505420000000 - 122192928000000000) * 100 = Unix_64_bit_nanoseconds

      # Final:
      # gregorian_100_ns = 0x1EC9414C232AB00 or 138648505420000000

      # Original: 000111101100100101000001010011000010001100101010101100000000
      # UUIDv1: 11000010001100101010101100000000|1001010000010100|0001|000111101100
      # UUIDv6: 00011110110010010100000101001100|0010001100101010|0110|101100000000

               Figure 9: Test Vector Timestamp Pseudo-code

B.1.  Example of a UUIDv6 Value

      ----------------------------------------------
      field                 bits    value_hex
      ----------------------------------------------
      time_low              32      0xC232AB00
      time_mid              16      0x9414
      time_hi_and_version   16      0x11EC
      clk_seq_hi_res         8      0xB3
      clock_seq_low          8      0xC8
      node                  48      0x9E6BDECED846
      ----------------------------------------------
      total                 128
      ----------------------------------------------
      final_hex: C232AB00-9414-11EC-B3C8-9E6BDECED846

                  Figure 10: UUIDv1 Example Test Vector

      -----------------------------------------------
      field                 bits    value_hex
      -----------------------------------------------
      time_high             32      0x1EC9414C
      time_mid              16      0x232A
      time_low_and_version  16      0x6B00
      clk_seq_hi_res         8      0xB3
      clock_seq_low          8      0xC8
      node                  48      0x9E6BDECED846
      -----------------------------------------------
      total                 128
      -----------------------------------------------
      final_hex: 1EC9414C-232A-6B00-B3C8-9E6BDECED846

                  Figure 11: UUIDv6 Example Test Vector

B.2.  Example of a UUIDv7 Value

   This example UUIDv7 test vector utilizes a well-known 32 bit Unix epoch with additional millisecond precision to fill the first 48 bits.

   rand_a and rand_b are filled with random data.

   The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00 represented as 0x17F22E279B0 or 1645557742000

      -------------------------------
      field      bits    value
      -------------------------------
      unix_ts_ms 48      0x017F22E279B0
      ver        4       0x7
      rand_a     12      0xCC3
      var        2       b10
      rand_b     62      0x18C4DC0C0C07398F
      -------------------------------
      total      128
      -------------------------------
      final: 017F22E2-79B0-7CC3-98C4-DC0C0C07398F

                  Figure 12: UUIDv7 Example Test Vector

B.3.  Example of a UUIDv8 Value

   This example UUIDv8 test vector utilizes a well-known 64 bit Unix epoch with nanosecond precision, truncated to the least significant, right-most bits to fill the first 48 bits through version.

   The next two segments, custom_b and custom_c, are filled with random data.

   The timestamp is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00 represented as 0x16D6320C3D4DCC00 or 1645557742000000000

   It should be noted that this example is just to illustrate one scenario for UUIDv8.  Test vectors will likely be implementation specific and vary greatly from this simple example.

      -------------------------------
      field      bits    value
      -------------------------------
      custom_a   48      0x320C3D4DCC00
      ver        4       0x8
      custom_b   12      0x75B
      var        2       b10
      custom_c   62      0xEC932D5F69181C0
      -------------------------------
      total      128
      -------------------------------
      final: 320C3D4D-CC00-875B-8EC9-32D5F69181C0

                  Figure 13: UUIDv8 Example Test Vector

Appendix C.  Version and Variant Tables

C.1.  Variant 10xx Versions

   +------+------+------+------+---------+----------------------------+
   | Msb0 | Msb1 | Msb2 | Msb3 | Version | Description                |
   +------+------+------+------+---------+----------------------------+
   | 0    | 0    | 0    | 0    | 0       | Unused                     |
   +------+------+------+------+---------+----------------------------+
   | 0    | 0    | 0    | 1    | 1       | The Gregorian time-based   |
   |      |      |      |      |         | UUID from [RFC4122],       |
   |      |      |      |      |         | Section 4.1.3              |
   +------+------+------+------+---------+----------------------------+
   | 0    | 0    | 1    | 0    | 2       | DCE Security version, with |
   |      |      |      |      |         | embedded POSIX UIDs from   |
   |      |      |      |      |         | [RFC4122], Section 4.1.3   |
   +------+------+------+------+---------+----------------------------+
   | 0    | 0    | 1    | 1    | 3       | The name-based version     |
   |      |      |      |      |         | specified in [RFC4122],    |
   |      |      |      |      |         | Section 4.1.3 that uses    |
   |      |      |      |      |         | MD5 hashing.               |
   +------+------+------+------+---------+----------------------------+
   | 0    | 1    | 0    | 0    | 4       | The randomly or pseudo-    |
   |      |      |      |      |         | randomly generated version |
   |      |      |      |      |         | specified in [RFC4122],    |
   |      |      |      |      |         | Section 4.1.3.             |
   +------+------+------+------+---------+----------------------------+
   | 0    | 1    | 0    | 1    | 5       | The name-based version     |
   |      |      |      |      |         | specified in [RFC4122],    |
   |      |      |      |      |         | Section 4.1.3 that uses    |
   |      |      |      |      |         | SHA-1 hashing.             |
   +------+------+------+------+---------+----------------------------+
   | 0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-  |
   |      |      |      |      |         | based UUID specified in    |
   |      |      |      |      |         | this document.             |
   +------+------+------+------+---------+----------------------------+
   | 0    | 1    | 1    | 1    | 7       | Unix Epoch time-based UUID |
   |      |      |      |      |         | specified in this          |
   |      |      |      |      |         | document.                  |
   +------+------+------+------+---------+----------------------------+
   | 1    | 0    | 0    | 0    | 8       | Reserved for custom UUID   |
   |      |      |      |      |         | formats specified in this  |
   |      |      |      |      |         | document.                  |
   +------+------+------+------+---------+----------------------------+
   | 1    | 0    | 0    | 1    | 9       | Reserved for future        |
   |      |      |      |      |         | definition.                |
   +------+------+------+------+---------+----------------------------+
   | 1    | 0    | 1    | 0    | 10      | Reserved for future        |
   |      |      |      |      |         | definition.                |
   +------+------+------+------+---------+----------------------------+
   | 1    | 0    | 1    | 1    | 11      | Reserved for future        |
   |      |      |      |      |         | definition.                |
   +------+------+------+------+---------+----------------------------+
   | 1    | 1    | 0    | 0    | 12      | Reserved for future        |
   |      |      |      |      |         | definition.                |
   +------+------+------+------+---------+----------------------------+
   | 1    | 1    | 0    | 1    | 13      | Reserved for future        |
   |      |      |      |      |         | definition.                |
   +------+------+------+------+---------+----------------------------+
   | 1    | 1    | 1    | 0    | 14      | Reserved for future        |
   |      |      |      |      |         | definition.                |
   +------+------+------+------+---------+----------------------------+
   | 1    | 1    | 1    | 1    | 15      | Reserved for future        |
   |      |      |      |      |         | definition.                |
   +------+------+------+------+---------+----------------------------+

      Table 2: All UUID variant 10xx (8/9/A/B) version definitions.

Authors' Addresses

   Brad G. Peabody
   Email: brad@peabody.io

   Kyzer R. Davis
   Email: kydavis@cisco.com

================================================
FILE: old drafts/draft-peabody-dispatch-new-uuid-format-03.xml
================================================
New UUID Formats
brad@peabody.io
kydavis@cisco.com
ART dispatch uuid This document presents new Universally Unique Identifier (UUID) formats for use in modern applications and databases.
Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions. One area UUIDs have gained popularity is as database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto increment" schemes often used by databases do not work well, as the effort required to coordinate unique numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring synchronization makes them a good alternative, but UUID versions 1-5 lack certain other desirable characteristics:
  1. Non-time-ordered UUID versions such as UUIDv4 have poor database index locality, meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the structures commonly used for indexes (B-tree and its variants) can be dramatic.
  2. The 100-nanosecond, Gregorian epoch used in UUIDv1 timestamps is uncommon and difficult to represent accurately using a standard number format such as [IEEE754].
  3. Introspection/parsing is required to order by time sequence; as opposed to being able to perform a simple byte-by-byte comparison.
  4. Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Additionally, with the advent of virtual machines and containers, MAC address uniqueness is no longer guaranteed.
  5. Many of the implementation details specified in [RFC4122] involve trade-offs that are neither possible to specify for all applications nor necessary to produce interoperable implementations.
  6. [RFC4122] does not distinguish between the requirements for generation of a UUID versus an application which simply stores one, which are often different.
Due to the aforementioned issues, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways. While preparing this specification, the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling, and multi-timestamp tick generation sequencing.
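  1. [ULID] by A. Feerasta
  2. [LexicalUUID] by Twitter
  3. [Snowflake] by Twitter
  4. [Flake] by Boundary
  5. [ShardingID] by Instagram
  6. [KSUID] by Segment
  7. [Elasticflake] by P. Pearcy
  8. [FlakeID] by T. Pawlak
  9. [Sonyflake] by Sony
  10. [orderedUuid] by IT. Cabrera
  11. [COMBGUID] by R. Tallent
  12. [SID] by A. Chilton
  13. [pushID] by Google
  14. [XID] by O. Poitrey
  15. [ObjectID] by MongoDB
  16. [CUID] by E. Elliott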
An inspection of these implementations and the issues described above has led to this document which attempts to adapt UUIDs to address these issues.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The following abbreviations are used in this document:
UUID
Universally Unique Identifier
CSPRNG
Cryptographically Secure Pseudo-Random Number Generator
MAC
Media Access Control
MSB
Most Significant Bit
DBMS
Database Management System
The following UUIDs are hereby introduced:
UUID version 6 (UUIDv6)
A re-ordering of UUID version 1 so it is sortable as an opaque sequence of bytes. Easy to implement given an existing UUIDv1 implementation. See
UUID version 7 (UUIDv7)
An entirely new time-based UUID bit layout sourced from the widely implemented and well known Unix Epoch timestamp source. See
UUID version 8 (UUIDv8)
A free-form UUID format which has no explicit requirements except maintaining backward compatibility. See
Max UUID
A specialized UUID which is the inverse of See
RFC EDITOR PLEASE DELETE THIS SECTION. draft-03
  • - Reworked the draft body to make the content more concise
  • - UUIDv6 section reworked to just the reorder of the timestamp
  • - UUIDv7 changed to simplify timestamp mechanism to just millisecond Unix timestamp
  • - UUIDv8 relaxed to be custom in all elements except version and variant
  • - Introduced Max UUID.
  • - Added C code samples in Appendix.
  • - Added test vectors in Appendix.
  • - Version and Variant section combined into one section.
  • - Changed from pseudo-random number generators to cryptographically secure pseudo-random number generator (CSPRNG).
  • - Combined redundant topics from all UUIDs into sections such as Timestamp granularity, Monotonicity and Counters, Collision Resistance, Sorting, and Unguessability, etc.
  • - Split Encoding and Storage into Opacity and DBMS and Database Considerations
  • - Reworked Global Uniqueness under new section Global and Local Uniqueness
  • - Node verbiage only used in UUIDv6 all others reference random/rand instead
  • - Clock sequence verbiage changed simply to counter in any section other than UUIDv6
  • - Added Abbreviations section
  • - Updated IETF Draft XML Layout
  • - Added information about little-endian UUIDs
draft-02
  • - Added Changelog
  • - Fixed misc. grammatical errors
  • - Fixed section numbering issue
  • - Fixed some UUIDvX reference issues
  • - Changed all instances of "motonic" to "monotonic"
  • - Changed all instances of "#-bit" to "# bit"
  • - Changed "proceeding" verbiage to "after" in section 7
  • - Added details on how to pad 32 bit Unix timestamp to 36 bits in UUIDv7
  • - Added details on how to truncate 64 bit Unix timestamp to 36 bits in UUIDv7
  • - Added forward reference and bullet to UUIDv8 if truncating 64 bit Unix Epoch is not an option.
  • - Fixed bad reference to non-existent "time_or_node" in section 4.5.4
draft-01
  • - Complete rewrite of entire document.
  • - The format, flow and verbiage used in the specification has been reworked to mirror the original RFC 4122 and current IETF standards.
  • - Removed the topics of UUID length modification, alternate UUID text formats, and alternate UUID encoding techniques.
  • - Research into 16 different historical and current implementations of time-based universal identifiers was completed at the end of 2020 in an attempt to identify trends which have directly influenced design decisions in this draft document (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)
  • - Prototype implementations have been completed for UUIDv6, UUIDv7, and UUIDv8 in various languages by many GitHub community members. (https://github.com/uuid6/prototypes)
The variant bits utilized by UUIDs in this specification remain in the same octet as originally defined by . The next table details Variant 10xx (8/9/A/B) and the new versions defined by this specification. A complete guide to all versions within this variant has been included in .
New UUID variant 10xx (8/9/A/B) versions defined by this specification
Msb0 | Msb1 | Msb2 | Msb3 | Version | Description
0    | 1    | 1    | 0    | 6       | Reordered Gregorian time-based UUID specified in this document.
0    | 1    | 1    | 1    | 7       | Unix Epoch time-based UUID specified in this document.
1    | 0    | 0    | 0    | 8       | Reserved for custom UUID formats specified in this document.
For UUID versions 6, 7, and 8, the variant field placement from is unchanged. An example version/variant layout for UUIDv6 follows the table, where M is the version and N is the variant.
UUIDv6 Variant Examples
The UUID format is 16 octets; the variant bits in conjunction with the version bits described in the next section in determine finer structure.
UUID version 6 is a field-compatible version of UUIDv1, reordered for improved DB locality. It is expected that UUIDv6 will primarily be used in contexts where there are existing v1 UUIDs. Systems that do not involve legacy UUIDv1 SHOULD consider using UUIDv7 instead.
Instead of splitting the timestamp into the low, mid and high sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes are stored from most to least significant. That is, given a 60 bit timestamp value as specified for UUIDv1 in , for UUIDv6, the first 48 most significant bits are stored first, followed by the 4 bit version (same position), followed by the remaining 12 bits of the original 60 bit timestamp.
The clock sequence bits remain unchanged from their usage and position in .
The 48 bit node SHOULD be set to a pseudo-random value; however, implementations MAY choose to retain the old MAC address behavior from and . For more information on MAC address usage within UUIDs see the
The format for the 16-byte, 128 bit UUIDv6 is shown in Figure 1.
UUIDv6 Field and Bit Layout

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_high                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |      time_low_and_version     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         node (2-5)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
time_high:
The most significant 32 bits of the 60 bit starting timestamp. Occupies bits 0 through 31 (octets 0-3)
time_mid:
The middle 16 bits of the 60 bit starting timestamp. Occupies bits 32 through 47 (octets 4-5)
time_low_and_version:
The first four most significant bits MUST contain the UUIDv6 version (0110) while the remaining 12 bits will contain the least significant 12 bits from the 60 bit starting timestamp. Occupies bits 48 through 63 (octets 6-7)
clk_seq_hi_res:
The first two bits MUST be set to the UUID variant (10). The remaining 6 bits contain the high portion of the clock sequence. Occupies bits 64 through 71 (octet 8)
clock_seq_low:
The 8 bit low portion of the clock sequence. Occupies bits 72 through 79 (octet 9)
node:
48 bit spatially unique identifier. Occupies bits 80 through 127 (octets 10-15)
With UUIDv6 the steps for splitting the timestamp into time_high and time_mid are OPTIONAL since the 48 bits of time_high and time_mid will remain in the same order. An extra step of splitting the first 48 bits of the timestamp into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation.
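As a minimal sketch of the field layout above, the following C function packs a 60 bit timestamp, a clock sequence, and a node value directly into 16 octets, treating time_high and time_mid as one 48 bit run. The function name uuidv6_pack and its parameters are hypothetical and are not defined by this document.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: packs already-generated field values into the
 * UUIDv6 layout. ts60 is the 60 bit Gregorian timestamp, clock_seq holds
 * the 14 bit clock sequence, node holds the 48 bit node value. */
void uuidv6_pack(uint8_t out[16], uint64_t ts60, uint16_t clock_seq,
                 const uint8_t node[6]) {
    uint64_t time_high_mid = (ts60 >> 12) & 0xFFFFFFFFFFFFULL; /* top 48 bits */
    uint16_t time_low = (uint16_t)(ts60 & 0x0FFF);             /* low 12 bits */

    /* octets 0-5: time_high then time_mid, most significant byte first */
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(time_high_mid >> (40 - 8 * i));

    /* octets 6-7: version 0110 followed by the low 12 timestamp bits */
    out[6] = (uint8_t)(0x60 | (time_low >> 8));
    out[7] = (uint8_t)(time_low & 0xFF);

    /* octets 8-9: variant 10 followed by the 14 bit clock sequence */
    out[8] = (uint8_t)(0x80 | ((clock_seq >> 8) & 0x3F));
    out[9] = (uint8_t)(clock_seq & 0xFF);

    /* octets 10-15: node */
    memcpy(&out[10], node, 6);
}
```

Because the 48 most significant timestamp bits are written as one contiguous run, no separate time_high/time_mid split is needed in this form.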
UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded, as well as improved entropy characteristics over versions 1 or 6. Implementations SHOULD utilize UUID version 7 over UUID version 1 and 6 if possible.
UUIDv7 Field and Bit Layout
unix_ts_ms:
48 bit big-endian unsigned number of Unix epoch timestamp as per .
ver:
4 bit UUIDv7 version set as per
rand_a:
12 bits pseudo-random data to provide uniqueness as per and .
var:
The 2 bit variant defined by .
rand_b:
The final 62 bits of pseudo-random data to provide uniqueness as per and .
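The field descriptions above can be sketched in C as a pure packing step plus the reverse extraction of unix_ts_ms. The names uuidv7_pack and uuidv7_unix_ts_ms are hypothetical, and the caller is assumed to supply the millisecond timestamp and ten bytes of CSPRNG output.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lays out unix_ts_ms, ver, rand_a, var and rand_b.
 * rnd must hold 10 bytes of random data supplied by the caller. */
void uuidv7_pack(uint8_t out[16], uint64_t unix_ts_ms, const uint8_t rnd[10]) {
    /* octets 0-5: 48 bit big-endian Unix timestamp in milliseconds */
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(unix_ts_ms >> (40 - 8 * i));

    /* octets 6-15: random data, then overwrite the version and variant bits */
    memcpy(&out[6], rnd, 10);
    out[6] = (uint8_t)(0x70 | (out[6] & 0x0F)); /* ver = 0111, 12 bits rand_a */
    out[8] = (uint8_t)(0x80 | (out[8] & 0x3F)); /* var = 10, 62 bits rand_b  */
}

/* Hypothetical helper: reads the 48 bit timestamp back out of a UUIDv7. */
uint64_t uuidv7_unix_ts_ms(const uint8_t u[16]) {
    uint64_t ms = 0;
    for (int i = 0; i < 6; i++)
        ms = (ms << 8) | u[i];
    return ms;
}
```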
UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. The only requirement is that the variant and version bits MUST be set as defined in . UUIDv8's uniqueness will be implementation-specific and SHOULD NOT be assumed. The only explicitly defined bits are the Version and Variant, leaving 122 bits (48 in custom_a, 12 in custom_b, and 62 in custom_c) for implementation-specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data. Some example situations in which UUIDv8 usage could occur:
  • An implementation would like to embed extra information within the UUID other than what is defined in this document.
  • An implementation has other application/language restrictions which inhibit the use of one of the current UUIDs.
UUIDv8 Field and Bit Layout
custom_a:
The first 48 bits of the layout that can be filled as an implementation sees fit.
ver:
The 4 bit version field as defined by
custom_b:
12 more bits of the layout that can be filled as an implementation sees fit.
var:
The 2 bit variant field as defined by .
custom_c:
The final 62 bits of the layout immediately following the var field to be filled as an implementation sees fit.
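Since the version and variant are the only defined bits, a UUIDv8 can be produced by filling 16 octets however the application likes and then stamping those two fields. The sketch below assumes a hypothetical helper named uuidv8_stamp; what goes into custom_a, custom_b, and custom_c is left entirely to the caller.

```c
#include <stdint.h>

/* Hypothetical helper: forces the version (1000) and variant (10) bits
 * on an otherwise application-defined 16 octet value. All other bits
 * (custom_a, custom_b, custom_c) are left untouched. */
void uuidv8_stamp(uint8_t u[16]) {
    u[6] = (uint8_t)(0x80 | (u[6] & 0x0F)); /* ver: top four bits of octet 6 */
    u[8] = (uint8_t)(0x80 | (u[8] & 0x3F)); /* var: top two bits of octet 8  */
}
```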
The Max UUID is a special form of UUID that is specified to have all 128 bits set to 1. This UUID can be thought of as the inverse of the Nil UUID defined in
Max UUID Format
The minimum requirements for generating UUIDs are described in this document for each version. Everything else is an implementation detail and up to the implementer to decide what is appropriate for a given implementation. That being said, various relevant factors are covered below to help guide an implementer through the different trade-offs among differing UUID implementations.
UUID timestamp source, precision, and length were the topic of great debate while creating this specification. As such, choosing the right timestamp for your application is a very important topic. This section will detail some of the most common points on this topic.
Reliability:
Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)
Source:
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or a custom timestamp epoch are required UUIDv8 SHOULD be leveraged.
Sub-second Precision and Accuracy:
Many levels of precision exist for timestamps: milliseconds, microseconds, nanoseconds, and beyond. Additionally fractional representations of sub-second precision may be desired to mix various levels of precision in a time-ordered manner. Furthermore, system clocks themselves have an underlying granularity and it is frequently less than the precision offered by the operating system. With UUID version 1 and 6, 100-nanoseconds of precision are present while UUIDv7 features fixed millisecond level of precision within the Unix epoch that does not exceed the granularity capable in most modern systems. For other levels of precision UUIDv8 SHOULD be utilized.
Length:
The length of a given timestamp directly impacts how long a given UUID will be valid. That is, how many timestamp ticks can be contained in a UUID before the maximum value for the timestamp field is reached. Care should be given to ensure that the proper length is selected for a given timestamp. UUID version 1 and 6 utilize a 60 bit timestamp and UUIDv7 features a 48 bit timestamp.
Altering, Fuzzing, or Smearing:
Implementations MAY alter the actual timestamp. Some examples include security considerations around providing a real clock value within a UUID, correcting inaccurate clocks, or handling leap seconds. This specification makes no requirement or guarantee about how close the clock value needs to be to actual time.
Padding:
When timestamp padding is required, implementations MUST pad the most significant (left-most) bits with zeros. An example is padding the most significant, left-most bits of a 32 bit Unix timestamp with zeros to fill out the 48 bit timestamp in UUIDv7.
Truncating:
Similarly, when timestamps need to be truncated: the lower, least significant bits MUST be used. An example would be truncating a 64 bit Unix timestamp to the least significant, right-most 48 bits for UUIDv7.
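The padding and truncating rules can be sketched in C as follows. The helper names are hypothetical; the point is only that padding zero-fills the most significant bits and truncating keeps the least significant 48 bits.

```c
#include <stdint.h>

#define TS48_MASK 0xFFFFFFFFFFFFULL /* least significant 48 bits */

/* Padding: a 32 bit timestamp widened to 48 bits; the upper 16 bits are zero. */
uint64_t ts_pad32_to_48(uint32_t ts32) {
    return (uint64_t)ts32;
}

/* Truncating: keep only the least significant, right-most 48 bits. */
uint64_t ts_truncate64_to_48(uint64_t ts64) {
    return ts64 & TS48_MASK;
}

/* Writes a 48 bit value into six octets, most significant byte first. */
void ts48_store(uint8_t out[6], uint64_t ts48) {
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(ts48 >> (40 - 8 * i));
}
```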
Monotonicity is the backbone of time-based sortable UUIDs. Naturally, time-based UUIDs from this document will be monotonic due to an embedded timestamp; however, implementations can guarantee additional monotonicity via the concepts covered in this section. Additionally, care MUST be taken to ensure UUIDs generated in batches are also monotonic. That is, if one-thousand UUIDs are generated for the same timestamp, there needs to be sufficient logic for organizing the creation order of those one-thousand UUIDs. For batch UUID creation, implementations MAY utilize a monotonic counter which SHOULD increment for each UUID created during a given timestamp. For single-node UUID implementations that do not need to create batches of UUIDs, the embedded timestamp within UUID version 1, 6, and 7 can provide sufficient monotonicity guarantees by simply ensuring that the timestamp increments before creating a new UUID. For the topic of Distributed Nodes please refer to Implementations SHOULD choose one method for single-node UUID implementations that require batch UUID creation.
Fixed-Length Dedicated Counter Bits (Method 1):
This references the practice of allocating a specific number of bits in the UUID layout to the sole purpose of tallying the total number of UUIDs created during a given UUID timestamp tick. Positioning of a fixed bit-length counter SHOULD be immediately after the embedded timestamp. This promotes sortability and allows random data generation for each counter increment. With this method the rand_a section of UUIDv7 MAY be utilized as fixed-length dedicated counter bits. In the event more counter bits are required, the most significant, left-most, bits of rand_b MAY be leveraged as additional counter bits.
Monotonic Random (Method 2):
With this method the random data is extended to also double as a counter. This monotonic random can be thought of as a "randomly seeded counter" which MUST be incremented in the least significant position for each UUID created on a given timestamp tick. UUIDv7's rand_b section SHOULD be utilized with this method to handle batch UUID generation during a single timestamp tick.
The following sub-topics cover methods behind incrementing either type of counter method:
Plus One Increment (Type A):
With this increment logic the counter method is incremented by one for every UUID generation. When this increment method is utilized with Fixed-Length Dedicated Counters, the trailing random data generated for each new UUID can help produce unguessable UUIDs. When this increment method is utilized with Monotonic Random Counters, the resulting values are easily guessable. Implementations that favor unguessability SHOULD NOT utilize this method with the monotonic random method.
Random Increment (Type B):
With this increment the actual increment of the counter MAY be a random integer of any desired length larger than zero. When this increment method is utilized with Fixed-Length Dedicated Counters, the random increments MAY deplete the counter bit space (including any rollover guards) faster than desired if a counter of adequate length is not selected. When this increment method is utilized with Monotonic Random Counters, the counter ensures the UUIDs retain the required level of unguessability characteristics provided by the underlying entropy.
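As a rough sketch of Monotonic Random (Method 2) combined with Random Increment (Type B), the 62 bit rand_b value can be treated as an integer that grows by a random, non-zero step for each UUID created within the same timestamp tick. The function name rand_b_advance is hypothetical, and the random step is assumed to come from the caller's CSPRNG.

```c
#include <stdint.h>

#define RAND_B_BITS 62
#define RAND_B_MAX  ((1ULL << RAND_B_BITS) - 1)

/* Hypothetical helper: advances rand_b by (random_step + 1), so the
 * increment is always larger than zero. Returns 0 on success, or -1 and
 * leaves *rand_b unchanged when the increment would roll over the
 * 62 bit space. */
int rand_b_advance(uint64_t *rand_b, uint32_t random_step) {
    uint64_t step = (uint64_t)random_step + 1;
    if (*rand_b > RAND_B_MAX - step)
        return -1; /* rollover: caller decides how to handle it */
    *rand_b += step;
    return 0;
}
```

Passing a constant random_step of zero reduces this to Plus One Increment (Type A), which the text above notes is easily guessable with this method.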
The following sub-topics cover topics related solely with creating reliable fixed-length dedicated counters:
Fixed-Length Dedicated Counter Seeding:
Implementations utilizing the fixed-length counter method SHOULD randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented, the counter SHOULD be frozen and incremented via the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random data MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing the fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in the least-significant, right-most, position randomly initialized. The remaining most significant, left-most counter bits are initialized as zero for the sole purpose of guarding against counter rollovers.
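The partial seeding described above, such as a 24 bit counter with 23 randomly initialized bits and one zeroed guard bit, can be sketched as a simple mask. The helper counter_seed is hypothetical and assumes guard_bits <= counter_bits <= 32, with the random input supplied by the caller.

```c
#include <stdint.h>

/* Hypothetical helper: returns an initial counter value in which the
 * least significant (counter_bits - guard_bits) bits come from
 * random_bits and the most significant guard_bits are zero.
 * Assumes guard_bits <= counter_bits <= 32. */
uint32_t counter_seed(uint32_t random_bits, unsigned counter_bits,
                      unsigned guard_bits) {
    unsigned seeded_bits = counter_bits - guard_bits;
    uint32_t mask = (seeded_bits >= 32) ? 0xFFFFFFFFu
                                        : ((1u << seeded_bits) - 1u);
    return random_bits & mask;
}
```

With one guard bit on a 24 bit counter, even the largest possible seed leaves 2^23 increments available before the counter rolls over.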
Fixed-Length Dedicated Counter Length:
Care MUST be taken to select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision SHOULD require a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch.
The following sub-topics cover rollover handling with either type of counter method:
Counter Rollover Guards:
The technique from Fixed-Length Dedicated Counter Seeding which describes allocating a segment of the fixed-length counter as a rollover guard is also recommended and SHOULD be employed to help mitigate counter rollover issues. This same technique can be leveraged with Monotonic random counter methods by ensuring the total length of a possible increment in the least significant, right most position is less than the total length of the random being incremented. As such the most significant, left-most, bits can be incremented as rollover guarding.
Counter Rollover Handling:
Counter rollovers SHOULD be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken.
Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:
  1. Compare the current timestamp against the previously stored timestamp.
  2. If the current timestamp is equal to the previous timestamp; increment the counter according to the desired method and type.
  3. If the current timestamp is greater than the previous timestamp; re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).
Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID. If this is not the case, then any number of things could have occurred, such as, but not limited to, clock rollbacks, leap second handling, or counter rollovers. Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem, ensuring the next UUID generated is greater than the previous.
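The comparison steps above can be sketched in C for a 12 bit Fixed-Length Dedicated Counter (Method 1) with Plus One Increment (Type A), as might be carried in rand_a of UUIDv7. The type v7_state and the function v7_next_tick are hypothetical. Treating a clock that moved backward the same as an unchanged timestamp, and reserving the top counter bit as a rollover guard, are assumptions of this sketch rather than requirements of this document.

```c
#include <stdint.h>

#define COUNTER_MAX  0x0FFFu /* 12 bit counter */
#define COUNTER_SEED 0x07FFu /* low 11 bits seeded, top bit is the guard */

/* Hypothetical generator state kept between calls. */
typedef struct {
    uint64_t last_ts_ms; /* previously stored timestamp */
    uint16_t counter;    /* counter value used for the previous UUID */
} v7_state;

/* Chooses the timestamp and counter for the next UUID. seed is fresh
 * random data used only when the timestamp advances. Returns 0 on
 * success, or -1 on counter rollover, in which case the caller is
 * expected to wait for the timestamp to advance and try again. */
int v7_next_tick(v7_state *st, uint64_t now_ms, uint16_t seed,
                 uint64_t *ts_out, uint16_t *counter_out) {
    if (now_ms > st->last_ts_ms) {
        /* timestamp advanced: store it and re-initialize the counter */
        st->last_ts_ms = now_ms;
        st->counter = (uint16_t)(seed & COUNTER_SEED);
    } else {
        /* same tick (or clock moved backward): keep the stored timestamp */
        if (st->counter == COUNTER_MAX)
            return -1;
        st->counter++;
    }
    *ts_out = st->last_ts_ms;
    *counter_out = st->counter;
    return 0;
}
```

Because the emitted (timestamp, counter) pair never decreases, UUIDs built from it compare greater than or equal to the previous one before the trailing random data is considered.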
Some implementations MAY desire to utilize multi-node, clustered, applications which involve two or more nodes independently generating UUIDs that will be stored in a common location. While UUIDs already feature sufficient entropy to ensure that the chances of collision are low, as the total number of nodes increases, so does the likelihood of a collision. This section will detail the approaches that MAY be utilized by multi-node UUID implementations in distributed environments.
Centralized Registry:
With this method all nodes tasked with creating UUIDs consult a central registry and confirm the generated value is unique. As applications scale the communication with the central registry could become a bottleneck and impact UUID generation in a negative way. Utilization of shared knowledge schemes with central/global registries is outside the scope of this specification.
Node IDs:
With this method, a pseudo-random Node ID value is placed within the UUID layout. This identifier helps ensure the bit-space for a given node is unique, resulting in UUIDs that do not conflict with any other UUID created by another node with a different node id. Implementations that choose to leverage an embedded node id SHOULD utilize UUIDv8. The node id SHOULD NOT be an IEEE 802 MAC address as per . The location and bit length are left to implementations and are outside the scope of this specification. Furthermore, the creation and negotiation of unique node ids among nodes is also out of scope for this specification.
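One possible way to embed a node id in UUIDv8 is sketched below. The layout is purely an assumption for illustration, since the location and bit length are left to implementations: custom_a carries a 48 bit timestamp, custom_b a 12 bit counter, the first 14 bits of custom_c the node id, and the remaining 48 bits random data. The function name uuidv8_with_node is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, not defined by this document:
 *   octets 0-5   custom_a: 48 bit timestamp
 *   octets 6-7   ver (1000) + custom_b: 12 bit counter
 *   octets 8-9   var (10) + 14 bit node id (start of custom_c)
 *   octets 10-15 remaining 48 bits of custom_c: random data from caller */
void uuidv8_with_node(uint8_t out[16], uint64_t ts48, uint16_t counter12,
                      uint16_t node_id14, const uint8_t rnd[6]) {
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(ts48 >> (40 - 8 * i));
    out[6] = (uint8_t)(0x80 | ((counter12 >> 8) & 0x0F));
    out[7] = (uint8_t)(counter12 & 0xFF);
    out[8] = (uint8_t)(0x80 | ((node_id14 >> 8) & 0x3F));
    out[9] = (uint8_t)(node_id14 & 0xFF);
    memcpy(&out[10], rnd, 6);
}
```

Two nodes with different node ids then always differ in octets 8-9, even when their timestamp, counter, and random data happen to match.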
Utilization of either a Centralized Registry or Node ID is not required for implementing UUIDs in this specification. However, implementations SHOULD utilize one of the two aforementioned methods if distributed UUID generation is a requirement.
Implementations SHOULD weigh the consequences of UUID collisions within their application and when deciding between UUID versions that use entropy (random) versus the other components such as and . This is especially true for distributed node collision resistance as defined by . There are two example scenarios below which help illustrate the varying seriousness of a collision within an application.
Low Impact:
A UUID collision generated a duplicate log entry which results in incorrect statistics derived from the data. Implementations that are not negatively affected by collisions may continue with the entropy and uniqueness provided by the traditional UUID format.
High Impact:
A duplicate key causes an airplane to receive the wrong course which puts people's lives at risk. In this scenario there is no margin for error. Collisions MUST be avoided and failure is unacceptable. Applications dealing with this type of scenario MUST employ as much collision resistance as possible within the given application context.
UUIDs created by this specification MAY be used to provide local uniqueness guarantees. For example, ensuring UUIDs created within a local application context are unique within a database MAY be sufficient for some implementations where global uniqueness outside of the application context, in other applications, or around the world is not required. Although true global uniqueness is impossible to guarantee without a shared knowledge scheme, a shared knowledge scheme is not required by UUID to provide uniqueness guarantees. Implementations MAY implement a shared knowledge scheme introduced in as they see fit to extend the uniqueness guaranteed by this specification and .
Implementations SHOULD utilize a cryptographically secure pseudo-random number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique"). CSPRNG ensures the best of and are present in modern UUIDs. Advice on generating cryptographic-quality random numbers can be found in
UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g. database indexes) SHOULD sort as opaque raw bytes, without need for parsing or introspection. Time-ordered monotonic UUIDs benefit from greater database index locality because the new values are near each other in the index. As a result, objects are more easily clustered together for better performance. The real-world differences in this approach of index locality vs random data inserts can be quite large. UUID formats created by this specification SHOULD be lexicographically sortable while in the textual representation. UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind. If little-endian style is required, a custom UUID format SHOULD be created using UUIDv8.
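Sorting as opaque raw bytes amounts to a plain byte-wise comparison of the 16 octets. A minimal sketch in C, using the hypothetical names uuid_cmp and uuid_sort:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Compares two 16 octet UUIDs byte by byte, without parsing any field.
 * memcmp compares as unsigned bytes, which matches big-endian ordering. */
int uuid_cmp(const void *a, const void *b) {
    return memcmp(a, b, 16);
}

/* Sorts an array of n binary UUIDs in ascending byte order. */
void uuid_sort(uint8_t (*uuids)[16], size_t n) {
    qsort(uuids, n, 16, uuid_cmp);
}
```

For UUIDv6 and UUIDv7 the leading octets hold the timestamp from most to least significant, so this byte order is also creation-time order.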
UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to for more information on determining UUID version and variant.
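Where inspection is unavoidable, the version and variant can be read from octets 6 and 8 of the binary form. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Returns the 4 bit version: the most significant four bits of octet 6. */
unsigned uuid_version(const uint8_t u[16]) {
    return (unsigned)(u[6] >> 4);
}

/* Returns 1 when the two most significant bits of octet 8 are 10,
 * i.e. variant 10xx (8/9/A/B); otherwise returns 0. */
int uuid_is_variant_10xx(const uint8_t u[16]) {
    return (u[8] & 0xC0) == 0x80;
}
```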
For many applications, such as databases, storing UUIDs as text is unnecessarily verbose, requiring 288 bits to represent 128 bit UUID values. Thus, where feasible, UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value. For other systems, UUIDs MAY be stored in binary form or as text, as appropriate. The trade-offs to both approaches are as such:
  • Storing as binary requires less space and may result in faster data access.
  • Storing as text requires more space but may require less translation if the resulting text form is to be used after retrieval and thus may be simpler to implement.
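The size difference between the two forms can be seen in a small conversion sketch: the binary form is 16 octets (128 bits), while the hyphenated text form is 36 characters (288 bits at 8 bits per character). The function name uuid_to_text is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define UUID_BINARY_LEN 16 /* 128 bits */
#define UUID_TEXT_LEN   36 /* 32 hex digits + 4 hyphens = 288 bits */

/* Formats a binary UUID as lowercase 8-4-4-4-12 hex text.
 * out must have room for UUID_TEXT_LEN characters plus the terminator. */
void uuid_to_text(const uint8_t u[UUID_BINARY_LEN],
                  char out[UUID_TEXT_LEN + 1]) {
    snprintf(out, UUID_TEXT_LEN + 1,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```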
DBMS vendors are encouraged to provide functionality to generate and store UUID formats defined by this specification for use as identifiers or left parts of identifiers such as, but not limited to, primary keys, surrogate keys for temporal databases, foreign keys included in polymorphic relationships, and keys for key-value pairs in JSON columns and key-value databases. Applications using a monolithic database may find using database-generated UUIDs (as opposed to client-generated UUIDs) provides the best UUID monotonicity. In addition to UUIDs, additional identifiers MAY be used to ensure integrity and feedback.
This document has no IANA actions.
MAC addresses pose inherent security risks and SHOULD NOT be used within a UUID. Instead, CSPRNG data SHOULD be selected from a source with sufficient entropy to ensure guaranteed uniqueness among UUID generation. See for more information. Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with an embedded counter does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then UUIDv4 SHOULD be utilized.
The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, Theodore Y. Ts'o, Robert Kieffer, sergeyprokhorenko, and LiosK, as well as all of those in the IETF community and on GitHub who contributed to the discussions which resulted in this document.
Key words for use in RFCs to Indicate Requirement Levels
  In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
  RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.
A Universally Unique IDentifier (UUID) URN Namespace
  This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier). A UUID is 128 bits long, and can guarantee uniqueness across space and time. UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE), and then in Microsoft Windows platforms. This specification is derived from the DCE specification with the kind permission of the OSF (now known as The Open Group). Information from earlier versions of the DCE specification have been incorporated into this document. [STANDARDS-TRACK]
Randomness Requirements for Security
  Security systems are built on strong cryptographic algorithms that foil pattern analysis attempts. However, the security of these systems is dependent on generating secret quantities for passwords, cryptographic keys, and similar quantities. The use of pseudo-random processes to generate secret quantities can result in pseudo-security. A sophisticated attacker may find it easier to reproduce the environment that produced the secret quantities and to search the resulting small set of possibilities than to locate the quantities in the whole of the potential number space. Choosing random quantities to foil a resourceful and motivated adversary is surprisingly difficult. This document points out many pitfalls in using poor entropy sources or traditional pseudo-random number generation techniques for generating such quantities. It recommends the use of truly random hardware techniques and shows that the existing hardware on many systems can be used for this purpose. It provides suggestions to ameliorate the problem when a hardware solution is not available, and it gives examples of how large such quantities need to be for some applications. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
A Scala client for Cassandra, Twitter
Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees., Twitter
Flake: A decentralized, k-ordered id generation service in Erlang, Boundary
Sharding & IDs at Instagram, Instagram Engineering
K-Sortable Globally Unique IDs, Segment
Sequential UUID / Flake ID generator pulled out of elasticsearch common
Flake ID Generator
A distributed unique ID generator inspired by Twitter's Snowflake, Sony
Laravel: The mysterious "Ordered UUID"
Creating sequential GUIDs in C# for MSSQL or PostgreSql
Universally Unique Lexicographically Sortable Identifier
sid : generate sortable identifiers
The 2^120 Ways to Ensure Unique Identifiers, Google
Globally Unique ID Generator
ObjectId - MongoDB Manual, MongoDB
Collision-resistant ids optimized for horizontal scaling and performance.
IEEE
This section details a function in C which converts from a UUID version 1 to version 6:
UUIDv6 Function in C

    #include <stdint.h>
    #include <arpa/inet.h>
    #include <uuid/uuid.h>

    /* Converts UUID version 1 to version 6 in place. */
    void uuidv1tov6(uuid_t u) {

      uint64_t ut;
      unsigned char *up = (unsigned char *)u;

      // load ut with the first 64 bits of the UUID
      ut = ((uint64_t)ntohl(*((uint32_t*)up))) << 32;
      ut |= ((uint64_t)ntohl(*((uint32_t*)&up[4])));

      // dance the bit-shift...
      ut =
        ((ut >> 32) & 0x0FFF) |             // 12 least significant bits
        (0x6000) |                          // version number
        ((ut >> 28) & 0x0000000FFFFF0000) | // next 20 bits
        ((ut << 20) & 0x000FFFF000000000) | // next 16 bits
        (ut << 52);                         // 12 most significant bits

      // store back in UUID
      *((uint32_t*)up) = htonl((uint32_t)(ut >> 32));
      *((uint32_t*)&up[4]) = htonl((uint32_t)(ut));

    }
UUIDv7 Function in C

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <time.h>

    int main(void) {

      // csprng data source
      FILE *rndf;
      rndf = fopen("/dev/urandom", "r");
      if (rndf == NULL) {
        printf("fopen /dev/urandom error\n");
        return 1;
      }

      // generate one UUIDv7
      uint8_t u[16];
      struct timespec ts;
      int ret;

      ret = clock_gettime(CLOCK_REALTIME, &ts);
      if (ret != 0) {
        printf("clock_gettime error: %d\n", ret);
        return 1;
      }

      // Unix Epoch timestamp in milliseconds
      uint64_t tms;
      tms = ((uint64_t)ts.tv_sec) * 1000;
      tms += ((uint64_t)ts.tv_nsec) / 1000000;

      memset(u, 0, 16);

      // fill everything after the timestamp with random bytes
      if (fread(&u[6], 10, 1, rndf) != 1) {
        printf("fread /dev/urandom error\n");
        return 1;
      }
      fclose(rndf);

      // store the 48 bit timestamp big-endian in octets 0-5
      // (byte-wise, so no htonll or aligned 64 bit access is needed)
      for (int i = 0; i < 6; i++) {
        u[i] = (uint8_t)(tms >> (40 - 8 * i));
      }

      u[8] = 0x80 | (u[8] & 0x3F); // set variant field, top two bits are 1, 0
      u[6] = 0x70 | (u[6] & 0x0F); // set version field, top four bits are 0, 1, 1, 1

      return 0;
    }
UUIDv8 will vary greatly from implementation to implementation. A good candidate use case for UUIDv8 is to embed exotic timestamps like the one found in this example which employs approximately 0.25 milliseconds and approximately 5 microseconds per timestamp tick as a 48 bit value.
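UUIDv8 Function in C

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    int main() {
      struct timespec tp;
      clock_gettime(CLOCK_REALTIME, &tp);

      // whole seconds occupy the bits above the 12 bit fraction
      uint64_t timestamp = (uint64_t)tp.tv_sec << 12;

      // compute 12 bit (~0.25 msec precision) fraction from nsecs
      timestamp |= ((uint64_t)tp.tv_nsec << 12) / 1000000000;

      // print the 48 bit value as 8 and 4 hex digits
      printf("%08llx-%04llx\n",
             (unsigned long long)(timestamp >> 16),
             (unsigned long long)(timestamp & 0xFFFF));
      return 0;
    }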
Both UUIDv1 and UUIDv6 test vectors utilize the same 60 bit timestamp: 0x1EC9414C232AB00 (138648505420000000), which is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00. Both UUIDv1 and UUIDv6 utilize the same values in clk_seq_hi_res, clock_seq_low, and node, all of which have been generated with random data.
Test Vector Timestamp Pseudo-code
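The arithmetic behind the 60 bit value can be sketched in C as follows. This is an illustrative sketch rather than the draft's own pseudo-code: the helper name gregorian_ticks and the macro name are hypothetical, and the only constant assumed beyond the text is the well-known offset between the Gregorian epoch (15 October 1582) and the Unix epoch, 0x01B21DD213814000 intervals of 100 nanoseconds.

```c
#include <stdint.h>

/* Offset between the Gregorian epoch (1582-10-15 00:00:00 UTC) and the
 * Unix epoch (1970-01-01 00:00:00 UTC), in 100-nanosecond intervals. */
#define GREGORIAN_UNIX_OFFSET_TICKS 0x01B21DD213814000ULL

/* Hypothetical helper: converts Unix seconds plus a nanosecond remainder
 * into the 60 bit count of 100-nanosecond intervals used by UUIDv1/UUIDv6. */
uint64_t gregorian_ticks(uint64_t unix_sec, uint32_t nsec)
{
    uint64_t ticks = unix_sec * 10000000ULL + nsec / 100U;
    /* keep only the 60 bits that fit in the timestamp field */
    return (ticks + GREGORIAN_UNIX_OFFSET_TICKS) & 0x0FFFFFFFFFFFFFFFULL;
}
```

Calling gregorian_ticks(1645557742, 0), where 1645557742 is the Unix time for 2:22:22 PM GMT-05:00 on February 22, 2022, yields the 138648505420000000 value quoted above.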
UUIDv1 Example Test Vector
UUIDv6 Example Test Vector
This example UUIDv7 test vector utilizes a well-known 32 bit Unix epoch with additional millisecond precision to fill the first 48 bits. rand_a and rand_b are filled with random data. The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00, represented as 0x17F21CFD130 or 1645539742000.
UUIDv7 Example Test Vector
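As a rough illustration of how such a vector is laid out, the sketch below packs a 48 bit millisecond timestamp and ten caller-supplied random bytes into 16 octets. The function name uuidv7_pack and its signature are hypothetical. It stores the timestamp octet by octet, which sidesteps the byte-order conversion call altogether.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lays out a UUIDv7 from a 48 bit Unix millisecond
 * timestamp and 10 caller-supplied random bytes. */
void uuidv7_pack(uint64_t unix_ms, const uint8_t rnd[10], uint8_t out[16])
{
    int i;

    /* big-endian 48 bit timestamp in octets 0..5 */
    for (i = 0; i < 6; i++)
        out[i] = (uint8_t)(unix_ms >> (40 - 8 * i));

    /* random data fills the remaining 10 octets */
    memcpy(&out[6], rnd, 10);

    out[6] = 0x70 | (out[6] & 0x0F); /* version field: 0, 1, 1, 1 */
    out[8] = 0x80 | (out[8] & 0x3F); /* variant field: 1, 0 */
}
```

With the timestamp 0x17F21CFD130 from the text, the first six octets come out as 01 7F 21 CF D1 30.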
This example UUIDv8 test vector utilizes a well-known 64 bit Unix epoch with nanosecond precision, truncated to the least-significant, right-most, bits to fill the first 48 bits through version. The next two segments of custom_b and custom_c are filled with random data. The timestamp is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00, represented as 0x16D6320C3D4DCC00 or 1645557742000000000. It should be noted that this example is just to illustrate one scenario for UUIDv8. Test vectors will likely be implementation specific and vary greatly from this simple example.
UUIDv8 Example Test Vector
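One possible reading of the truncation step is sketched below: keep only the low 48 bits of the 64 bit nanosecond count. The helper name low48 is hypothetical, and whether the remaining timestamp bits spill past the version field is left to the implementation, as the text notes.

```c
#include <stdint.h>

/* Hypothetical helper: keeps the least-significant (right-most) 48 bits
 * of a 64 bit nanosecond timestamp. */
uint64_t low48(uint64_t unix_ns)
{
    return unix_ns & 0xFFFFFFFFFFFFULL;
}
```

Applied to 0x16D6320C3D4DCC00, this drops the leading 0x16D6 and leaves 0x320C3D4DCC00.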
All UUID variant 10xx (8/9/A/B) version definitions.
Msb0 | Msb1 | Msb2 | Msb3 | Version | Description
0 | 0 | 0 | 0 | 0 | Unused
0 | 0 | 0 | 1 | 1 | The Gregorian time-based UUID from in
0 | 0 | 1 | 0 | 2 | DCE Security version, with embedded POSIX UIDs from
0 | 0 | 1 | 1 | 3 | The name-based version specified in that uses MD5 hashing.
0 | 1 | 0 | 0 | 4 | The randomly or pseudo-randomly generated version specified in .
0 | 1 | 0 | 1 | 5 | The name-based version specified in that uses SHA-1 hashing.
0 | 1 | 1 | 0 | 6 | Reordered Gregorian time-based UUID specified in this document.
0 | 1 | 1 | 1 | 7 | Unix Epoch time-based UUID specified in this document.
1 | 0 | 0 | 0 | 8 | Reserved for custom UUID formats specified in this document.
1 | 0 | 0 | 1 | 9 | Reserved for future definition.
1 | 0 | 1 | 0 | 10 | Reserved for future definition.
1 | 0 | 1 | 1 | 11 | Reserved for future definition.
1 | 1 | 0 | 0 | 12 | Reserved for future definition.
1 | 1 | 0 | 1 | 13 | Reserved for future definition.
1 | 1 | 1 | 0 | 14 | Reserved for future definition.
1 | 1 | 1 | 1 | 15 | Reserved for future definition.
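Reading the version back out of a UUID amounts to looking at the four most significant bits of octet 6, after checking that octet 8 carries the 10xx variant. A minimal sketch follows; the function names are hypothetical.

```c
#include <stdint.h>

/* Version: the four most significant bits of octet 6 (Msb0..Msb3). */
unsigned uuid_version(const uint8_t u[16])
{
    return u[6] >> 4;
}

/* Returns 1 when the two most significant bits of octet 8 are 1, 0,
 * which is the variant (hex 8/9/A/B) that the table applies to. */
int uuid_is_variant_10(const uint8_t u[16])
{
    return (u[8] & 0xC0) == 0x80;
}
```

A caller would normally test uuid_is_variant_10 first, since the version numbers in the table are only meaningful for that variant.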
================================================ FILE: old drafts/draft-peabody-dispatch-new-uuid-format.txt ================================================ dispatch BGP. Peabody Internet-Draft February 10, 2020 Updates: 4122 (if approved) Intended status: Standards Track Expires: August 13, 2020 UUID Format Update draft-peabody-dispatch-new-uuid-format-00 Abstract This document presents a new UUID format (version 6) which is suited for use as a database key. This document is a proposal to update [RFC4122]. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on August 13, 2020. Copyright Notice Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Peabody Expires August 13, 2020 [Page 1] Internet-Draft new-uuid-format February 2020 Table of Contents 1. Introduction . . . . . . . . . . . . . . 
. . . . . . . . . . 2
2. Summary of Changes . . . . . . . . . . . . . . . . . . . . . 2
2.1. Version 6 . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2. Timestamp . . . . . . . . . . . . . . . . . . . . . . . . 2
2.3. Clock Sequence and Node Parts . . . . . . . . . . . . . . 3
2.4. Alternate Text Formats . . . . . . . . . . . . . . . . . 3
2.4.1. Base64 Text (Variant A) . . . . . . . . . . . . . . . 4
2.4.2. Base32 Text . . . . . . . . . . . . . . . . . . . . . 4
3. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
5. Security Considerations . . . . . . . . . . . . . . . . . . . 5
6. Normative References . . . . . . . . . . . . . . . . . . . . 5
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 5

1. Introduction

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Summary of Changes

The following is a summary of proposed changes to the UUID specification in [RFC4122]. Each is given as a statement of a problem or limitation to which it is addressed, along with a description of the proposed change.

2.1. Version 6

A common case for modern applications is to need to create a unique identifier (to be used as a primary key in a database table) that is ordered by creation time, difficult to guess and has a compact text format. None of the existing UUID versions address each of these requirements. Thus a new UUID version number 6 is proposed.

2.2. Timestamp

The timestamp value from [RFC4122] (60-bit number of 100-nanosecond intervals since 00:00:00.00, 15 October 1582) is workable but the sequence in which the bytes are encoded (the lowest bytes first) results in unnecessary additional logic to sort correctly by timestamp. Ordering by timestamp is important for the use case of UUIDs as primary keys in a database since it improves locality by grouping new records close to each other (this can have major performance implications in large tables).

The proposed change is to encode the timestamp value into the same 60 bits as in [RFC4122] but in big-endian byte ordering. This way an application can sort by timestamp by simply treating the UUID as an opaque bunch of bytes.

2.3. Clock Sequence and Node Parts

The latter 64 bits of a UUID per [RFC4122] are the clock sequence and node fields. The node field is problematic as it encourages applications to use their MAC address, which may present a security problem (it is not always appropriate to reveal the network address of a machine as it could make it the target of an attack). A lesser concern is that it also incidentally produces UUIDs with the same 6 bytes at the end, which are visually more difficult to distinguish when looking at them in a list.

Seeing as the entire point of these last 64 bits is to ensure uniqueness, this document proposes that the strict definitions of clock sequence and node be relaxed. Instead implementations would be permitted to fill this section with random bytes and/or include an application defined value for uniqueness (such as a node number of a machine in a cluster).

Note for discussion: Another point to consider is that there is no known way to fully guarantee that duplicate identifiers will not be created unless some pre-determined outside source of uniqueness is employed (such as the MAC address for version 1 UUIDs). However, applications each have their own requirements for uniqueness. Uniqueness within a single database cluster for example is acceptable in many cases. A specification that forces all UUIDs to be globally unique when it is not needed might not be a good idea. Identifiers are only as universally unique as their input, so it might be better to just clearly state this and say that it's fine if UUIDs are only guaranteed to be unique within a specific context if it makes sense for that application.

2.4. Alternate Text Formats

The existing UUID text format is hex encoded plus four hyphens. For many applications this is unnecessarily verbose. The same information can be encoded into significantly fewer bytes using a base 64 or base 32 alphabet.

Many applications have a need to use the unique identifier of a database record in a URL (e.g. in an HTTP request either in the path or a query parameter). It can also be useful as a file name.

This document proposes alternate alphabets for encoding UUIDs which are convenient for use in URLs and file names, and also sort correctly when treated as raw bytes. Some applications may not have the ability (or want) to encode and decode UUIDs from text to binary and thus having the text format also sort correctly as raw bytes is useful. The standard Base64 and Base32 specifications in [RFC4648] do not have these properties, thus different alphabets are given for each.

Situations which require understanding the encoding should specify which encoding is used. For example, a database field which uses UUID version 6 with "b64a" encoding (see below), could be specified as type "UUID6B64A", which would result in binary storage according to UUID version 6, and otherwise read and write the value to/from applications in the b64a text format shown below.

Note also that the length can be easily used to positively distinguish if a value is text or binary form. A 16-byte value will necessarily be raw unencoded bytes whereas text forms will be longer.

2.4.1. Base64 Text (Variant A)

UUIDs encoded in this form use the "url-safe base64" alphabet: "A" to "Z", "a" to "z", "0" to "9" and "-" and "_", but in ASCII value sequence. No padding characters are used. The name "b64a" (not case sensitive) can be used by implementations to refer to this encoding.

Note: It might be useful to add another variation ("b64b") with a different alphabet. Hyphen and underscore are useful in a lot of places but there might be some others that are better for specific cases.

2.4.2. Base32 Text

Base32 can be useful if case-insensitivity is required. UUIDs encoded in this form use digits "2" through "7" followed by "A" through "Z" (same alphabet as in [RFC4648] but in ASCII value sequence). Case is not sensitive. Implementations choosing to output lower case letters are also correct. No padding characters are used. The name "b32a" (not case sensitive) can be used by implementations to refer to this encoding.

3. Acknowledgements

TBD

4. IANA Considerations

TBD

5. Security Considerations

TBD

6. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC4122] Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, July 2005.

[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

Author's Address

Brad G. Peabody
Email: brad@peabody.io

================================================ FILE: old drafts/draft-peabody-dispatch-new-uuid-format.xml ================================================
UUID Format Update
brad@peabody.io
ART dispatch uuid This document presents a new UUID format (version 6) which is suited for use as a database key. This document is a proposal to update .
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .
The following is a summary of proposed changes to the UUID specification in . Each is given as a statement of a problem or limitation to which it is addressed, along with a description of the proposed change.
A common case for modern applications is to need to create a unique identifier (to be used as a primary key in a database table) that is ordered by creation time, difficult to guess and has a compact text format. None of the existing UUID versions address each of these requirements. Thus a new UUID version number 6 is proposed.
The timestamp value from (60-bit number of 100-nanosecond intervals since 00:00:00.00, 15 October 1582) is workable but the sequence in which the bytes are encoded (the lowest bytes first) results in unnecessary additional logic to sort correctly by timestamp. Ordering by timestamp is important for the use case of UUIDs as primary keys in a database since it improves locality by grouping new records close to each other (this can have major performance implications in large tables). The proposed change is to encode the timestamp value into the same 60 bits as in but in big-endian byte ordering. This way an application can sort by timestamp by simply treating the UUID as an opaque bunch of bytes.
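The sorting property can be illustrated with a short C sketch. It assumes the version nibble sits where the later UUIDv6 layout puts it (the top four bits of octet 6), which this early draft does not spell out; the helper names put_time_be and uuid_cmp are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a 60 bit timestamp most-significant bits
 * first into octets 0..7, leaving room for the 4 bit version in octet 6. */
void put_time_be(uint64_t ts60, uint8_t out[16])
{
    uint64_t hi = ((ts60 >> 12) << 16) /* upper 48 bits of the timestamp */
                | 0x6000               /* version 6 (assumed position) */
                | (ts60 & 0x0FFF);     /* lower 12 bits of the timestamp */
    int i;

    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(hi >> (56 - 8 * i));
}

/* Byte-wise comparison: no knowledge of the field layout is needed. */
int uuid_cmp(const uint8_t a[16], const uint8_t b[16])
{
    return memcmp(a, b, 16);
}
```

Because the most significant timestamp bits come first, two identifiers created at different times compare in creation order under a plain memcmp, which is what a database index does with an opaque 16 byte key.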
The latter 64 bits of a UUID per are the clock sequence and node fields. The node field is problematic as it encourages applications to use their MAC address, which may present a security problem (it is not always appropriate to reveal the network address of a machine as it could make it the target of an attack). A lesser concern is that it also incidentally produces UUIDs with the same 6 bytes at the end, which are visually more difficult to distinguish when looking at them in a list. Seeing as the entire point of these last 64 bits is to ensure uniqueness, this document proposes that the strict definitions of clock sequence and node be relaxed. Instead implementations would be permitted to fill this section with random bytes and/or include an application defined value for uniqueness (such as a node number of a machine in a cluster). Note for discussion: Another point to consider is that there is no known way to fully guarantee that duplicate identifiers will not be created unless some pre-determined outside source of uniqueness is employed (such as the MAC address for version 1 UUIDs). However, applications each have their own requirements for uniqueness. Uniqueness within a single database cluster for example is acceptable in many cases. A specification that forces all UUIDs to be globally unique when it is not needed might not be a good idea. Identifiers are only as universally unique as their input, so it might be better to just clearly state this and say that it's fine if UUIDs are only guaranteed to be unique within a specific context if it makes sense for that application.
The existing UUID text format is hex encoded plus four hyphens. For many applications this is unnecessarily verbose. The same information can be encoded into significantly fewer bytes using a base 64 or base 32 alphabet. Many applications have a need to use the unique identifier of a database record in a URL (e.g. in an HTTP request either in the path or a query parameter). It can also be useful as a file name. This document proposes alternate alphabets for encoding UUIDs which are convenient for use in URLs and file names, and also sort correctly when treated as raw bytes. Some applications may not have the ability (or want) to encode and decode UUIDs from text to binary and thus having the text format also sort correctly as raw bytes is useful. The standard Base64 and Base32 specifications in do not have these properties, thus different alphabets are given for each. Situations which require understanding the encoding should specify which encoding is used. For example, a database field which uses UUID version 6 with "b64a" encoding (see below), could be specified as type "UUID6B64A", which would result in binary storage according to UUID version 6, and otherwise read and write the value to/from applications in the b64a text format shown below. Note also that the length can be easily used to positively distinguish if a value is text or binary form. A 16-byte value will necessarily be raw unencoded bytes whereas text forms will be longer.
UUIDs encoded in this form use the "url-safe base64" alphabet: "A" to "Z", "a" to "z", "0" to "9" and "-" and "_", but in ASCII value sequence. No padding characters are used. The name "b64a" (not case sensitive) can be used by implementations to refer to this encoding. Note: It might be useful to add another variation ("b64b") with a different alphabet. Hyphen and underscore are useful in a lot of places but there might be some others that are better for specific cases.
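A minimal encoder for this alphabet might look like the sketch below. The draft does not say how the two bits left over after 21 full characters are handled; the sketch assumes they are padded with zero bits on the right. The names B64A and b64a_encode are hypothetical.

```c
#include <stdint.h>

/* url-safe base64 characters arranged in ascending ASCII order */
static const char B64A[] =
    "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

/* Hypothetical helper: encodes 16 bytes as 22 characters plus a NUL. */
void b64a_encode(const uint8_t in[16], char out[23])
{
    uint32_t acc = 0; /* bit accumulator */
    int bits = 0;     /* number of unread bits in acc */
    int i, n = 0;

    for (i = 0; i < 16; i++) {
        acc = (acc << 8) | in[i];
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out[n++] = B64A[(acc >> bits) & 0x3F];
        }
    }
    if (bits > 0) /* 2 bits remain; assumed zero padding on the right */
        out[n++] = B64A[(acc << (6 - bits)) & 0x3F];
    out[n] = '\0';
}
```

Since the alphabet ascends in ASCII order and every UUID encodes to the same length, comparing two encoded strings with strcmp gives the same ordering as comparing the raw 16 bytes.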
Base32 can be useful if case-insensitivity is required. UUIDs encoded in this form use digits "2" through "7" followed by "A" through "Z" (same alphabet as in but in ASCII value sequence). Case is not sensitive. Implementations choosing to output lower case letters are also correct. No padding characters are used. The name "b32a" (not case sensitive) can be used by implementations to refer to this encoding.
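Case-insensitive decoding for this alphabet can be sketched as follows. The sketch assumes a UUID occupies 26 characters, with the two surplus bits of the last character ignored; the names b32a_value and b32a_decode are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: maps one b32a character to its 5 bit value,
 * accepting either letter case; returns -1 for any other character. */
int b32a_value(char c)
{
    int u = toupper((unsigned char)c);

    if (u >= '2' && u <= '7')
        return u - '2';     /* digits take values 0..5 */
    if (u >= 'A' && u <= 'Z')
        return u - 'A' + 6; /* letters take values 6..31 */
    return -1;
}

/* Decodes 26 b32a characters into 16 bytes; returns 0 on success and
 * -1 if an invalid character (including an early NUL) is found. */
int b32a_decode(const char *in, uint8_t out[16])
{
    uint32_t acc = 0; /* bit accumulator */
    int bits = 0;     /* number of unread bits in acc */
    int i, n = 0;

    for (i = 0; i < 26; i++) {
        int v = b32a_value(in[i]);
        if (v < 0)
            return -1;
        acc = (acc << 5) | (uint32_t)v;
        bits += 5;
        if (bits >= 8 && n < 16) {
            bits -= 8;
            out[n++] = (uint8_t)(acc >> bits);
        }
    }
    return 0;
}
```

Upper and lower case input decode to the same bytes, which matches the statement that implementations emitting lower case letters are also correct.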
TBD
TBD
TBD
&RFC2119; &RFC4122; &RFC4648;
================================================ FILE: research/sortable-id-analysis.md ================================================
## Analysis of current non-Standard Time-based K-sorted, Lexicographically Unique Identifiers to answer the questions:

---
### Length
**Q:** Are we sticking with 128 bits or are we introducing variable-length IDs?
- The most common implementations utilize 128 bits for globally unique values. This should be kept for backwards compatibility.
- Where required, to best optimize database sizes, 64 bits is the second most common length and provides a level of uniqueness sufficient for a specific application's context.
- Many articles exist on the general topic of "how long should an ID be to guarantee uniqueness". Most agree that 128 bits is great and that 64 bits is just as good if you only require uniqueness in the application context.
- Larger than 128 bits is not required from a collision avoidance perspective and adds extra unneeded data to the application/database/wire/etc.

Sources:
+ https://www.slideshare.net/davegardnerisme/unique-id-generation-in-distributed-systems
+ https://medium.com/javascript-in-plain-english/you-might-not-need-uuid-v4-for-generating-random-identifiers-89e8a28a7d77
+ https://towardsdatascience.com/are-uuids-really-unique-57eb80fc2a87
+ https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/
+ https://eager.io/blog/how-long-does-an-id-need-to-be/?hn

### Timestamp (and the other friends)
**Q:** Should a sort order be defined, and if so, what?
- The sort order should be up to the application, and implementations may currently sort slightly differently.
- This spec should aim to provide a better sortable timestamp UUID.
- UUIDv1 is technically sortable, just not in the most efficient way.
- An efficient format for k-sorting/k-ordering and lexicographic sorting should be the goal.

---
**Q:** What should the format for the values be?

Most agree that "unique identifiers" such as a MAC address are:
- Flawed logic because they are not truly unique.
- Don't work well in virtual environments.
- They also pose an inherent security risk and should be avoided at all costs.

A note on Security:
- If security is of any concern or the UUID will be used for security operations then RFC 4122 UUIDv4 should be used. (This would be good to document in the security considerations.)
- The timestamp embedded within an ID does expose some data about when the corresponding data was created and the time on the current server.
- However the timestamp alone articulates nothing about the data itself or the entire database or application as a whole.

The current UUIDv6 draft proposes:
- A 60 bit timestamp sourced from the UTC Gregorian calendar with big-endian encoding. This ultimately rearranges the original spec's time values into High, Mid, Low order to preserve their position in time and promote better sorting ability.
- The clock sequence sections and node are relaxed to include random bytes.
- Note that the clock sequence may be valuable to keep, as per some discussion coming up later.

A note on the time source for the timestamp:
- UUIDv1 and the proposed UUIDv6 utilize UTC Gregorian time in 100-nanosecond intervals.
- Many other implementations utilize Unix Epoch time.
- Many others utilize a custom epoch date as the start.
- Recommendation: Relax the spec and allow for any properly synced, monotonic time source as long as the required amount of time bits can be achieved.

In examining other implementations there are two outstanding approaches that can be found.

1. timestamp|random
    - This approach takes an input timestamp of varying length (more on this in a moment) and concatenates random data amounting to the leftover space.
    - The timestamp itself is usually epoch time as a standard rather than the UUIDv1 timestamp which uses Gregorian time.
    - The timestamp almost always provides millisecond accuracy and generally is either 48 or 64 bits.
    - Many other timestamp bit sizes exist, for example 32, but are often used with ID implementations of less than 128 bits to provide an accurate timestamp and also leave room for random and other bits. For implementations that utilize 64 bits a smaller timestamp is acceptable.
    - Downside: If more than one ID is generated at the same millisecond value the "random" value acts as collision avoidance. While this is good from a collision avoidance perspective, these values are no longer truly sortable.
    - Note: Some libraries have built-in "collision avoidance" where the ID generator that has created (and then detected) a duplicate ID in some way will increment a random bit, usually the least significant bit. Although the probability of a collision is low, I am not a fan of this approach, which is sub-optimal for distributed environments where many nodes are creating IDs. Furthermore, what checks are in place to confirm this incremented value does not collide with another existing value somewhere on the system?
2. timestamp|random|sequence
    - This has all of the same notes as the first method but also adds in a sequence counter which is used to solve the problem where multiple IDs are generated at the same millisecond.
    - The value of this sequence generally increases monotonically but the start is chosen at random.
    - In almost every implementation the sequence counter is a set of bits at the end of the ID.
    - This can be as small as 8 bits or as large as 24. The UUIDv1 spec uses 16 bits for clock sequence high and low, which seems adequate for the collision avoidance purpose.
    - Question: Is the best position for sorting with this at the end of the UUID (timestamp|random|sequence) or after the timestamp (timestamp|sequence|random)? Benchmark testing may be needed.

A third option does exist for distributed nodes that usually removes random and replaces it with a machineID.

3. timestamp|machineID|sequence or timestamp|sequence|machineID
    - This machineID is usually a variable length from 8 to 16 bits and is locally unique to a node involved in UUID creation.
    - This helps for collision avoidance due to a machine's given context being captured within the UUID, ensuring locally generated IDs have a very low chance of colliding with another distributed node's IDs.
    - Note that this method removes random as these IDs are also smaller than 128 bits and need to make the most of the available bits.
    - Again, is the best position for the sequence at the end of the UUID or after the timestamp?

---
**Q:** How should this new timestamp 128-bit UUID be identified?
- The Variant bits should be set to 10x (hex 8, 9, A, B).
- The Version should be set to 0110 for Version 6.
- If we decide to implement more than one time-sortable UUID, such as a 48 and a 64 bit timestamp as separate specs, the next available version should be used (7 as 0111, 8 as 1000 and so on).

---
### Text Encoding
**Q:** Should we have alternate text formats or just the existing hex with dashes?
- The existing hex format with the dashes is great for immediate readability by humans but terrible for computers.
- UUIDs should almost never be stored in this format in database applications and instead should be stored as binary data, as the 128 bits they represent.
- It may be worth an implementation note in the RFC on this very point.
- Based on the data it seems like the most common are base64, base64 safe-alphabet and base32 representations.
- Perhaps there is an easy way to mutate the 128 bits into any format desired?
- Possible solution: A descriptor prefix on the mutated value when these are in use. Example using the draft, b32a:base32UUID to indicate how to pack/unpack these bits.
- Another possible solution: The last few bits can be used for identification for packing/unpacking purposes, say 0001 is b32a, 0010 is b32b, and so on.
- Ideally the focus is on the standard 128 bit format with the dashes, and we can then circle back to how that may be represented in alternative formats later.

**Q:** Alternatively, could we define a UUID prefix descriptor for edge-case scenarios where the current UUID format is not acceptable?
- Some APIs do not play well with dashes. Rather than mutate to a new format or different encoding, could a safe 'UUIDvX' prefix, where X is the current version, be used on the current UUID and dashes omitted?
- This would allow for backwards compatibility with existing UUIDs while not over complicating the UUID too much.
- This would also ensure pre-HTML5 IDs can utilize UUIDs, as HTML4.x and lower have strict rules about an ID not starting with a number. Here they would start with a 'U'.
- https://www.w3.org/TR/html401/types.html#type-name

Example:
- b3ebb6c8-19a3-11eb-adc1-0242ac120002 becomes UUIDv1b3ebb6c819a311ebadc10242ac120002
- 824c8abb-2774-4ee0-887c-3501501c1be1 becomes UUIDv4824c8abb27744ee0887c3501501c1be1

---
### Local/Global Uniqueness
**Q:** Is this a solution only for globally unique IDs, or does it include locally unique IDs as well (unique within one system, not the whole planet/solar system/galaxy/universe)?
- For a 128 bit UUID the spec should still aim to be globally unique.
- If a 64 bit variant alternative is introduced, it should be explicitly called out that it is likely unique only within a given application context.
- That being said, one could explore "as required" mutations for a 64 bit variant when a situation arises where the UUID is required to be transmitted outside of the application context.

Potential Mutation Example
1. Compute a hash on the 64 bit variant's text representation.
2. Truncate the resulting hash to 64 bits by removing the least significant bits (truncation process in RFC 2104, Section 5).
3. Post-fix the trimmed hash at the end of the 64 bit UUID.
4. Mutate the variant bits to '111' (hex E or F). Per RFC 4122, Section 4.1.1, this leftover Variant is "Reserved for future definition." and is perfect for a new 64 bit variant UUID.
5. The Version can be mutated to 0001 as the first version of the 64to128 conversion Variant.

Other Thoughts
- Problem: This is a one-way conversion as bits have been mutated and there is not a good way to undo the conversion.
- Alternatively, the prefix solution described could be used to identify that the 64 bit value is actually UUID 64.
- Thought: this could also be used as a method to create a larger than 128 bit UUID as well, using the same steps but on a 128 bit UUID in step 1 and then appending the hash to the current UUID behind a dash.
    - Input: b3ebb6c8-19a3-11eb-adc1-0242ac120002
    - SHA256: 99a65514eca05bfa1ccaa95bd2510d889f1bfea20ffc2aa57a739e1bbce060c7
    - Combined: b3ebb6c8-19a3-11eb-adc1-0242ac120002-99a65514eca05bfa1ccaa95bd2510d889f1bfea20ffc2aa57a739e1bbce060c7
- There may be no use case for this but if a larger more unique UUID is required this could be a potential method for creating one. The hash method could also be variable in length: sha1, md5, sha256, etc.
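
The 'UUIDvX' prefix idea from the Text Encoding section could be sketched in C roughly as follows. This is only an illustration of the proposal; the function name uuid_prefix_form is hypothetical and it assumes the input is the usual 36 character hex-and-dash string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turns the 36 character hex-and-dash form into the
 * dash-free "UUIDvX" form. out must hold 39 bytes (38 characters + NUL).
 * Returns 0 on success, -1 if the input is not 36 characters long. */
int uuid_prefix_form(const char *in, char out[39])
{
    size_t i, n;

    if (strlen(in) != 36)
        return -1;

    memcpy(out, "UUIDv", 5);
    n = 5;
    out[n++] = in[14]; /* version digit: first character of the third group */

    for (i = 0; i < 36; i++)
        if (in[i] != '-')
            out[n++] = in[i];
    out[n] = '\0';
    return 0;
}
```

Passing "b3ebb6c8-19a3-11eb-adc1-0242ac120002" produces "UUIDv1b3ebb6c819a311ebadc10242ac120002", matching the example given earlier in these notes.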
================================================ FILE: research/sortable-id-comparisons.md ================================================ ## A comparison of non-Standard Time-based K-sorted, Lexicographically Unique Identifiers --- ### Name: Example Formatting - Full ID Format (with all the moving parts concatenated) - Total Length (and encoding) - Timestamp Format (and format) - Node Format (extra data in the ID) - Collision Handling - Accuracy (of the timestamp) - Downsides (if there are any glaring errors) - Source (of the analysis) --- ### Name: Twitter's Cassie (LexicalUUID) - Full ID Format: timestamp|machineHostnameHash - Total Length: 128 Bits - Timestamp Format: 64 Bits, Epoch - Node Format: 64 bit hash of the machine's hostname - Accuracy: MicrosecondEpochClock - Collision Handling: n/a - Downsides: - Source: https://github.com/twitter-archive/cassie --- ### Name: Twitter's Snowflake - Full ID Format: timestamp|machineID|sequence - Total Length: 64 bits, unsigned integers - Timestamp Format: 41 bits, NTP Synced, bespoke epoch - Node Format: 10 bit configured machine id, 12 bit sequence number - Collision Handling: Explicit Sequence Number - Accuracy: Milliseconds - Downsides: - Source: https://github.com/twitter-archive/snowflake/releases/tag/snowflake-2010 --- ### Name: Boundary's Flake (inspired by Snowflake) - Full ID Format: timestamp|mac|sequence - Total Length: 128 Bits, Base62 - Timestamp Format: 64 Bits, Epoch - Node Format: 48 Bit Worker ID (MAC), 16 Bit sequence number - Accuracy: Milliseconds - Collision Handling: Explicit Sequence Number - Downsides: MAC Address in the node - Source: https://web.archive.org/web/20131231222024/http://boundary.com/blog/2012/01/12/flake-a-decentralized-k-ordered-unique-id-generator-in-erlang/, https://github.com/boundary/flake --- ### Name: Instagram's Sharding ID (inspired by Snowflake) - Full ID Format: timestamp|shardID|sequence - Total Length: 64 bits - Timestamp Format: 41 bits (epoch) - Node Format: 13 
bits for shard ID, 10 bits for sequence counter - Accuracy: Milliseconds - Collision Handling: Explicit Sequence Number - Downsides: - Source: https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c --- ### Name: Segment's K-Sortable Unique IDentifier (KSUID) (inspired by Snowflake) - Full ID Format: timestamp|random - Total Length: 160 Bits, base62 text format - Timestamp Format: 32 bits, UTC timestamp (epoch) - Node Format: 128 random payload - Accuracy: Seconds - Collision Handling: Random Value only - Downsides: Larger than other ID's resulting in more data per ID - Source: https://github.com/segmentio/ksuid, https://segment.com/blog/a-brief-history-of-the-uuid/ --- ### Name: Elasticsearch's Elasticflake (inspired by Flake) - Full ID Format: timestamp|mac|sequence - Total Length: 120 Bits, Base64 - Timestamp Format: 48 Bits - Node Format: 48 Bits for MAC, 24 Bits for Sequence Counter - Accuracy: Milliseconds - Collision Handling: Explicit Sequence Number - Downsides: MAC Address in the node - Source: https://github.com/ppearcy/elasticflake --- ### Name: Flake ID (Inspired by Flake?) - Full ID Format: timestamp|datacenter|worker|sequence - Total Length: 64 bits, Big Endian, - Timestamp Format: 42 Bits, Epoch - Node Format: 5 bit datacenter, 5 bit worker, 12 bit counter - Accuracy: Milliseconds - Collision Handling: Explicit Sequence Number - Downsides: - Source: https://github.com/T-PWK/flake-idgen --- ### Name: Sony's Sonyflake (inspired by Flake?) - Full ID Format: timestamp|sequence|machineID - Total Length: 63 bits? 
- Timestamp Format: 39 bits
- Node Format: 8 bit sequence number, 16 bit machine id
- Accuracy: 10 milliseconds
- Collision Handling: Explicit Sequence Number
- Downsides: Unusual total ID length (63 bits)
- Source: https://github.com/sony/sonyflake

---

### Name: UUIDv1
- Format: timeLow|timeMid|timeHigh|version|clockSeqHigh|Variant|clockSeqLow|NodeID
- Total Length: 128 bits, UUIDv1 format
- Timestamp Format: 60 bit timestamp (32 bits timeLow, 16 bits timeMid, 12 bits timeHigh)
- Node Format: 4 bit UUID Version, 2 bit Variant, 6 bit clockSeqHigh, 8 bits clockSeqLow, 48 bit MAC Address
- Accuracy: 100-nanosecond intervals
- Collision Handling: Explicit Sequence Number (clockSeqHigh and clockSeqLow)
- Downsides: Timestamp arranged in a sub-optimal way, MAC included in the calculation, Gregorian epoch with 100-nanosecond increments rather than the standard Unix epoch with seconds or milliseconds
- Notes: Also used by Microsoft's NEWSEQUENTIALID (UuidCreateSequential) [Source](https://devblogs.microsoft.com/oldnewthing/20191120-00/?p=103118). Also used by Apache's Cassandra (TIMEUUID) [Source](https://cassandra.apache.org/doc/latest/cql/functions.html?highlight=timeuuid)
- Source: http://www.ietf.org/rfc/rfc4122.txt

---

### Name: Laravel's Str::orderedUuid()
- Full ID Format: timestamp|random (Looks like UUIDv4 format)
- Total Length: 128 Bits
- Timestamp Format: 48 bits, Server Time (Epoch?)
- Node Format: 72 Random Bits
- Accuracy: Milliseconds
- Collision Handling: Random Value only
- Downsides:
- Source: https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8

---

### Name: COMB GUID
- Full ID Format: timestamp|random (Looks like UUIDv4 format)
- Total Length: 128 Bits
- Timestamp Format: 48 Bits, Epoch format
- Node Format: 74 Random Bits
- Accuracy: Milliseconds
- Collision Handling: Random Value only
- Downsides: The date may be placed at the start or the end of the value to accommodate SQL Server or PostgreSQL sort order.
  The DateTime strategies are limited to 1-3ms resolution, so creating many COMB values per second can produce two with the same timestamp value. This won't result in a database collision, because the remaining random bits in the GUID protect against that. However, COMBs with exactly the same timestamp value aren't guaranteed to sort in order of insertion: once the timestamp bytes tie, the sort order falls back to the random bytes that follow.
- Source: https://github.com/richardtallent/RT.Comb

---

### Name: Universally Unique Lexicographically Sortable Identifier (ULID)
- Full ID Format: timestamp|random
- Total Length: 128 bits, Crockford's Base32 text format
- Timestamp Format: 48 bits (Unix Time)
- Node Format: 80 bits
- Accuracy: Milliseconds
- Collision Handling: Random Value only
- Downsides: A 26 character Base32 string can hold 130 bits, while the ID itself is 128 bits, so some 26 character strings overflow and produce a decode error.
- Source: https://github.com/ulid/spec

---

### Name: Sortable Identifier (SID)
- Full ID Format: timestamp|random
- Total Length: 128 Bits, Hex, Base64, base32
- Timestamp Format: 64 bit timestamp
- Node Format: 64 bit random
- Accuracy: Nanoseconds
- Collision Handling: If (by any chance) this is called in the same nanosecond, the random number is incremented instead of a new one being generated.
- Downsides:
- Source: https://github.com/chilts/sid

---

### Name: Better GUID
- Full ID Format: timestamp|random
- Total Length: 136 bits, base64 web-safe chars
- Timestamp Format: 64 bits
- Node Format: 72 bits of random
- Accuracy: Milliseconds
- Collision Handling: Monotonically incrementing "random"
- Downsides: Monotonically incrementing "random", second largest of all IDs
- Source: https://github.com/kjk/betterguid

---

### Name: Google's Firebase pushID
- Full ID Format: timestamp|random
- Total Length: 120 Bits, Modified Base64 ASCII encoding so that the text form sorts in the same order as the binary value
- Timestamp Format: 48 bit timestamp
- Node Format: 72 bits of randomness
- Accuracy: Milliseconds
- Collision Handling: Built-in least significant bit increment in the event of multiple generations at the same millisecond
- Downsides: Client side implementation where clock skews can be prevalent
- Source: https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html, https://gist.github.com/mikelehen/3596a30bd69384624c11, https://github.com/firebase/firebase-js-sdk/blob/master/packages/database/src/core/util/NextPushId.ts

---

### Name: XID
- Full ID Format: timestamp|machineID|processID|sequence
- Total Length: 96 bits, Base32
- Timestamp Format: 32 bit timestamp, Unix Epoch
- Node Format: 24 bit machine ID, 16 bit process ID, 24 bit counter (random start)
- Accuracy: Seconds
- Collision Handling: Explicit Sequence Number
- Downsides:
- Source: https://github.com/rs/xid

---

### Name: MongoDB's ObjectID
- Full ID Format: timestamp|random|sequence
- Total Length: 96 bits
- Timestamp Format: 32 bit timestamp
- Node Format: 40 bit random, 24 bit sequence (random start)
- Accuracy: Seconds
- Collision Handling: Explicit Sequence Number
- Downsides:
- Source: https://docs.mongodb.com/manual/reference/method/ObjectId/

---

### Name: unique collision-resistant ID (CUID)
- Format: c|timestamp|sequence|clientFingerprint|random
- Total Length: ~124 bits, Base36
- Timestamp Format: ~62 bits (timestamp|sequence)
- Node Format: ~62 bits (clientFingerprint|random)
- Accuracy: Milliseconds
- Collision Handling: Explicit Sequence Number
- Downsides: math.random
- Source: https://github.com/ericelliott/cuid
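
Several entries above list "Explicit Sequence Number" as their collision handling. As a rough illustration of what that means in practice, the following is a minimal Python sketch that uses the field widths from the Flake ID entry (42 bit timestamp, 5 bit datacenter, 5 bit worker, 12 bit counter). It is not taken from any of the listed libraries: the names `pack_flake`, `unpack_flake`, and `FlakeGenerator` are hypothetical, the timestamp is assumed to be Unix milliseconds, and clocks that move backwards are not handled.

```python
import time

# Field widths from the Flake ID entry: 42 + 5 + 5 + 12 = 64 bits.
TIMESTAMP_BITS, DATACENTER_BITS, WORKER_BITS, SEQUENCE_BITS = 42, 5, 5, 12


def pack_flake(timestamp_ms, datacenter, worker, sequence):
    """Pack the four fields into one integer, most significant field first."""
    fields = (
        (timestamp_ms, TIMESTAMP_BITS),
        (datacenter, DATACENTER_BITS),
        (worker, WORKER_BITS),
        (sequence, SEQUENCE_BITS),
    )
    value = 0
    for field, width in fields:
        if not 0 <= field < (1 << width):
            raise ValueError(f"{field} does not fit in {width} bits")
        value = (value << width) | field
    return value


def unpack_flake(value):
    """Split a packed ID back into (timestamp_ms, datacenter, worker, sequence)."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    value >>= SEQUENCE_BITS
    worker = value & ((1 << WORKER_BITS) - 1)
    value >>= WORKER_BITS
    datacenter = value & ((1 << DATACENTER_BITS) - 1)
    value >>= DATACENTER_BITS
    return value, datacenter, worker, sequence


class FlakeGenerator:
    """Hypothetical generator: bumps the sequence when the millisecond repeats."""

    def __init__(self, datacenter, worker, clock=None):
        self._datacenter = datacenter
        self._worker = worker
        # Assumption: Unix epoch in milliseconds; a custom epoch would also work.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        now = self._clock()
        if now == self._last_ms:
            # Same millisecond as the previous ID: use the explicit counter.
            self._sequence += 1
            if self._sequence >= (1 << SEQUENCE_BITS):
                raise OverflowError("sequence exhausted for this millisecond")
        else:
            self._last_ms = now
            self._sequence = 0
        return pack_flake(now, self._datacenter, self._worker, self._sequence)
```

Because the timestamp occupies the most significant bits, IDs from one generator compare in creation order as plain integers, and two IDs created in the same millisecond differ only in the counter. The "Random Value only" entries skip the counter and rely on the random bits alone, which is why they cannot guarantee ordering within a single timestamp tick.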