Repository: dgraham/json-stream
Branch: master
Commit: 6f3557ccd734
Files: 20
Total size: 71.0 KB
Directory structure:
gitextract_cy1fxiqg/
├── .github/
│ └── workflows/
│ └── ruby.yml
├── .gitignore
├── Gemfile
├── LICENSE
├── README.md
├── Rakefile
├── bin/
│ ├── bundler
│ ├── console
│ ├── rake
│ └── setup
├── json-stream.gemspec
├── lib/
│ └── json/
│ ├── stream/
│ │ ├── buffer.rb
│ │ ├── builder.rb
│ │ ├── parser.rb
│ │ └── version.rb
│ └── stream.rb
└── spec/
├── buffer_spec.rb
├── builder_spec.rb
├── fixtures/
│ └── repository.json
└── parser_spec.rb
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/workflows/ruby.yml
================================================
on: [push, pull_request]
name: Build
jobs:
test:
name: rake test
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
ruby-version:
- head
- "3.3"
- "3.2"
- "3.1"
- "3.0"
- "2.7"
- "2.6"
steps:
- uses: actions/checkout@v4
- uses: ruby/setup-ruby@v1
with:
ruby-version: ${{ matrix.ruby-version }}
bundler-cache: true
- run: |
bundle exec rake test
================================================
FILE: .gitignore
================================================
/.bundle/
/.yardoc
/Gemfile.lock
/_yardoc/
/coverage/
/doc/
/pkg/
/spec/reports/
/tmp/
*.gem
================================================
FILE: Gemfile
================================================
source 'https://rubygems.org'
gemspec
================================================
FILE: LICENSE
================================================
Copyright (c) 2010-2024 David Graham
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
================================================
FILE: README.md
================================================
# JSON::Stream
JSON::Stream is a JSON parser, based on a finite state machine, that generates
events for each state change. This allows streaming both the JSON document into
memory and the parsed object graph out of memory to some other process.
This is much like an XML SAX parser that generates events during parsing. There
is no requirement for the document, or the object graph, to be fully buffered in
memory. This is best suited for huge JSON documents that won't fit in memory.
For example, streaming and processing large map/reduce views from Apache
CouchDB.
## Usage
The simplest way to parse is to read the full JSON document into memory
and then parse it into a full object graph. This is fine for small documents
because we have room for both the document and parsed object in memory.
```ruby
require 'json/stream'
json = File.read('/tmp/test.json')
obj = JSON::Stream::Parser.parse(json)
```
While it's possible to do this with JSON::Stream, we really want to use the json
gem for documents like this. JSON.parse() is much faster than this parser,
because it can rely on having the entire document in memory to analyze.
For larger documents we can use an IO object to stream it into the parser.
We still need room for the parsed object, but the document itself is never
fully read into memory.
```ruby
require 'json/stream'
stream = File.open('/tmp/test.json')
obj = JSON::Stream::Parser.parse(stream)
```
Again, while JSON::Stream can be used this way, if we just need to stream the
document from disk or the network, we're better off using the yajl-ruby gem.
JSON::Stream is really useful when huge documents arrive over the network in
small chunks to an EventMachine `receive_data` loop. Inside an
EventMachine::Connection subclass we might have:
```ruby
def post_init
@parser = JSON::Stream::Parser.new do
start_document { puts "start document" }
end_document { puts "end document" }
start_object { puts "start object" }
end_object { puts "end object" }
start_array { puts "start array" }
end_array { puts "end array" }
key { |k| puts "key: #{k}" }
value { |v| puts "value: #{v}" }
end
end
def receive_data(data)
begin
@parser << data
rescue JSON::Stream::ParserError => e
close_connection
end
end
```
The parser accepts chunks of the JSON document and parses up to the end of the
available buffer. Passing in more data resumes the parse from the prior state.
When an interesting state change happens, the parser notifies all registered
callback procs of the event.
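The resume-from-prior-state behavior can be pictured with a toy state machine. The sketch below is not the JSON::Stream implementation: `ToyStringScanner` is a hypothetical class, written only for illustration, that reports each double-quoted string it sees. Because its state and partial buffer live in instance variables, a string split across two chunks is still reported whole.

```ruby
# Toy illustration only: NOT the JSON::Stream parser.
# ToyStringScanner is a hypothetical class that reports every
# double-quoted string, even when one is split across chunks.
class ToyStringScanner
  def initialize(&on_string)
    @state = :outside     # survives between calls to <<
    @buf = +""            # partial string carried across chunks
    @on_string = on_string
  end

  # Scan one chunk, then stop and wait for the next one.
  def <<(chunk)
    chunk.each_char do |ch|
      case @state
      when :outside
        @state = :in_string if ch == '"'
      when :in_string
        case ch
        when '"'
          @on_string.call(@buf)
          @buf = +""
          @state = :outside
        when '\\'
          @state = :escape
        else
          @buf << ch
        end
      when :escape
        @buf << ch          # keep the escaped character verbatim
        @state = :in_string
      end
    end
    self
  end
end

found = []
scanner = ToyStringScanner.new { |s| found << s }
scanner << '{"ans'       # chunk ends in the middle of a string
scanner << 'wer": "fo'
scanner << 'rty-two"}'
p found
```

The real parser tracks many more states (numbers, keywords, containers, unicode escapes), but the chunk-by-chunk pattern is the same idea.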
The event callback is where we can do interesting data filtering and passing
to other processes. The above example simply prints state changes, but
imagine the callbacks looking for an array named `rows` and processing sets
of these row objects in small batches. Millions of rows, streaming over the
network, can be processed in constant memory space this way.
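As a rough sketch of that idea, the listener below collects the objects inside a top-level `rows` array and hands them to a block in small batches. `RowBatcher`, its `EVENTS` list, and the batch size are hypothetical names chosen for this example, and each row is assumed to be a flat object of key/value pairs. The events are fired by hand so the sketch runs without the gem installed.

```ruby
# Hypothetical listener, not part of JSON::Stream.
# Assumes a document shaped like {"rows": [{...}, {...}, ...]}
# where every row is a flat object.
class RowBatcher
  EVENTS = %w[start_object end_object start_array end_array key value]

  def initialize(batch_size, &on_batch)
    @batch_size = batch_size
    @on_batch = on_batch
    @depth = 0         # container nesting depth
    @in_rows = false   # true while inside the "rows" array
    @last_key = nil
    @row = nil
    @batch = []
  end

  def key(k)
    @last_key = k
  end

  def start_array
    @depth += 1
    @in_rows = true if @depth == 2 && @last_key == 'rows'
  end

  def end_array
    if @in_rows && @depth == 2
      @in_rows = false
      flush              # hand over the final, possibly short, batch
    end
    @depth -= 1
  end

  def start_object
    @depth += 1
    @row = {} if @in_rows && @depth == 3
  end

  def end_object
    if @in_rows && @depth == 3
      @batch << @row
      @row = nil
      flush if @batch.size >= @batch_size
    end
    @depth -= 1
  end

  def value(v)
    @row[@last_key] = v if @row
  end

  private

  def flush
    @on_batch.call(@batch) unless @batch.empty?
    @batch = []
  end
end

batches = []
batcher = RowBatcher.new(2) { |rows| batches << rows }

# Events for {"total": 3, "rows": [{"id": 1}, {"id": 2}, {"id": 3}]}
batcher.start_object
batcher.key('total'); batcher.value(3)
batcher.key('rows')
batcher.start_array
[1, 2, 3].each do |id|
  batcher.start_object
  batcher.key('id'); batcher.value(id)
  batcher.end_object
end
batcher.end_array
batcher.end_object
p batches
```

With a real parser, the listener could be wired up the same way `JSON::Stream::Builder` registers itself: `RowBatcher::EVENTS.each { |name| parser.send(name, &batcher.method(name)) }`. Only one batch is held at a time, so memory use stays bounded by the batch size rather than the document size.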
## Alternatives
* [json](https://github.com/flori/json)
* [yajl-ruby](https://github.com/brianmario/yajl-ruby)
* [yajl-ffi](https://github.com/dgraham/yajl-ffi)
* [application/json-seq](http://www.rfc-editor.org/rfc/rfc7464.txt)
## Development
```
$ bin/setup
$ bin/rake test
```
## License
JSON::Stream is released under the MIT license. Check the LICENSE file for details.
================================================
FILE: Rakefile
================================================
require 'rake'
require 'rake/clean'
require 'rake/testtask'
CLOBBER.include('pkg')
directory 'pkg'
desc 'Build distributable packages'
task :build => [:pkg] do
system 'gem build json-stream.gemspec && mv json-*.gem pkg/'
end
Rake::TestTask.new(:test) do |test|
test.libs << 'spec'
test.pattern = 'spec/**/*_spec.rb'
test.warning = true
end
task :default => [:clobber, :test, :build]
================================================
FILE: bin/bundler
================================================
#!/usr/bin/env ruby
# frozen_string_literal: true
#
# This file was generated by Bundler.
#
# The application 'bundler' is installed as part of a gem, and
# this file is here to facilitate running it.
#
require "pathname"
ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
Pathname.new(__FILE__).realpath)
require "rubygems"
require "bundler/setup"
load Gem.bin_path("bundler", "bundler")
================================================
FILE: bin/console
================================================
#!/usr/bin/env ruby
require "bundler/setup"
require "json/stream"
# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.
# (If you use this, don't forget to add pry to your Gemfile!)
# require "pry"
# Pry.start
require "irb"
IRB.start(__FILE__)
================================================
FILE: bin/rake
================================================
#!/usr/bin/env ruby
# frozen_string_literal: true
#
# This file was generated by Bundler.
#
# The application 'rake' is installed as part of a gem, and
# this file is here to facilitate running it.
#
require "pathname"
ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
Pathname.new(__FILE__).realpath)
require "rubygems"
require "bundler/setup"
load Gem.bin_path("rake", "rake")
================================================
FILE: bin/setup
================================================
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
set -vx
bundle install
================================================
FILE: json-stream.gemspec
================================================
require './lib/json/stream/version'
Gem::Specification.new do |s|
s.name = 'json-stream'
s.version = JSON::Stream::VERSION
s.summary = %q[A streaming JSON parser that generates SAX-like events.]
s.description = %q[A parser best suited for huge JSON documents that don't fit in memory.]
s.authors = ['David Graham']
s.email = %w[david.malcom.graham@gmail.com]
s.homepage = 'http://dgraham.github.io/json-stream/'
s.license = 'MIT'
s.files = Dir['[A-Z]*', 'json-stream.gemspec', '{lib}/**/*'] - ['Gemfile.lock']
s.require_path = 'lib'
s.add_development_dependency 'bundler', '~> 2.2'
s.add_development_dependency 'minitest', '~> 5.22'
s.add_development_dependency 'rake', '~> 13.2'
s.required_ruby_version = '>= 2.6.0'
end
================================================
FILE: lib/json/stream/buffer.rb
================================================
module JSON
module Stream
# A character buffer that expects a UTF-8 encoded stream of bytes.
# This handles truncated multi-byte characters properly so we can just
# feed it binary data and receive a properly formatted UTF-8 String as
# output.
#
# More UTF-8 parsing details are available at:
#
# http://en.wikipedia.org/wiki/UTF-8
# http://tools.ietf.org/html/rfc3629#section-3
class Buffer
def initialize
@state = :start
@buffer = []
@need = 0
end
# Fill the buffer with a String of binary UTF-8 encoded bytes. Returns
# as much of the data in a UTF-8 String as we have. Truncated multi-byte
# characters are saved in the buffer until the next call to this method
# where we expect to receive the rest of the multi-byte character.
#
# data - The partial binary encoded String data.
#
# Raises JSON::Stream::ParserError if the UTF-8 byte sequence is malformed.
#
# Returns a UTF-8 encoded String.
def <<(data)
# Avoid state machine for complete UTF-8.
if @buffer.empty?
data.force_encoding(Encoding::UTF_8)
return data if data.valid_encoding?
end
bytes = []
data.each_byte do |byte|
case @state
when :start
if byte < 128
bytes << byte
elsif byte >= 192
@state = :multi_byte
@buffer << byte
@need =
case
when byte >= 240 then 4
when byte >= 224 then 3
when byte >= 192 then 2
end
else
error('Expected start of multi-byte or single byte char')
end
when :multi_byte
if byte > 127 && byte < 192
@buffer << byte
if @buffer.size == @need
bytes += @buffer.slice!(0, @buffer.size)
@state = :start
end
else
error('Expected continuation byte')
end
end
end
# Build UTF-8 encoded string from completed codepoints.
bytes.pack('C*').force_encoding(Encoding::UTF_8).tap do |text|
error('Invalid UTF-8 byte sequence') unless text.valid_encoding?
end
end
# Determine if the buffer holds the leading bytes of a truncated
# multi-byte character that is waiting on subsequent continuation bytes
# before a full codepoint is formed.
#
# Examples
#
# # "é" is encoded as the two bytes "\xC3\xA9".
# buffer = JSON::Stream::Buffer.new
#
# buffer << "\xC3"
# buffer.empty?
# # => false
#
# buffer << "\xA9"
# buffer.empty?
# # => true
#
# Returns true if the buffer is empty.
def empty?
@buffer.empty?
end
private
def error(message)
raise ParserError, message
end
end
end
end
================================================
FILE: lib/json/stream/builder.rb
================================================
module JSON
module Stream
# A parser listener that builds a full, in memory, object from a JSON
# document. This is similar to using the json gem's `JSON.parse` method.
#
# Examples
#
# parser = JSON::Stream::Parser.new
# builder = JSON::Stream::Builder.new(parser)
# parser << '{"answer": 42, "question": false}'
# obj = builder.result
class Builder
METHODS = %w[start_document end_document start_object end_object start_array end_array key value]
attr_reader :result
def initialize(parser)
METHODS.each do |name|
parser.send(name, &method(name))
end
end
def start_document
@stack = []
@keys = []
@result = nil
end
def end_document
@result = @stack.pop
end
def start_object
@stack.push({})
end
def end_object
return if @stack.size == 1
node = @stack.pop
top = @stack[-1]
case top
when Hash
top[@keys.pop] = node
when Array
top << node
end
end
alias :end_array :end_object
def start_array
@stack.push([])
end
def key(key)
@keys << key
end
def value(value)
top = @stack[-1]
case top
when Hash
top[@keys.pop] = value
when Array
top << value
else
@stack << value
end
end
end
end
end
================================================
FILE: lib/json/stream/parser.rb
================================================
module JSON
module Stream
# Raised on any invalid JSON text.
ParserError = Class.new(RuntimeError)
# A streaming JSON parser that generates SAX-like events for state changes.
# Use the json gem for small documents. Use this for huge documents that
# won't fit in memory.
#
# Examples
#
# parser = JSON::Stream::Parser.new
# parser.key { |key| puts key }
# parser.value { |value| puts value }
# parser << '{"answer":'
# parser << ' 42}'
class Parser
BUF_SIZE = 4096
CONTROL = /[\x00-\x1F]/
WS = /[ \n\t\r]/
HEX = /[0-9a-fA-F]/
DIGIT = /[0-9]/
DIGIT_1_9 = /[1-9]/
DIGIT_END = /\d$/
TRUE_RE = /[rue]/
FALSE_RE = /[alse]/
NULL_RE = /[ul]/
TRUE_KEYWORD = 'true'
FALSE_KEYWORD = 'false'
NULL_KEYWORD = 'null'
LEFT_BRACE = '{'
RIGHT_BRACE = '}'
LEFT_BRACKET = '['
RIGHT_BRACKET = ']'
BACKSLASH = '\\'
SLASH = '/'
QUOTE = '"'
COMMA = ','
COLON = ':'
ZERO = '0'
MINUS = '-'
PLUS = '+'
POINT = '.'
EXPONENT = /[eE]/
B,F,N,R,T,U = %w[b f n r t u]
# Parses a full JSON document from a String or an IO stream and returns
# the parsed object graph. For parsing small JSON documents with small
# memory requirements, use the json gem's faster JSON.parse method instead.
#
# json - The String or IO containing JSON data.
#
# Examples
#
# JSON::Stream::Parser.parse('{"hello": "world"}')
# # => {"hello" => "world"}
#
# Raises a JSON::Stream::ParserError if the JSON data is malformed.
#
# Returns the parsed value: a Hash, Array, String, Numeric, true, false, or nil.
def self.parse(json)
stream = json.is_a?(String) ? StringIO.new(json) : json
parser = Parser.new
builder = Builder.new(parser)
while (buf = stream.read(BUF_SIZE)) != nil
parser << buf
end
parser.finish
builder.result
ensure
stream.close
end
# Create a new parser with an optional initialization block where
# we can register event callbacks.
#
# Examples
#
# parser = JSON::Stream::Parser.new do
# start_document { puts "start document" }
# end_document { puts "end document" }
# start_object { puts "start object" }
# end_object { puts "end object" }
# start_array { puts "start array" }
# end_array { puts "end array" }
# key { |k| puts "key: #{k}" }
# value { |v| puts "value: #{v}" }
# end
def initialize(&block)
@state = :start_document
@utf8 = Buffer.new
@listeners = {
start_document: [],
end_document: [],
start_object: [],
end_object: [],
start_array: [],
end_array: [],
key: [],
value: []
}
# Track parse stack.
@stack = []
@unicode = ""
@buf = ""
@pos = -1
# Register any observers in the block.
instance_eval(&block) if block_given?
end
def start_document(&block)
@listeners[:start_document] << block
end
def end_document(&block)
@listeners[:end_document] << block
end
def start_object(&block)
@listeners[:start_object] << block
end
def end_object(&block)
@listeners[:end_object] << block
end
def start_array(&block)
@listeners[:start_array] << block
end
def end_array(&block)
@listeners[:end_array] << block
end
def key(&block)
@listeners[:key] << block
end
def value(&block)
@listeners[:value] << block
end
# Pass data into the parser to advance the state machine and
# generate callback events. This is well suited for an EventMachine
# receive_data loop.
#
# data - The String of partial JSON data to parse.
#
# Raises a JSON::Stream::ParserError if the JSON data is malformed.
#
# Returns nothing.
def <<(data)
(@utf8 << data).each_char do |ch|
@pos += 1
case @state
when :start_document
start_value(ch)
when :start_object
case ch
when QUOTE
@state = :start_string
@stack.push(:key)
when RIGHT_BRACE
end_container(:object)
when WS
# ignore
else
error('Expected object key start')
end
when :start_string
case ch
when QUOTE
if @stack.pop == :string
end_value(@buf)
else # :key
@state = :end_key
notify(:key, @buf)
end
@buf = ""
when BACKSLASH
@state = :start_escape
when CONTROL
error('Control characters must be escaped')
else
@buf << ch
end
when :start_escape
case ch
when QUOTE, BACKSLASH, SLASH
@buf << ch
@state = :start_string
when B
@buf << "\b"
@state = :start_string
when F
@buf << "\f"
@state = :start_string
when N
@buf << "\n"
@state = :start_string
when R
@buf << "\r"
@state = :start_string
when T
@buf << "\t"
@state = :start_string
when U
@state = :unicode_escape
else
error('Expected escaped character')
end
when :unicode_escape
case ch
when HEX
@unicode << ch
if @unicode.size == 4
codepoint = @unicode.slice!(0, 4).hex
if codepoint >= 0xD800 && codepoint <= 0xDBFF
error('Expected low surrogate pair half') if @stack[-1].is_a?(Integer)
@state = :start_surrogate_pair
@stack.push(codepoint)
elsif codepoint >= 0xDC00 && codepoint <= 0xDFFF
high = @stack.pop
error('Expected high surrogate pair half') unless high.is_a?(Integer)
pair = ((high - 0xD800) * 0x400) + (codepoint - 0xDC00) + 0x10000
@buf << pair
@state = :start_string
else
@buf << codepoint
@state = :start_string
end
end
else
error('Expected unicode escape hex digit')
end
when :start_surrogate_pair
case ch
when BACKSLASH
@state = :start_surrogate_pair_u
else
error('Expected low surrogate pair half')
end
when :start_surrogate_pair_u
case ch
when U
@state = :unicode_escape
else
error('Expected low surrogate pair half')
end
when :start_negative_number
case ch
when ZERO
@state = :start_zero
@buf << ch
when DIGIT_1_9
@state = :start_int
@buf << ch
else
error('Expected 0-9 digit')
end
when :start_zero
case ch
when POINT
@state = :start_float
@buf << ch
when EXPONENT
@state = :start_exponent
@buf << ch
else
end_value(@buf.to_i)
@buf = ""
@pos -= 1
redo
end
when :start_float
case ch
when DIGIT
@state = :in_float
@buf << ch
else
error('Expected 0-9 digit')
end
when :in_float
case ch
when DIGIT
@buf << ch
when EXPONENT
@state = :start_exponent
@buf << ch
else
end_value(@buf.to_f)
@buf = ""
@pos -= 1
redo
end
when :start_exponent
case ch
when MINUS, PLUS, DIGIT
@state = :in_exponent
@buf << ch
else
error('Expected +, -, or 0-9 digit')
end
when :in_exponent
case ch
when DIGIT
@buf << ch
else
error('Expected 0-9 digit') unless @buf =~ DIGIT_END
end_value(@buf.to_f)
@buf = ""
@pos -= 1
redo
end
when :start_int
case ch
when DIGIT
@buf << ch
when POINT
@state = :start_float
@buf << ch
when EXPONENT
@state = :start_exponent
@buf << ch
else
end_value(@buf.to_i)
@buf = ""
@pos -= 1
redo
end
when :start_true
keyword(TRUE_KEYWORD, true, TRUE_RE, ch)
when :start_false
keyword(FALSE_KEYWORD, false, FALSE_RE, ch)
when :start_null
keyword(NULL_KEYWORD, nil, NULL_RE, ch)
when :end_key
case ch
when COLON
@state = :key_sep
when WS
# ignore
else
error('Expected colon key separator')
end
when :key_sep
start_value(ch)
when :start_array
case ch
when RIGHT_BRACKET
end_container(:array)
when WS
# ignore
else
start_value(ch)
end
when :end_value
case ch
when COMMA
@state = :value_sep
when RIGHT_BRACE
end_container(:object)
when RIGHT_BRACKET
end_container(:array)
when WS
# ignore
else
error('Expected comma or object or array close')
end
when :value_sep
if @stack[-1] == :object
case ch
when QUOTE
@state = :start_string
@stack.push(:key)
when WS
# ignore
else
error('Expected object key start')
end
else
start_value(ch)
end
when :end_document
error('Unexpected data') unless ch =~ WS
end
end
end
# Drain any remaining buffered characters into the parser to complete
# the parsing of the document.
#
# This is only required when parsing a document containing a single
# numeric value, integer or float. The parser has no other way to
# detect when it should no longer expect additional characters with
# which to complete the parse, so it must be signaled by a call to
# this method.
#
# If you're parsing more typical object or array documents, there's no
# need to call `finish` because the parse will complete when the final
# closing `]` or `}` character is scanned.
#
# Raises a JSON::Stream::ParserError if the JSON data is malformed.
#
# Returns nothing.
def finish
# Partial multi-byte character waiting for completion bytes.
error('Unexpected end-of-file') unless @utf8.empty?
# Partial array, object, or string.
error('Unexpected end-of-file') unless @stack.empty?
case @state
when :end_document
# done, do nothing
when :in_float
end_value(@buf.to_f)
when :in_exponent
error('Unexpected end-of-file') unless @buf =~ DIGIT_END
end_value(@buf.to_f)
when :start_zero
end_value(@buf.to_i)
when :start_int
end_value(@buf.to_i)
else
error('Unexpected end-of-file')
end
end
private
# Invoke all registered observer procs for the event type.
#
# type - The Symbol listener name.
# args - The argument list to pass into the observer procs.
#
# Examples
#
# # broadcast events for {"answer": 42}
# notify(:start_object)
# notify(:key, "answer")
# notify(:value, 42)
# notify(:end_object)
#
# Returns nothing.
def notify(type, *args)
@listeners[type].each do |block|
block.call(*args)
end
end
# Complete an object or array container value type.
#
# type - The Symbol, :object or :array, of the expected type.
#
# Raises a JSON::Stream::ParserError if the expected container type
# was not completed.
#
# Returns nothing.
def end_container(type)
@state = :end_value
if @stack.pop == type
case type
when :object then notify(:end_object)
when :array then notify(:end_array)
end
else
error("Expected end of #{type}")
end
notify_end_document if @stack.empty?
end
# Broadcast an `end_document` event to observers after a complete JSON
# value document (object, array, number, string, true, false, null) has
# been parsed from the text. This is the final event sent to observers
# and signals the parse has finished.
#
# Returns nothing.
def notify_end_document
@state = :end_document
notify(:end_document)
end
# Parse one of the three allowed keywords: true, false, null.
#
# word - The String keyword ('true', 'false', 'null').
# value - The Ruby value (true, false, nil).
# re - The Regexp of allowed keyword characters.
# ch - The current String character being parsed.
#
# Raises a JSON::Stream::ParserError if the character does not belong
# in the expected keyword.
#
# Returns nothing.
def keyword(word, value, re, ch)
if ch =~ re
@buf << ch
else
error("Expected #{word} keyword")
end
if @buf.size == word.size
if @buf == word
@buf = ""
end_value(value)
else
error("Expected #{word} keyword")
end
end
end
# Process the first character of one of the seven possible JSON
# values: object, array, string, true, false, null, number.
#
# ch - The current character String.
#
# Raises a JSON::Stream::ParserError if the character does not signal
# the start of a value.
#
# Returns nothing.
def start_value(ch)
case ch
when LEFT_BRACE
notify(:start_document) if @stack.empty?
@state = :start_object
@stack.push(:object)
notify(:start_object)
when LEFT_BRACKET
notify(:start_document) if @stack.empty?
@state = :start_array
@stack.push(:array)
notify(:start_array)
when QUOTE
@state = :start_string
@stack.push(:string)
when T
@state = :start_true
@buf << ch
when F
@state = :start_false
@buf << ch
when N
@state = :start_null
@buf << ch
when MINUS
@state = :start_negative_number
@buf << ch
when ZERO
@state = :start_zero
@buf << ch
when DIGIT_1_9
@state = :start_int
@buf << ch
when WS
# ignore
else
error('Expected value')
end
end
# Advance the state machine and notify `value` observers that a
# string, number or keyword (true, false, null) value was parsed.
#
# value - The object to broadcast to observers.
#
# Returns nothing.
def end_value(value)
@state = :end_value
notify(:start_document) if @stack.empty?
notify(:value, value)
notify_end_document if @stack.empty?
end
def error(message)
raise ParserError, "#{message}: char #{@pos}"
end
end
end
end
================================================
FILE: lib/json/stream/version.rb
================================================
module JSON
module Stream
VERSION = '1.0.0'
end
end
================================================
FILE: lib/json/stream.rb
================================================
# encoding: UTF-8
require 'stringio'
require 'json/stream/buffer'
require 'json/stream/builder'
require 'json/stream/parser'
require 'json/stream/version'
================================================
FILE: spec/buffer_spec.rb
================================================
require 'json/stream'
require 'minitest/autorun'
describe JSON::Stream::Buffer do
subject { JSON::Stream::Buffer.new }
it 'accepts single byte characters' do
assert_equal "", subject << ""
assert_equal "abc", subject << "abc"
assert_equal "\u0000abc", subject << "\u0000abc"
end
# The é character can be a single codepoint \u00e9 or two codepoints
# \u0065\u0301. The first is encoded in 2 bytes, the second in 3 bytes.
# The json and yajl-ruby gems and CouchDB do not normalize unicode text
# so neither will we. Although, a good way to normalize is by calling
# Ruby's built-in String#unicode_normalize(:nfc).
it 'accepts combined characters' do
assert_equal "\u0065\u0301", subject << "\u0065\u0301"
assert_equal 3, (subject << "\u0065\u0301").bytesize
assert_equal 2, (subject << "\u0065\u0301").size
assert_equal "\u00e9", subject << "\u00e9"
assert_equal 2, (subject << "\u00e9").bytesize
assert_equal 1, (subject << "\u00e9").size
end
it 'accepts valid two byte characters' do
assert_equal "abcé", subject << "abcé"
assert_equal "a", subject << "a\xC3"
assert_equal "é", subject << "\xA9"
assert_equal "", subject << "\xC3"
assert_equal "é", subject << "\xA9"
assert_equal "é", subject << "\xC3\xA9"
end
it 'accepts valid three byte characters' do
assert_equal "abcé\u2603", subject << "abcé\u2603"
assert_equal "a", subject << "a\xE2"
assert_equal "", subject << "\x98"
assert_equal "\u2603", subject << "\x83"
end
it 'accepts valid four byte characters' do
assert_equal "abcé\u2603\u{10102}é", subject << "abcé\u2603\u{10102}é"
assert_equal "a", subject << "a\xF0"
assert_equal "", subject << "\x90"
assert_equal "", subject << "\x84"
assert_equal "\u{10102}", subject << "\x82"
end
it 'rejects valid utf-8 followed by partial two byte sequence' do
assert_equal '[', subject << '['
assert_equal '"', subject << '"'
assert_equal '', subject << "\xC3"
assert_raises(JSON::Stream::ParserError) { subject << '"' }
end
it 'rejects invalid two byte start characters' do
assert_raises(JSON::Stream::ParserError) { subject << "\xC3\xC3" }
end
it 'rejects invalid three byte start characters' do
assert_raises(JSON::Stream::ParserError) { subject << "\xE2\xE2" }
end
it 'rejects invalid four byte start characters' do
assert_raises(JSON::Stream::ParserError) { subject << "\xF0\xF0" }
end
it 'rejects a two byte start with single byte continuation character' do
assert_raises(JSON::Stream::ParserError) { subject << "\xC3\u0000" }
end
it 'rejects a three byte start with single byte continuation character' do
assert_raises(JSON::Stream::ParserError) { subject << "\xE2\u0010" }
end
it 'rejects a four byte start with single byte continuation character' do
assert_raises(JSON::Stream::ParserError) { subject << "\xF0a" }
end
it 'rejects an invalid continuation character' do
assert_raises(JSON::Stream::ParserError) { subject << "\xA9" }
end
it 'rejects an overlong form' do
assert_raises(JSON::Stream::ParserError) { subject << "\xC0\x80" }
end
describe 'checking for empty buffers' do
it 'is initially empty' do
assert subject.empty?
end
it 'is empty after processing complete characters' do
subject << 'test'
assert subject.empty?
end
it 'is not empty after processing partial multi-byte characters' do
subject << "\xC3"
refute subject.empty?
subject << "\xA9"
assert subject.empty?
end
end
end
================================================
FILE: spec/builder_spec.rb
================================================
require 'json/stream'
require 'minitest/autorun'
describe JSON::Stream::Builder do
let(:parser) { JSON::Stream::Parser.new }
subject { JSON::Stream::Builder.new(parser) }
it 'builds a false value' do
assert_nil subject.result
subject.start_document
subject.value(false)
assert_nil subject.result
subject.end_document
assert_equal false, subject.result
end
it 'builds a string value' do
assert_nil subject.result
subject.start_document
subject.value("test")
assert_nil subject.result
subject.end_document
assert_equal "test", subject.result
end
it 'builds an empty array' do
assert_nil subject.result
subject.start_document
subject.start_array
subject.end_array
assert_nil subject.result
subject.end_document
assert_equal [], subject.result
end
it 'builds an array of numbers' do
subject.start_document
subject.start_array
subject.value(1)
subject.value(2)
subject.value(3)
subject.end_array
subject.end_document
assert_equal [1, 2, 3], subject.result
end
it 'builds nested empty arrays' do
subject.start_document
subject.start_array
subject.start_array
subject.end_array
subject.end_array
subject.end_document
assert_equal [[]], subject.result
end
it 'builds nested arrays of numbers' do
subject.start_document
subject.start_array
subject.value(1)
subject.start_array
subject.value(2)
subject.end_array
subject.value(3)
subject.end_array
subject.end_document
assert_equal [1, [2], 3], subject.result
end
it 'builds an empty object' do
subject.start_document
subject.start_object
subject.end_object
subject.end_document
assert_equal({}, subject.result)
end
it 'builds a complex object' do
subject.start_document
subject.start_object
subject.key("k1")
subject.value(1)
subject.key("k2")
subject.value(nil)
subject.key("k3")
subject.value(true)
subject.key("k4")
subject.value(false)
subject.key("k5")
subject.value("string value")
subject.end_object
subject.end_document
expected = {
"k1" => 1,
"k2" => nil,
"k3" => true,
"k4" => false,
"k5" => "string value"
}
assert_equal expected, subject.result
end
it 'builds a nested object' do
subject.start_document
subject.start_object
subject.key("k1")
subject.value(1)
subject.key("k2")
subject.start_object
subject.end_object
subject.key("k3")
subject.start_object
subject.key("sub1")
subject.start_array
subject.value(12)
subject.end_array
subject.end_object
subject.key("k4")
subject.start_array
subject.value(1)
subject.start_object
subject.key("sub2")
subject.start_array
subject.value(nil)
subject.end_array
subject.end_object
subject.end_array
subject.key("k5")
subject.value("string value")
subject.end_object
subject.end_document
expected = {
"k1" => 1,
"k2" => {},
"k3" => {"sub1" => [12]},
"k4" => [1, {"sub2" => [nil]}],
"k5" => "string value"
}
assert_equal expected, subject.result
end
it 'builds a real document' do
refute_nil subject
parser << File.read('spec/fixtures/repository.json')
refute_nil subject.result
assert_equal 'rails', subject.result['name']
assert_equal 4223, subject.result['owner']['id']
assert_equal false, subject.result['fork']
assert_nil subject.result['mirror_url']
end
end
================================================
FILE: spec/fixtures/repository.json
================================================
{
"id": 8514,
"name": "rails",
"full_name": "rails/rails",
"owner": {
"login": "rails",
"id": 4223,
"avatar_url": "https://avatars.githubusercontent.com/u/4223?",
"gravatar_id": "30f39a09e233e8369dddf6feb4be0308",
"url": "https://api.github.com/users/rails",
"html_url": "https://github.com/rails",
"followers_url": "https://api.github.com/users/rails/followers",
"following_url": "https://api.github.com/users/rails/following{/other_user}",
"gists_url": "https://api.github.com/users/rails/gists{/gist_id}",
"starred_url": "https://api.github.com/users/rails/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/rails/subscriptions",
"organizations_url": "https://api.github.com/users/rails/orgs",
"repos_url": "https://api.github.com/users/rails/repos",
"events_url": "https://api.github.com/users/rails/events{/privacy}",
"received_events_url": "https://api.github.com/users/rails/received_events",
"type": "Organization",
"site_admin": false
},
"private": false,
"html_url": "https://github.com/rails/rails",
"description": "Ruby on Rails",
"fork": false,
"url": "https://api.github.com/repos/rails/rails",
"forks_url": "https://api.github.com/repos/rails/rails/forks",
"keys_url": "https://api.github.com/repos/rails/rails/keys{/key_id}",
"collaborators_url": "https://api.github.com/repos/rails/rails/collaborators{/collaborator}",
"teams_url": "https://api.github.com/repos/rails/rails/teams",
"hooks_url": "https://api.github.com/repos/rails/rails/hooks",
"issue_events_url": "https://api.github.com/repos/rails/rails/issues/events{/number}",
"events_url": "https://api.github.com/repos/rails/rails/events",
"assignees_url": "https://api.github.com/repos/rails/rails/assignees{/user}",
"branches_url": "https://api.github.com/repos/rails/rails/branches{/branch}",
"tags_url": "https://api.github.com/repos/rails/rails/tags",
"blobs_url": "https://api.github.com/repos/rails/rails/git/blobs{/sha}",
"git_tags_url": "https://api.github.com/repos/rails/rails/git/tags{/sha}",
"git_refs_url": "https://api.github.com/repos/rails/rails/git/refs{/sha}",
"trees_url": "https://api.github.com/repos/rails/rails/git/trees{/sha}",
"statuses_url": "https://api.github.com/repos/rails/rails/statuses/{sha}",
"languages_url": "https://api.github.com/repos/rails/rails/languages",
"stargazers_url": "https://api.github.com/repos/rails/rails/stargazers",
"contributors_url": "https://api.github.com/repos/rails/rails/contributors",
"subscribers_url": "https://api.github.com/repos/rails/rails/subscribers",
"subscription_url": "https://api.github.com/repos/rails/rails/subscription",
"commits_url": "https://api.github.com/repos/rails/rails/commits{/sha}",
"git_commits_url": "https://api.github.com/repos/rails/rails/git/commits{/sha}",
"comments_url": "https://api.github.com/repos/rails/rails/comments{/number}",
"issue_comment_url": "https://api.github.com/repos/rails/rails/issues/comments/{number}",
"contents_url": "https://api.github.com/repos/rails/rails/contents/{+path}",
"compare_url": "https://api.github.com/repos/rails/rails/compare/{base}...{head}",
"merges_url": "https://api.github.com/repos/rails/rails/merges",
"archive_url": "https://api.github.com/repos/rails/rails/{archive_format}{/ref}",
"downloads_url": "https://api.github.com/repos/rails/rails/downloads",
"issues_url": "https://api.github.com/repos/rails/rails/issues{/number}",
"pulls_url": "https://api.github.com/repos/rails/rails/pulls{/number}",
"milestones_url": "https://api.github.com/repos/rails/rails/milestones{/number}",
"notifications_url": "https://api.github.com/repos/rails/rails/notifications{?since,all,participating}",
"labels_url": "https://api.github.com/repos/rails/rails/labels{/name}",
"releases_url": "https://api.github.com/repos/rails/rails/releases{/id}",
"created_at": "2008-04-11T02:19:47Z",
"updated_at": "2014-06-25T21:08:45Z",
"pushed_at": "2014-06-25T17:47:52Z",
"git_url": "git://github.com/rails/rails.git",
"ssh_url": "git@github.com:rails/rails.git",
"clone_url": "https://github.com/rails/rails.git",
"svn_url": "https://github.com/rails/rails",
"homepage": "http://rubyonrails.org",
"size": 331047,
"stargazers_count": 22248,
"watchers_count": 22248,
"language": "Ruby",
"has_issues": true,
"has_downloads": true,
"has_wiki": false,
"forks_count": 8278,
"mirror_url": null,
"open_issues_count": 625,
"forks": 8278,
"open_issues": 625,
"watchers": 22248,
"default_branch": "master",
"organization": {
"login": "rails",
"id": 4223,
"avatar_url": "https://avatars.githubusercontent.com/u/4223?",
"gravatar_id": "30f39a09e233e8369dddf6feb4be0308",
"url": "https://api.github.com/users/rails",
"html_url": "https://github.com/rails",
"followers_url": "https://api.github.com/users/rails/followers",
"following_url": "https://api.github.com/users/rails/following{/other_user}",
"gists_url": "https://api.github.com/users/rails/gists{/gist_id}",
"starred_url": "https://api.github.com/users/rails/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/rails/subscriptions",
"organizations_url": "https://api.github.com/users/rails/orgs",
"repos_url": "https://api.github.com/users/rails/repos",
"events_url": "https://api.github.com/users/rails/events{/privacy}",
"received_events_url": "https://api.github.com/users/rails/received_events",
"type": "Organization",
"site_admin": false
},
"network_count": 8278,
"subscribers_count": 1521
}
================================================
FILE: spec/parser_spec.rb
================================================
require 'json/stream'
require 'minitest/autorun'
describe JSON::Stream::Parser do
subject { JSON::Stream::Parser.new }
describe 'parsing a document' do
it 'rejects documents containing bad start character' do
expected = [:error]
assert_equal expected, events('a')
end
it 'rejects documents starting with period' do
expected = [:error]
assert_equal expected, events('.')
end
it 'parses a null value document' do
expected = [:start_document, [:value, nil], :end_document]
assert_equal expected, events('null')
end
it 'parses a false value document' do
expected = [:start_document, [:value, false], :end_document]
assert_equal expected, events('false')
end
it 'parses a true value document' do
expected = [:start_document, [:value, true], :end_document]
assert_equal expected, events('true')
end
it 'parses a string document' do
expected = [:start_document, [:value, "test"], :end_document]
assert_equal expected, events('"test"')
end
it 'parses a single digit integer value document' do
expected = [:start_document, [:value, 2], :end_document]
events = events('2', subject)
assert events.empty?
subject.finish
assert_equal expected, events
end
it 'parses a multiple digit integer value document' do
expected = [:start_document, [:value, 12], :end_document]
events = events('12', subject)
assert events.empty?
subject.finish
assert_equal expected, events
end
it 'parses a zero literal document' do
expected = [:start_document, [:value, 0], :end_document]
events = events('0', subject)
assert events.empty?
subject.finish
assert_equal expected, events
end
it 'parses a negative integer document' do
expected = [:start_document, [:value, -1], :end_document]
events = events('-1', subject)
assert events.empty?
subject.finish
assert_equal expected, events
end
it 'parses an exponent literal document' do
expected = [:start_document, [:value, 200.0], :end_document]
events = events('2e2', subject)
assert events.empty?
subject.finish
assert_equal expected, events
end
it 'parses a float value document' do
expected = [:start_document, [:value, 12.1], :end_document]
events = events('12.1', subject)
assert events.empty?
subject.finish
assert_equal expected, events
end
it 'parses a value document with surrounding whitespace' do
expected = [:start_document, [:value, false], :end_document]
assert_equal expected, events(' false ')
end
it 'parses array documents' do
expected = [:start_document, :start_array, :end_array, :end_document]
assert_equal expected, events('[]')
assert_equal expected, events('[ ]')
assert_equal expected, events(' [] ')
assert_equal expected, events(' [ ] ')
end
it 'parses object documents' do
expected = [:start_document, :start_object, :end_object, :end_document]
assert_equal expected, events('{}')
assert_equal expected, events('{ }')
assert_equal expected, events(' {} ')
assert_equal expected, events(' { } ')
end
it 'rejects documents with trailing characters' do
expected = [:start_document, :start_object, :end_object, :end_document, :error]
assert_equal expected, events('{}a')
assert_equal expected, events('{ } 12')
assert_equal expected, events(' {} false')
assert_equal expected, events(' { }, {}')
end
it 'ignores whitespace around tokens, preserves it within strings' do
json = %Q{
{ " key 1 " : \t [
1, 2, " my string ",\r
false, true, null ]
}
}
expected = [
:start_document,
:start_object,
[:key, " key 1 "],
:start_array,
[:value, 1],
[:value, 2],
[:value, " my string "],
[:value, false],
[:value, true],
[:value, nil],
:end_array,
:end_object,
:end_document
]
assert_equal expected, events(json)
end
it 'rejects form feed whitespace' do
json = "[1,\f 2]"
expected = [:start_document, :start_array, [:value, 1], :error]
assert_equal expected, events(json)
end
it 'rejects vertical tab whitespace' do
json = "[1,\v 2]"
expected = [:start_document, :start_array, [:value, 1], :error]
assert_equal expected, events(json)
end
it 'rejects partial keyword tokens' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('[tru]')
assert_equal expected, events('[fal]')
assert_equal expected, events('[nul,true]')
assert_equal expected, events('[fals1]')
end
it 'rejects scrambled keyword tokens' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('[ture]')
assert_equal expected, events('[fales]')
assert_equal expected, events('[nlul]')
end
it 'parses single keyword tokens' do
expected = [:start_document, :start_array, [:value, true], :end_array, :end_document]
assert_equal expected, events('[true]')
end
it 'parses keywords in series' do
expected = [:start_document, :start_array, [:value, true], [:value, nil], :end_array, :end_document]
assert_equal expected, events('[true, null]')
end
end
describe 'finishing the parse' do
it 'rejects finish with no json data provided' do
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial null keyword' do
subject << 'nul'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial true keyword' do
subject << 'tru'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial false keyword' do
subject << 'fals'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial float literal' do
subject << '42.'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial exponent' do
subject << '42e'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects malformed exponent' do
subject << '42e+'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial negative number' do
subject << '-'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial string literal' do
subject << '"test'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial object ending in literal value' do
subject << '{"test": 42'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'rejects partial array ending in literal value' do
subject << '[42'
assert_raises(JSON::Stream::ParserError) { subject.finish }
end
it 'does nothing on subsequent finish' do
begin
subject << 'false'
subject.finish
subject.finish
rescue
fail 'raised unexpected error'
end
end
end
describe 'parsing number tokens' do
it 'rejects invalid negative numbers' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('[-]')
expected = [:start_document, :start_array, [:value, 1], :error]
assert_equal expected, events('[1-0]')
end
it 'parses integer zero' do
expected = [:start_document, :start_array, [:value, 0], :end_array, :end_document]
assert_equal expected, events('[0]')
assert_equal expected, events('[-0]')
end
it 'parses float zero' do
expected = [:start_document, :start_array, [:value, 0.0], :end_array, :end_document]
assert_equal expected, events('[0.0]')
assert_equal expected, events('[-0.0]')
end
it 'rejects multi zero' do
expected = [:start_document, :start_array, [:value, 0], :error]
assert_equal expected, events('[00]')
assert_equal expected, events('[-00]')
end
it 'rejects integers that start with zero' do
expected = [:start_document, :start_array, [:value, 0], :error]
assert_equal expected, events('[01]')
assert_equal expected, events('[-01]')
end
it 'parses integer tokens' do
expected = [:start_document, :start_array, [:value, 1], :end_array, :end_document]
assert_equal expected, events('[1]')
expected = [:start_document, :start_array, [:value, -1], :end_array, :end_document]
assert_equal expected, events('[-1]')
expected = [:start_document, :start_array, [:value, 123], :end_array, :end_document]
assert_equal expected, events('[123]')
expected = [:start_document, :start_array, [:value, -123], :end_array, :end_document]
assert_equal expected, events('[-123]')
end
it 'parses float tokens' do
expected = [:start_document, :start_array, [:value, 1.0], :end_array, :end_document]
assert_equal expected, events('[1.0]')
assert_equal expected, events('[1.00]')
end
it 'parses negative floats' do
expected = [:start_document, :start_array, [:value, -1.0], :end_array, :end_document]
assert_equal expected, events('[-1.0]')
assert_equal expected, events('[-1.00]')
end
it 'parses multi-digit floats' do
expected = [:start_document, :start_array, [:value, 123.012], :end_array, :end_document]
assert_equal expected, events('[123.012]')
assert_equal expected, events('[123.0120]')
end
it 'parses negative multi-digit floats' do
expected = [:start_document, :start_array, [:value, -123.012], :end_array, :end_document]
assert_equal expected, events('[-123.012]')
assert_equal expected, events('[-123.0120]')
end
it 'rejects floats missing leading zero' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('[.1]')
assert_equal expected, events('[-.1]')
assert_equal expected, events('[.01]')
assert_equal expected, events('[-.01]')
end
it 'rejects float missing fraction' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('[.]')
assert_equal expected, events('[..]')
assert_equal expected, events('[0.]')
assert_equal expected, events('[12.]')
end
it 'parses zero with implicit positive exponent as float' do
expected = [:start_document, :start_array, [:value, 0.0], :end_array, :end_document]
events = events('[0e2]')
assert_equal expected, events
assert_kind_of Float, events[2][1]
end
it 'parses zero with explicit positive exponent as float' do
expected = [:start_document, :start_array, [:value, 0.0], :end_array, :end_document]
events = events('[0e+2]')
assert_equal expected, events
assert_kind_of Float, events[2][1]
end
it 'parses zero with negative exponent as float' do
expected = [:start_document, :start_array, [:value, 0.0], :end_array, :end_document]
events = events('[0e-2]')
assert_equal expected, events
assert_kind_of Float, events[2][1]
end
it 'parses positive exponent integers as floats' do
expected = [:start_document, :start_array, [:value, 212.0], :end_array, :end_document]
events = events('[2.12e2]')
assert_equal expected, events
assert_kind_of Float, events[2][1]
assert_equal expected, events('[2.12e02]')
assert_equal expected, events('[2.12e+2]')
assert_equal expected, events('[2.12e+02]')
end
it 'parses positive exponent floats' do
expected = [:start_document, :start_array, [:value, 21.2], :end_array, :end_document]
assert_equal expected, events('[2.12e1]')
assert_equal expected, events('[2.12e01]')
assert_equal expected, events('[2.12e+1]')
assert_equal expected, events('[2.12e+01]')
end
it 'parses negative exponent' do
expected = [:start_document, :start_array, [:value, 0.0212], :end_array, :end_document]
assert_equal expected, events('[2.12e-2]')
assert_equal expected, events('[2.12e-02]')
end
it 'parses zero exponent floats' do
expected = [:start_document, :start_array, [:value, 2.12], :end_array, :end_document]
assert_equal expected, events('[2.12e0]')
assert_equal expected, events('[2.12e00]')
assert_equal expected, events('[2.12e-0]')
assert_equal expected, events('[2.12e-00]')
end
it 'parses zero exponent integers' do
expected = [:start_document, :start_array, [:value, 2.0], :end_array, :end_document]
assert_equal expected, events('[2e0]')
assert_equal expected, events('[2e00]')
assert_equal expected, events('[2e-0]')
assert_equal expected, events('[2e-00]')
end
it 'rejects missing exponent' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('[e]')
assert_equal expected, events('[1e]')
assert_equal expected, events('[1e-]')
assert_equal expected, events('[1e--]')
assert_equal expected, events('[1e+]')
assert_equal expected, events('[1e++]')
assert_equal expected, events('[0.e]')
assert_equal expected, events('[10.e]')
end
it 'rejects float with trailing character' do
expected = [:start_document, :start_array, [:value, 0.0], :error]
assert_equal expected, events('[0.0q]')
end
it 'rejects integer with trailing character' do
expected = [:start_document, :start_array, [:value, 1], :error]
assert_equal expected, events('[1q]')
end
end
describe 'parsing string tokens' do
describe 'parsing two-character escapes' do
it 'rejects invalid escape characters' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('["\\a"]')
end
it 'parses quotation mark' do
expected = [:start_document, :start_array, [:value, "\""], :end_array, :end_document]
assert_equal expected, events('["\""]')
end
it 'parses reverse solidus' do
expected = [:start_document, :start_array, [:value, "\\"], :end_array, :end_document]
assert_equal expected, events('["\\\"]')
end
it 'parses solidus' do
expected = [:start_document, :start_array, [:value, "/"], :end_array, :end_document]
assert_equal expected, events('["\/"]')
end
it 'parses backspace' do
expected = [:start_document, :start_array, [:value, "\b"], :end_array, :end_document]
assert_equal expected, events('["\b"]')
end
it 'parses form feed' do
expected = [:start_document, :start_array, [:value, "\f"], :end_array, :end_document]
assert_equal expected, events('["\f"]')
end
it 'parses line feed' do
expected = [:start_document, :start_array, [:value, "\n"], :end_array, :end_document]
assert_equal expected, events('["\n"]')
end
it 'parses carriage return' do
expected = [:start_document, :start_array, [:value, "\r"], :end_array, :end_document]
assert_equal expected, events('["\r"]')
end
it 'parses tab' do
expected = [:start_document, :start_array, [:value, "\t"], :end_array, :end_document]
assert_equal expected, events('["\t"]')
end
it 'parses a series of escapes with whitespace' do
expected = [:start_document, :start_array, [:value, "\" \\ / \b \f \n \r \t"], :end_array, :end_document]
assert_equal expected, events('["\" \\\ \/ \b \f \n \r \t"]')
end
it 'parses a series of escapes without whitespace' do
expected = [:start_document, :start_array, [:value, "\"\\/\b\f\n\r\t"], :end_array, :end_document]
assert_equal expected, events('["\"\\\\/\b\f\n\r\t"]')
end
it 'parses a series of escapes with duplicate characters between them' do
expected = [:start_document, :start_array, [:value, "\"t\\b/f\bn\f/\nn\rr\t"], :end_array, :end_document]
assert_equal expected, events('["\"t\\\b\/f\bn\f/\nn\rr\t"]')
end
end
describe 'parsing control characters' do
it 'rejects control character in array' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events("[\" \u0000 \"]")
end
it 'rejects control character in object' do
expected = [:start_document, :start_object, :error]
assert_equal expected, events("{\" \u0000 \":12}")
end
it 'parses escaped control character' do
expected = [:start_document, :start_array, [:value, "\u0000"], :end_array, :end_document]
assert_equal expected, events('["\\u0000"]')
end
it 'parses escaped control character in object key' do
expected = [:start_document, :start_object, [:key, "\u0000"], [:value, 12], :end_object, :end_document]
assert_equal expected, events('{"\\u0000": 12}')
end
it 'parses non-control character' do
# DEL (ASCII 127) is allowed unescaped in JSON
expected = [:start_document, :start_array, [:value, " \u007F "], :end_array, :end_document]
assert_equal expected, events("[\" \u007f \"]")
end
end
describe 'parsing unicode escape sequences' do
it 'parses escaped ascii character' do
a = "\x61"
escaped = '\u0061'
expected = [:start_document, :start_array, [:value, a], :end_array, :end_document]
assert_equal expected, events('["' + escaped + '"]')
end
it 'parses un-escaped raw unicode' do
# U+1F602 face with tears of joy
face = "\xf0\x9f\x98\x82"
expected = [:start_document, :start_array, [:value, face], :end_array, :end_document]
assert_equal expected, events('["' + face + '"]')
end
it 'parses escaped unicode surrogate pairs' do
# U+1F602 face with tears of joy
face = "\xf0\x9f\x98\x82"
escaped = '\uD83D\uDE02'
expected = [:start_document, :start_array, [:value, face], :end_array, :end_document]
assert_equal expected, events('["' + escaped + '"]')
end
it 'rejects partial unicode escapes' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('[" \\u "]')
assert_equal expected, events('[" \\u2 "]')
assert_equal expected, events('[" \\u26 "]')
assert_equal expected, events('[" \\u260 "]')
end
it 'parses unicode escapes' do
# U+2603 snowman
snowman = "\xe2\x98\x83"
escaped = '\u2603'
expected = [:start_document, :start_array, [:value, snowman], :end_array, :end_document]
assert_equal expected, events('["' + escaped + '"]')
expected = [:start_document, :start_array, [:value, 'snow' + snowman + ' man'], :end_array, :end_document]
assert_equal expected, events('["snow' + escaped + ' man"]')
expected = [:start_document, :start_array, [:value, 'snow' + snowman + '3 man'], :end_array, :end_document]
assert_equal expected, events('["snow' + escaped + '3 man"]')
expected = [:start_document, :start_object, [:key, 'snow' + snowman + '3 man'], [:value, 1], :end_object, :end_document]
assert_equal expected, events('{"snow\\u26033 man": 1}')
end
end
describe 'parsing unicode escapes with surrogate pairs' do
it 'rejects missing second pair' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('["\uD834"]')
end
it 'rejects missing first pair' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('["\uDD1E"]')
end
it 'rejects double first pair' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('["\uD834\uD834"]')
end
it 'rejects double second pair' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('["\uDD1E\uDD1E"]')
end
it 'rejects reversed pair' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('["\uDD1E\uD834"]')
end
it 'parses correct pairs in object keys and values' do
# U+1D11E G-Clef
clef = "\xf0\x9d\x84\x9e"
expected = [
:start_document,
:start_object,
[:key, clef],
[:value, "g\u{1D11E}clef"],
:end_object,
:end_document
]
assert_equal expected, events(%q{ {"\uD834\uDD1E": "g\uD834\uDD1Eclef"} })
end
end
end
describe 'parsing arrays' do
it 'rejects trailing comma' do
expected = [:start_document, :start_array, [:value, 12], :error]
assert_equal expected, events('[12, ]')
end
it 'parses nested empty array' do
expected = [:start_document, :start_array, :start_array, :end_array, :end_array, :end_document]
assert_equal expected, events('[[]]')
end
it 'parses nested array with value' do
expected = [:start_document, :start_array, :start_array, [:value, 2.1], :end_array, :end_array, :end_document]
assert_equal expected, events('[[ 2.10 ]]')
end
it 'rejects malformed arrays' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events('[}')
assert_equal expected, events('[,]')
assert_equal expected, events('[, 12]')
end
it 'rejects malformed nested arrays' do
expected = [:start_document, :start_array, :start_array, :error]
assert_equal expected, events('[[}]')
assert_equal expected, events('[[,]]')
end
it 'rejects malformed array value lists' do
expected = [:start_document, :start_array, [:value, "test"], :error]
assert_equal expected, events('["test"}')
assert_equal expected, events('["test",]')
assert_equal expected, events('["test" "test"]')
assert_equal expected, events('["test" 12]')
end
it 'parses array with value' do
expected = [:start_document, :start_array, [:value, "test"], :end_array, :end_document]
assert_equal expected, events('["test"]')
end
it 'parses array with value list' do
expected = [
:start_document,
:start_array,
[:value, 1],
[:value, 2],
[:value, nil],
[:value, 12.1],
[:value, "test"],
:end_array,
:end_document
]
assert_equal expected, events('[1,2, null, 12.1,"test"]')
end
end
describe 'parsing objects' do
it 'rejects malformed objects' do
expected = [:start_document, :start_object, :error]
assert_equal expected, events('{]')
assert_equal expected, events('{:}')
end
it 'parses single key object' do
expected = [:start_document, :start_object, [:key, "key 1"], [:value, 12], :end_object, :end_document]
assert_equal expected, events('{"key 1" : 12}')
end
it 'parses object key value list' do
expected = [
:start_document,
:start_object,
[:key, "key 1"], [:value, 12],
[:key, "key 2"], [:value, "two"],
:end_object,
:end_document
]
assert_equal expected, events('{"key 1" : 12, "key 2":"two"}')
end
it 'rejects object key with no value' do
expected = [
:start_document,
:start_object,
[:key, "key"],
:start_array,
[:value, nil],
[:value, false],
[:value, true],
:end_array,
[:key, "key 2"],
:error
]
assert_equal expected, events('{"key": [ null , false , true ] ,"key 2"}')
end
it 'rejects object with trailing comma' do
expected = [:start_document, :start_object, [:key, "key 1"], [:value, 12], :error]
assert_equal expected, events('{"key 1" : 12,}')
end
end
describe 'parsing unicode bytes' do
it 'parses single byte utf-8' do
expected = [:start_document, :start_array, [:value, "test"], :end_array, :end_document]
assert_equal expected, events('["test"]')
end
it 'parses full two byte utf-8' do
expected = [
:start_document,
:start_array,
[:value, "résumé"],
[:value, "éé"],
:end_array,
:end_document
]
assert_equal expected, events("[\"résumé\", \"é\xC3\xA9\"]")
end
# The parser should raise an error when only one byte of a two-byte character
# is available. The \xC3 byte is the first byte of the é character.
it 'rejects a partial two byte utf-8 string' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events("[\"\xC3\"]")
end
it 'parses valid two byte utf-8 string' do
expected = [:start_document, :start_array, [:value, 'é'], :end_array, :end_document]
assert_equal expected, events("[\"\xC3\xA9\"]")
end
it 'parses full three byte utf-8 string' do
expected = [
:start_document,
:start_array,
[:value, "snow\u2603man"],
[:value, "\u2603\u2603"],
:end_array,
:end_document
]
assert_equal expected, events("[\"snow\u2603man\", \"\u2603\u2603\"]")
end
it 'rejects one byte of three byte utf-8 string' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events("[\"\xE2\"]")
end
it 'rejects two bytes of three byte utf-8 string' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events("[\"\xE2\x98\"]")
end
it 'parses valid three byte utf-8 string' do
expected = [:start_document, :start_array, [:value, "\u2603"], :end_array, :end_document]
assert_equal expected, events("[\"\xE2\x98\x83\"]")
end
it 'parses full four byte utf-8 string' do
expected = [
:start_document,
:start_array,
[:value, "\u{10102} check mark"],
:end_array,
:end_document
]
assert_equal expected, events("[\"\u{10102} check mark\"]")
end
it 'rejects one byte of four byte utf-8 string' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events("[\"\xF0\"]")
end
it 'rejects two bytes of four byte utf-8 string' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events("[\"\xF0\x90\"]")
end
it 'rejects three bytes of four byte utf-8 string' do
expected = [:start_document, :start_array, :error]
assert_equal expected, events("[\"\xF0\x90\x84\"]")
end
it 'parses valid four byte utf-8 string' do
expected = [:start_document, :start_array, [:value, "\u{10102}"], :end_array, :end_document]
assert_equal expected, events("[\"\xF0\x90\x84\x82\"]")
end
end
describe 'parsing json text from the module' do
it 'parses an array document' do
result = JSON::Stream::Parser.parse('[1,2,3]')
assert_equal [1, 2, 3], result
end
it 'parses a true keyword literal document' do
result = JSON::Stream::Parser.parse('true')
assert_equal true, result
end
it 'parses a false keyword literal document' do
result = JSON::Stream::Parser.parse('false')
assert_equal false, result
end
it 'parses a null keyword literal document' do
result = JSON::Stream::Parser.parse('null')
assert_nil result
end
it 'parses a string literal document' do
result = JSON::Stream::Parser.parse('"hello"')
assert_equal 'hello', result
end
it 'parses an integer literal document' do
result = JSON::Stream::Parser.parse('42')
assert_equal 42, result
end
it 'parses a float literal document' do
result = JSON::Stream::Parser.parse('42.12')
assert_equal 42.12, result
end
it 'rejects a partial float literal document' do
assert_raises(JSON::Stream::ParserError) do
JSON::Stream::Parser.parse('42.')
end
end
it 'rejects a partial document' do
assert_raises(JSON::Stream::ParserError) do
JSON::Stream::Parser.parse('{')
end
end
it 'rejects an empty document' do
assert_raises(JSON::Stream::ParserError) do
JSON::Stream::Parser.parse('')
end
end
end
it 'registers observers in initializer block' do
events = []
parser = JSON::Stream::Parser.new do
start_document { events << :start_document }
end_document { events << :end_document }
start_object { events << :start_object }
end_object { events << :end_object }
key { |k| events << [:key, k] }
value { |v| events << [:value, v] }
end
parser << '{"key":12}'
expected = [:start_document, :start_object, [:key, "key"], [:value, 12], :end_object, :end_document]
assert_equal expected, events
end
private
# Run a worst-case, one-byte-at-a-time parse of the JSON string and
# return the list of events generated by the parser. A special :error event is
# appended if the parser raised a ParserError.
#
# json - The String to parse.
# parser - The optional Parser instance to use.
#
# Returns an Array of events.
def events(json, parser = nil)
parser ||= JSON::Stream::Parser.new
collector = Events.new(parser)
begin
json.each_byte { |byte| parser << [byte].pack('C') }
rescue JSON::Stream::ParserError
collector.error
end
collector.events
end
# Dynamically map methods in this class to parser callback methods
# so we can collect parser events for inspection by test cases.
class Events
METHODS = %w[start_document end_document start_object end_object start_array end_array key value]
attr_reader :events
def initialize(parser)
@events = []
METHODS.each do |name|
parser.send(name, &method(name))
end
end
METHODS.each do |name|
define_method(name) do |*args|
@events << (args.empty? ? name.to_sym : [name.to_sym, *args])
end
end
def error
@events << :error
end
end
end
SYMBOL INDEX (46 symbols across 5 files)
FILE: lib/json/stream/buffer.rb
type JSON (line 1) | module JSON
type Stream (line 2) | module Stream
class Buffer (line 12) | class Buffer
method initialize (line 13) | def initialize
method << (line 29) | def <<(data)
method empty? (line 90) | def empty?
method error (line 96) | def error(message)
FILE: lib/json/stream/builder.rb
type JSON (line 1) | module JSON
type Stream (line 2) | module Stream
class Builder (line 12) | class Builder
method initialize (line 17) | def initialize(parser)
method start_document (line 23) | def start_document
method end_document (line 29) | def end_document
method start_object (line 33) | def start_object
method end_object (line 37) | def end_object
method start_array (line 52) | def start_array
method key (line 56) | def key(key)
method value (line 60) | def value(value)
FILE: lib/json/stream/parser.rb
type JSON (line 1) | module JSON
type Stream (line 2) | module Stream
class Parser (line 17) | class Parser
method parse (line 61) | def self.parse(json)
method initialize (line 89) | def initialize(&block)
method start_document (line 113) | def start_document(&block)
method end_document (line 117) | def end_document(&block)
method start_object (line 121) | def start_object(&block)
method end_object (line 125) | def end_object(&block)
method start_array (line 129) | def start_array(&block)
method end_array (line 133) | def end_array(&block)
method key (line 137) | def key(&block)
method value (line 141) | def value(&block)
method << (line 154) | def <<(data)
method finish (line 408) | def finish
method notify (line 448) | def notify(type, *args)
method end_container (line 462) | def end_container(type)
method notify_end_document (line 481) | def notify_end_document
method keyword (line 497) | def keyword(word, value, re, ch)
method start_value (line 523) | def start_value(ch)
method end_value (line 569) | def end_value(value)
method error (line 576) | def error(message)
FILE: lib/json/stream/version.rb
type JSON (line 1) | module JSON
type Stream (line 2) | module Stream
FILE: spec/parser_spec.rb
function events (line 876) | def events(json, parser = nil)
class Events (line 889) | class Events
method initialize (line 894) | def initialize(parser)
method error (line 907) | def error
Condensed preview: 20 files, each showing path, character count, and a content snippet.
[
{
"path": ".github/workflows/ruby.yml",
"chars": 516,
"preview": "on: [push, pull_request]\nname: Build\njobs:\n test:\n name: rake test\n runs-on: ubuntu-latest\n strategy:\n fa"
},
{
"path": ".gitignore",
"chars": 93,
"preview": "/.bundle/\n/.yardoc\n/Gemfile.lock\n/_yardoc/\n/coverage/\n/doc/\n/pkg/\n/spec/reports/\n/tmp/\n*.gem\n"
},
{
"path": "Gemfile",
"chars": 38,
"preview": "source 'https://rubygems.org'\ngemspec\n"
},
{
"path": "LICENSE",
"chars": 1061,
"preview": "Copyright (c) 2010-2024 David Graham\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof th"
},
{
"path": "README.md",
"chars": 3331,
"preview": "# JSON::Stream\n\nJSON::Stream is a JSON parser, based on a finite state machine, that generates\nevents for each state cha"
},
{
"path": "Rakefile",
"chars": 396,
"preview": "require 'rake'\nrequire 'rake/clean'\nrequire 'rake/testtask'\n\nCLOBBER.include('pkg')\n\ndirectory 'pkg'\n\ndesc 'Build distri"
},
{
"path": "bin/bundler",
"chars": 403,
"preview": "#!/usr/bin/env ruby\n# frozen_string_literal: true\n#\n# This file was generated by Bundler.\n#\n# The application 'bundler' "
},
{
"path": "bin/console",
"chars": 346,
"preview": "#!/usr/bin/env ruby\n\nrequire \"bundler/setup\"\nrequire \"json/stream\"\n\n# You can add fixtures and/or initialization code he"
},
{
"path": "bin/rake",
"chars": 394,
"preview": "#!/usr/bin/env ruby\n# frozen_string_literal: true\n#\n# This file was generated by Bundler.\n#\n# The application 'rake' is "
},
{
"path": "bin/setup",
"chars": 75,
"preview": "#!/usr/bin/env bash\n\nset -euo pipefail\nIFS=$'\\n\\t'\nset -vx\n\nbundle install\n"
},
{
"path": "json-stream.gemspec",
"chars": 797,
"preview": "require './lib/json/stream/version'\n\nGem::Specification.new do |s|\n s.name = 'json-stream'\n s.version = JSO"
},
{
"path": "lib/json/stream/buffer.rb",
"chars": 2966,
"preview": "module JSON\n module Stream\n # A character buffer that expects a UTF-8 encoded stream of bytes.\n # This handles tr"
},
{
"path": "lib/json/stream/builder.rb",
"chars": 1497,
"preview": "module JSON\n module Stream\n # A parser listener that builds a full, in memory, object from a JSON\n # document. Th"
},
{
"path": "lib/json/stream/parser.rb",
"chars": 16917,
"preview": "module JSON\n module Stream\n # Raised on any invalid JSON text.\n ParserError = Class.new(RuntimeError)\n\n # A st"
},
{
"path": "lib/json/stream/version.rb",
"chars": 60,
"preview": "module JSON\n module Stream\n VERSION = '1.0.0'\n end\nend\n"
},
{
"path": "lib/json/stream.rb",
"chars": 156,
"preview": "# encoding: UTF-8\n\nrequire 'stringio'\nrequire 'json/stream/buffer'\nrequire 'json/stream/builder'\nrequire 'json/stream/pa"
},
{
"path": "spec/buffer_spec.rb",
"chars": 3607,
"preview": "require 'json/stream'\nrequire 'minitest/autorun'\n\ndescribe JSON::Stream::Buffer do\n subject { JSON::Stream::Buffer.new "
},
{
"path": "spec/builder_spec.rb",
"chars": 3624,
"preview": "require 'json/stream'\nrequire 'minitest/autorun'\n\ndescribe JSON::Stream::Builder do\n let(:parser) { JSON::Stream::Parse"
},
{
"path": "spec/fixtures/repository.json",
"chars": 5666,
"preview": "{\n \"id\": 8514,\n \"name\": \"rails\",\n \"full_name\": \"rails/rails\",\n \"owner\": {\n \"login\": \"rails\",\n \"id\": 4223,\n "
},
{
"path": "spec/parser_spec.rb",
"chars": 30803,
"preview": "require 'json/stream'\nrequire 'minitest/autorun'\n\ndescribe JSON::Stream::Parser do\n subject { JSON::Stream::Parser.new "
}
]