Repository: dgraham/json-stream Branch: master Commit: 6f3557ccd734 Files: 20 Total size: 71.0 KB Directory structure: gitextract_cy1fxiqg/ ├── .github/ │ └── workflows/ │ └── ruby.yml ├── .gitignore ├── Gemfile ├── LICENSE ├── README.md ├── Rakefile ├── bin/ │ ├── bundler │ ├── console │ ├── rake │ └── setup ├── json-stream.gemspec ├── lib/ │ └── json/ │ ├── stream/ │ │ ├── buffer.rb │ │ ├── builder.rb │ │ ├── parser.rb │ │ └── version.rb │ └── stream.rb └── spec/ ├── buffer_spec.rb ├── builder_spec.rb ├── fixtures/ │ └── repository.json └── parser_spec.rb ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/workflows/ruby.yml ================================================ on: [push, pull_request] name: Build jobs: test: name: rake test runs-on: ubuntu-latest strategy: fail-fast: false matrix: ruby-version: - head - "3.3" - "3.2" - "3.1" - "3.0" - "2.7" - "2.6" steps: - uses: actions/checkout@v4 - uses: ruby/setup-ruby@v1 with: ruby-version: ${{ matrix.ruby-version }} bundler-cache: true - run: | bundle exec rake test ================================================ FILE: .gitignore ================================================ /.bundle/ /.yardoc /Gemfile.lock /_yardoc/ /coverage/ /doc/ /pkg/ /spec/reports/ /tmp/ *.gem ================================================ FILE: Gemfile ================================================ source 'https://rubygems.org' gemspec ================================================ FILE: LICENSE ================================================ Copyright (c) 2010-2024 David Graham Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and 
to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # JSON::Stream JSON::Stream is a JSON parser, based on a finite state machine, that generates events for each state change. This allows streaming both the JSON document into memory and the parsed object graph out of memory to some other process. This is much like an XML SAX parser that generates events during parsing. There is no requirement for the document, or the object graph, to be fully buffered in memory. This is best suited for huge JSON documents that won't fit in memory. For example, streaming and processing large map/reduce views from Apache CouchDB. ## Usage The simplest way to parse is to read the full JSON document into memory and then parse it into a full object graph. This is fine for small documents because we have room for both the document and parsed object in memory. ```ruby require 'json/stream' json = File.read('/tmp/test.json') obj = JSON::Stream::Parser.parse(json) ``` While it's possible to do this with JSON::Stream, we really want to use the json gem for documents like this. JSON.parse() is much faster than this parser, because it can rely on having the entire document in memory to analyze. 
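For comparison, the json gem route recommended above might look like the following. This is a minimal sketch using Ruby's bundled `json` library; the inline document is an assumption that stands in for a small file such as `/tmp/test.json`.

```ruby
require 'json'

# A small document held entirely in memory. In practice this could come
# from File.read('/tmp/test.json').
json = '{"answer": 42, "question": false}'

# JSON.parse builds the whole object graph in a single call.
obj = JSON.parse(json)

puts obj['answer']
```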
For larger documents we can use an IO object to stream it into the parser. We still need room for the parsed object, but the document itself is never fully read into memory. ```ruby require 'json/stream' stream = File.open('/tmp/test.json') obj = JSON::Stream::Parser.parse(stream) ``` Again, while JSON::Stream can be used this way, if we just need to stream the document from disk or the network, we're better off using the yajl-ruby gem. Huge documents arriving over the network in small chunks to an EventMachine `receive_data` loop is where JSON::Stream is really useful. Inside an EventMachine::Connection subclass we might have: ```ruby def post_init @parser = JSON::Stream::Parser.new do start_document { puts "start document" } end_document { puts "end document" } start_object { puts "start object" } end_object { puts "end object" } start_array { puts "start array" } end_array { puts "end array" } key { |k| puts "key: #{k}" } value { |v| puts "value: #{v}" } end end def receive_data(data) begin @parser << data rescue JSON::Stream::ParserError => e close_connection end end ``` The parser accepts chunks of the JSON document and parses up to the end of the available buffer. Passing in more data resumes the parse from the prior state. When an interesting state change happens, the parser notifies all registered callback procs of the event. The event callback is where we can do interesting data filtering and passing to other processes. The above example simply prints state changes, but imagine the callbacks looking for an array named `rows` and processing sets of these row objects in small batches. Millions of rows, streaming over the network, can be processed in constant memory space this way. 
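The batching idea described above can be sketched as a listener object. This is an illustration, not part of JSON::Stream: the `RowBatcher` class, its `batch_size` argument, and the `attach` helper are hypothetical, and the sketch assumes a document shaped like `{"rows": [{...}, {...}]}` whose row objects hold only scalar values (nested containers inside a row are ignored). Only the event names come from the parser.

```ruby
# Hypothetical listener: collects the objects found in a top-level "rows"
# array and hands them to a block in fixed-size batches.
class RowBatcher
  EVENTS = %w[start_object end_object start_array end_array key value]

  def initialize(batch_size, &on_batch)
    @batch_size = batch_size
    @on_batch = on_batch
    @batch = []
    @depth = 0        # current container nesting depth
    @in_rows = false  # true while inside the top-level "rows" array
    @last_key = nil   # most recent key seen in the top-level object
    @row = nil        # row object currently being assembled
    @row_key = nil
  end

  # Register this object's methods as callbacks on a parser, in the same
  # way JSON::Stream::Builder registers itself.
  def attach(parser)
    EVENTS.each { |name| parser.send(name, &method(name)) }
    self
  end

  def start_object
    @depth += 1
    @row = {} if @in_rows && @depth == 3
  end

  def end_object
    if @row && @depth == 3
      @batch << @row
      @row = nil
      flush if @batch.size >= @batch_size
    end
    @depth -= 1
  end

  def start_array
    @depth += 1
    @in_rows = true if @depth == 2 && @last_key == 'rows'
  end

  def end_array
    if @in_rows && @depth == 2
      @in_rows = false
      flush # emit the final, possibly short, batch
    end
    @depth -= 1
  end

  def key(name)
    case @depth
    when 1 then @last_key = name
    when 3 then @row_key = name if @row
    end
  end

  def value(val)
    @row[@row_key] = val if @row && @depth == 3
  end

  private

  def flush
    return if @batch.empty?
    @on_batch.call(@batch)
    @batch = []
  end
end

# Usage with the parser (requires the json-stream gem):
#
#   parser  = JSON::Stream::Parser.new
#   batcher = RowBatcher.new(100) { |rows| puts "got #{rows.size} rows" }
#   batcher.attach(parser)
#   parser << chunk   # for example, inside receive_data
```

Because each batch is handed off and then dropped, memory use is bounded by the batch size rather than by the size of the document.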
## Alternatives * [json](https://github.com/flori/json) * [yajl-ruby](https://github.com/brianmario/yajl-ruby) * [yajl-ffi](https://github.com/dgraham/yajl-ffi) * [application/json-seq](http://www.rfc-editor.org/rfc/rfc7464.txt) ## Development ``` $ bin/setup $ bin/rake test ``` ## License JSON::Stream is released under the MIT license. Check the LICENSE file for details. ================================================ FILE: Rakefile ================================================ require 'rake' require 'rake/clean' require 'rake/testtask' CLOBBER.include('pkg') directory 'pkg' desc 'Build distributable packages' task :build => [:pkg] do system 'gem build json-stream.gemspec && mv json-*.gem pkg/' end Rake::TestTask.new(:test) do |test| test.libs << 'spec' test.pattern = 'spec/**/*_spec.rb' test.warning = true end task :default => [:clobber, :test, :build] ================================================ FILE: bin/bundler ================================================ #!/usr/bin/env ruby # frozen_string_literal: true # # This file was generated by Bundler. # # The application 'bundler' is installed as part of a gem, and # this file is here to facilitate running it. # require "pathname" ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile", Pathname.new(__FILE__).realpath) require "rubygems" require "bundler/setup" load Gem.bin_path("bundler", "bundler") ================================================ FILE: bin/console ================================================ #!/usr/bin/env ruby require "bundler/setup" require "json/stream" # You can add fixtures and/or initialization code here to make experimenting # with your gem easier. You can also use a different console, if you like. # (If you use this, don't forget to add pry to your Gemfile!) 
# require "pry" # Pry.start require "irb" IRB.start(__FILE__) ================================================ FILE: bin/rake ================================================ #!/usr/bin/env ruby # frozen_string_literal: true # # This file was generated by Bundler. # # The application 'rake' is installed as part of a gem, and # this file is here to facilitate running it. # require "pathname" ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile", Pathname.new(__FILE__).realpath) require "rubygems" require "bundler/setup" load Gem.bin_path("rake", "rake") ================================================ FILE: bin/setup ================================================ #!/usr/bin/env bash set -euo pipefail IFS=$'\n\t' set -vx bundle install ================================================ FILE: json-stream.gemspec ================================================ require './lib/json/stream/version' Gem::Specification.new do |s| s.name = 'json-stream' s.version = JSON::Stream::VERSION s.summary = %q[A streaming JSON parser that generates SAX-like events.] s.description = %q[A parser best suited for huge JSON documents that don't fit in memory.] s.authors = ['David Graham'] s.email = %w[david.malcom.graham@gmail.com] s.homepage = 'http://dgraham.github.io/json-stream/' s.license = 'MIT' s.files = Dir['[A-Z]*', 'json-stream.gemspec', '{lib}/**/*'] - ['Gemfile.lock'] s.require_path = 'lib' s.add_development_dependency 'bundler', '~> 2.2' s.add_development_dependency 'minitest', '~> 5.22' s.add_development_dependency 'rake', '~> 13.2' s.required_ruby_version = '>= 2.6.0' end ================================================ FILE: lib/json/stream/buffer.rb ================================================ module JSON module Stream # A character buffer that expects a UTF-8 encoded stream of bytes. # This handles truncated multi-byte characters properly so we can just # feed it binary data and receive a properly formatted UTF-8 String as # output. 
# # More UTF-8 parsing details are available at: # # http://en.wikipedia.org/wiki/UTF-8 # http://tools.ietf.org/html/rfc3629#section-3 class Buffer def initialize @state = :start @buffer = [] @need = 0 end # Fill the buffer with a String of binary UTF-8 encoded bytes. Returns # as much of the data in a UTF-8 String as we have. Truncated multi-byte # characters are saved in the buffer until the next call to this method # where we expect to receive the rest of the multi-byte character. # # data - The partial binary encoded String data. # # Raises JSON::Stream::ParserError if the UTF-8 byte sequence is malformed. # # Returns a UTF-8 encoded String. def <<(data) # Avoid state machine for complete UTF-8. if @buffer.empty? data.force_encoding(Encoding::UTF_8) return data if data.valid_encoding? end bytes = [] data.each_byte do |byte| case @state when :start if byte < 128 bytes << byte elsif byte >= 192 @state = :multi_byte @buffer << byte @need = case when byte >= 240 then 4 when byte >= 224 then 3 when byte >= 192 then 2 end else error('Expected start of multi-byte or single byte char') end when :multi_byte if byte > 127 && byte < 192 @buffer << byte if @buffer.size == @need bytes += @buffer.slice!(0, @buffer.size) @state = :start end else error('Expected continuation byte') end end end # Build UTF-8 encoded string from completed codepoints. bytes.pack('C*').force_encoding(Encoding::UTF_8).tap do |text| error('Invalid UTF-8 byte sequence') unless text.valid_encoding? end end # Determine if the buffer contains a partial UTF-8 multi-byte sequence that # is waiting on subsequent continuation bytes before a full codepoint is # formed. # # Examples # # bytes = "é".b # # buffer << bytes[0] # buffer.empty? # # => false # # buffer << bytes[1] # buffer.empty? # # => true # # Returns true if the buffer is empty. def empty? @buffer.empty? 
end private def error(message) raise ParserError, message end end end end ================================================ FILE: lib/json/stream/builder.rb ================================================ module JSON module Stream # A parser listener that builds a full, in memory, object from a JSON # document. This is similar to using the json gem's `JSON.parse` method. # # Examples # # parser = JSON::Stream::Parser.new # builder = JSON::Stream::Builder.new(parser) # parser << '{"answer": 42, "question": false}' # obj = builder.result class Builder METHODS = %w[start_document end_document start_object end_object start_array end_array key value] attr_reader :result def initialize(parser) METHODS.each do |name| parser.send(name, &method(name)) end end def start_document @stack = [] @keys = [] @result = nil end def end_document @result = @stack.pop end def start_object @stack.push({}) end def end_object return if @stack.size == 1 node = @stack.pop top = @stack[-1] case top when Hash top[@keys.pop] = node when Array top << node end end alias :end_array :end_object def start_array @stack.push([]) end def key(key) @keys << key end def value(value) top = @stack[-1] case top when Hash top[@keys.pop] = value when Array top << value else @stack << value end end end end end ================================================ FILE: lib/json/stream/parser.rb ================================================ module JSON module Stream # Raised on any invalid JSON text. ParserError = Class.new(RuntimeError) # A streaming JSON parser that generates SAX-like events for state changes. # Use the json gem for small documents. Use this for huge documents that # won't fit in memory. 
# # Examples # # parser = JSON::Stream::Parser.new # parser.key { |key| puts key } # parser.value { |value| puts value } # parser << '{"answer":' # parser << ' 42}' class Parser BUF_SIZE = 4096 CONTROL = /[\x00-\x1F]/ WS = /[ \n\t\r]/ HEX = /[0-9a-fA-F]/ DIGIT = /[0-9]/ DIGIT_1_9 = /[1-9]/ DIGIT_END = /\d$/ TRUE_RE = /[rue]/ FALSE_RE = /[alse]/ NULL_RE = /[ul]/ TRUE_KEYWORD = 'true' FALSE_KEYWORD = 'false' NULL_KEYWORD = 'null' LEFT_BRACE = '{' RIGHT_BRACE = '}' LEFT_BRACKET = '[' RIGHT_BRACKET = ']' BACKSLASH = '\\' SLASH = '/' QUOTE = '"' COMMA = ',' COLON = ':' ZERO = '0' MINUS = '-' PLUS = '+' POINT = '.' EXPONENT = /[eE]/ B,F,N,R,T,U = %w[b f n r t u] # Parses a full JSON document from a String or an IO stream and returns # the parsed object graph. For parsing small JSON documents with small # memory requirements, use the json gem's faster JSON.parse method instead. # # json - The String or IO containing JSON data. # # Examples # # JSON::Stream::Parser.parse('{"hello": "world"}') # # => {"hello" => "world"} # # Raises a JSON::Stream::ParserError if the JSON data is malformed. # # Returns the parsed value: a Hash, an Array, or a scalar. def self.parse(json) stream = json.is_a?(String) ? StringIO.new(json) : json parser = Parser.new builder = Builder.new(parser) while (buf = stream.read(BUF_SIZE)) != nil parser << buf end parser.finish builder.result ensure stream.close end # Create a new parser with an optional initialization block where # we can register event callbacks. 
# # Examples # # parser = JSON::Stream::Parser.new do # start_document { puts "start document" } # end_document { puts "end document" } # start_object { puts "start object" } # end_object { puts "end object" } # start_array { puts "start array" } # end_array { puts "end array" } # key { |k| puts "key: #{k}" } # value { |v| puts "value: #{v}" } # end def initialize(&block) @state = :start_document @utf8 = Buffer.new @listeners = { start_document: [], end_document: [], start_object: [], end_object: [], start_array: [], end_array: [], key: [], value: [] } # Track parse stack. @stack = [] @unicode = "" @buf = "" @pos = -1 # Register any observers in the block. instance_eval(&block) if block_given? end def start_document(&block) @listeners[:start_document] << block end def end_document(&block) @listeners[:end_document] << block end def start_object(&block) @listeners[:start_object] << block end def end_object(&block) @listeners[:end_object] << block end def start_array(&block) @listeners[:start_array] << block end def end_array(&block) @listeners[:end_array] << block end def key(&block) @listeners[:key] << block end def value(&block) @listeners[:value] << block end # Pass data into the parser to advance the state machine and # generate callback events. This is well suited for an EventMachine # receive_data loop. # # data - The String of partial JSON data to parse. # # Raises a JSON::Stream::ParserError if the JSON data is malformed. # # Returns nothing. 
def <<(data) (@utf8 << data).each_char do |ch| @pos += 1 case @state when :start_document start_value(ch) when :start_object case ch when QUOTE @state = :start_string @stack.push(:key) when RIGHT_BRACE end_container(:object) when WS # ignore else error('Expected object key start') end when :start_string case ch when QUOTE if @stack.pop == :string end_value(@buf) else # :key @state = :end_key notify(:key, @buf) end @buf = "" when BACKSLASH @state = :start_escape when CONTROL error('Control characters must be escaped') else @buf << ch end when :start_escape case ch when QUOTE, BACKSLASH, SLASH @buf << ch @state = :start_string when B @buf << "\b" @state = :start_string when F @buf << "\f" @state = :start_string when N @buf << "\n" @state = :start_string when R @buf << "\r" @state = :start_string when T @buf << "\t" @state = :start_string when U @state = :unicode_escape else error('Expected escaped character') end when :unicode_escape case ch when HEX @unicode << ch if @unicode.size == 4 codepoint = @unicode.slice!(0, 4).hex if codepoint >= 0xD800 && codepoint <= 0xDBFF error('Expected low surrogate pair half') if @stack[-1].is_a?(Integer) @state = :start_surrogate_pair @stack.push(codepoint) elsif codepoint >= 0xDC00 && codepoint <= 0xDFFF high = @stack.pop error('Expected high surrogate pair half') unless high.is_a?(Integer) pair = ((high - 0xD800) * 0x400) + (codepoint - 0xDC00) + 0x10000 @buf << pair @state = :start_string else @buf << codepoint @state = :start_string end end else error('Expected unicode escape hex digit') end when :start_surrogate_pair case ch when BACKSLASH @state = :start_surrogate_pair_u else error('Expected low surrogate pair half') end when :start_surrogate_pair_u case ch when U @state = :unicode_escape else error('Expected low surrogate pair half') end when :start_negative_number case ch when ZERO @state = :start_zero @buf << ch when DIGIT_1_9 @state = :start_int @buf << ch else error('Expected 0-9 digit') end when :start_zero case ch when 
POINT @state = :start_float @buf << ch when EXPONENT @state = :start_exponent @buf << ch else end_value(@buf.to_i) @buf = "" @pos -= 1 redo end when :start_float case ch when DIGIT @state = :in_float @buf << ch else error('Expected 0-9 digit') end when :in_float case ch when DIGIT @buf << ch when EXPONENT @state = :start_exponent @buf << ch else end_value(@buf.to_f) @buf = "" @pos -= 1 redo end when :start_exponent case ch when MINUS, PLUS, DIGIT @state = :in_exponent @buf << ch else error('Expected +, -, or 0-9 digit') end when :in_exponent case ch when DIGIT @buf << ch else error('Expected 0-9 digit') unless @buf =~ DIGIT_END end_value(@buf.to_f) @buf = "" @pos -= 1 redo end when :start_int case ch when DIGIT @buf << ch when POINT @state = :start_float @buf << ch when EXPONENT @state = :start_exponent @buf << ch else end_value(@buf.to_i) @buf = "" @pos -= 1 redo end when :start_true keyword(TRUE_KEYWORD, true, TRUE_RE, ch) when :start_false keyword(FALSE_KEYWORD, false, FALSE_RE, ch) when :start_null keyword(NULL_KEYWORD, nil, NULL_RE, ch) when :end_key case ch when COLON @state = :key_sep when WS # ignore else error('Expected colon key separator') end when :key_sep start_value(ch) when :start_array case ch when RIGHT_BRACKET end_container(:array) when WS # ignore else start_value(ch) end when :end_value case ch when COMMA @state = :value_sep when RIGHT_BRACE end_container(:object) when RIGHT_BRACKET end_container(:array) when WS # ignore else error('Expected comma or object or array close') end when :value_sep if @stack[-1] == :object case ch when QUOTE @state = :start_string @stack.push(:key) when WS # ignore else error('Expected object key start') end else start_value(ch) end when :end_document error('Unexpected data') unless ch =~ WS end end end # Drain any remaining buffered characters into the parser to complete # the parsing of the document. # # This is only required when parsing a document containing a single # numeric value, integer or float. 
The parser has no other way to # detect when it should no longer expect additional characters with # which to complete the parse, so it must be signaled by a call to # this method. # # If you're parsing more typical object or array documents, there's no # need to call `finish` because the parse will complete when the final # closing `]` or `}` character is scanned. # # Raises a JSON::Stream::ParserError if the JSON data is malformed. # # Returns nothing. def finish # Partial multi-byte character waiting for completion bytes. error('Unexpected end-of-file') unless @utf8.empty? # Partial array, object, or string. error('Unexpected end-of-file') unless @stack.empty? case @state when :end_document # done, do nothing when :in_float end_value(@buf.to_f) when :in_exponent error('Unexpected end-of-file') unless @buf =~ DIGIT_END end_value(@buf.to_f) when :start_zero end_value(@buf.to_i) when :start_int end_value(@buf.to_i) else error('Unexpected end-of-file') end end private # Invoke all registered observer procs for the event type. # # type - The Symbol listener name. # args - The argument list to pass into the observer procs. # # Examples # # # broadcast events for {"answer": 42} # notify(:start_object) # notify(:key, "answer") # notify(:value, 42) # notify(:end_object) # # Returns nothing. def notify(type, *args) @listeners[type].each do |block| block.call(*args) end end # Complete an object or array container value type. # # type - The Symbol, :object or :array, of the expected type. # # Raises a JSON::Stream::ParserError if the expected container type # was not completed. # # Returns nothing. def end_container(type) @state = :end_value if @stack.pop == type case type when :object then notify(:end_object) when :array then notify(:end_array) end else error("Expected end of #{type}") end notify_end_document if @stack.empty? 
end # Broadcast an `end_document` event to observers after a complete JSON # value document (object, array, number, string, true, false, null) has # been parsed from the text. This is the final event sent to observers # and signals the parse has finished. # # Returns nothing. def notify_end_document @state = :end_document notify(:end_document) end # Parse one of the three allowed keywords: true, false, null. # # word - The String keyword ('true', 'false', 'null'). # value - The Ruby value (true, false, nil). # re - The Regexp of allowed keyword characters. # ch - The current String character being parsed. # # Raises a JSON::Stream::ParserError if the character does not belong # in the expected keyword. # # Returns nothing. def keyword(word, value, re, ch) if ch =~ re @buf << ch else error("Expected #{word} keyword") end if @buf.size == word.size if @buf == word @buf = "" end_value(value) else error("Expected #{word} keyword") end end end # Process the first character of one of the seven possible JSON # values: object, array, string, true, false, null, number. # # ch - The current character String. # # Raises a JSON::Stream::ParserError if the character does not signal # the start of a value. # # Returns nothing. def start_value(ch) case ch when LEFT_BRACE notify(:start_document) if @stack.empty? @state = :start_object @stack.push(:object) notify(:start_object) when LEFT_BRACKET notify(:start_document) if @stack.empty? 
@state = :start_array @stack.push(:array) notify(:start_array) when QUOTE @state = :start_string @stack.push(:string) when T @state = :start_true @buf << ch when F @state = :start_false @buf << ch when N @state = :start_null @buf << ch when MINUS @state = :start_negative_number @buf << ch when ZERO @state = :start_zero @buf << ch when DIGIT_1_9 @state = :start_int @buf << ch when WS # ignore else error('Expected value') end end # Advance the state machine and notify `value` observers that a # string, number or keyword (true, false, null) value was parsed. # # value - The object to broadcast to observers. # # Returns nothing. def end_value(value) @state = :end_value notify(:start_document) if @stack.empty? notify(:value, value) notify_end_document if @stack.empty? end def error(message) raise ParserError, "#{message}: char #{@pos}" end end end end ================================================ FILE: lib/json/stream/version.rb ================================================ module JSON module Stream VERSION = '1.0.0' end end ================================================ FILE: lib/json/stream.rb ================================================ # encoding: UTF-8 require 'stringio' require 'json/stream/buffer' require 'json/stream/builder' require 'json/stream/parser' require 'json/stream/version' ================================================ FILE: spec/buffer_spec.rb ================================================ require 'json/stream' require 'minitest/autorun' describe JSON::Stream::Buffer do subject { JSON::Stream::Buffer.new } it 'accepts single byte characters' do assert_equal "", subject << "" assert_equal "abc", subject << "abc" assert_equal "\u0000abc", subject << "\u0000abc" end # The é character can be a single codepoint \u00e9 or two codepoints # \u0065\u0301. The first is encoded in 2 bytes, the second in 3 bytes. # The json and yajl-ruby gems and CouchDB do not normalize unicode text # so neither will we. 
Although, a good way to normalize is by calling # ActiveSupport::Multibyte::Chars.new("é").normalize(:c). it 'accepts combined characters' do assert_equal "\u0065\u0301", subject << "\u0065\u0301" assert_equal 3, (subject << "\u0065\u0301").bytesize assert_equal 2, (subject << "\u0065\u0301").size assert_equal "\u00e9", subject << "\u00e9" assert_equal 2, (subject << "\u00e9").bytesize assert_equal 1, (subject << "\u00e9").size end it 'accepts valid two byte characters' do assert_equal "abcé", subject << "abcé" assert_equal "a", subject << "a\xC3" assert_equal "é", subject << "\xA9" assert_equal "", subject << "\xC3" assert_equal "é", subject << "\xA9" assert_equal "é", subject << "\xC3\xA9" end it 'accepts valid three byte characters' do assert_equal "abcé\u2603", subject << "abcé\u2603" assert_equal "a", subject << "a\xE2" assert_equal "", subject << "\x98" assert_equal "\u2603", subject << "\x83" end it 'accepts valid four byte characters' do assert_equal "abcé\u2603\u{10102}é", subject << "abcé\u2603\u{10102}é" assert_equal "a", subject << "a\xF0" assert_equal "", subject << "\x90" assert_equal "", subject << "\x84" assert_equal "\u{10102}", subject << "\x82" end it 'rejects valid utf-8 followed by partial two byte sequence' do assert_equal '[', subject << '[' assert_equal '"', subject << '"' assert_equal '', subject << "\xC3" assert_raises(JSON::Stream::ParserError) { subject << '"' } end it 'rejects invalid two byte start characters' do assert_raises(JSON::Stream::ParserError) { subject << "\xC3\xC3" } end it 'rejects invalid three byte start characters' do assert_raises(JSON::Stream::ParserError) { subject << "\xE2\xE2" } end it 'rejects invalid four byte start characters' do assert_raises(JSON::Stream::ParserError) { subject << "\xF0\xF0" } end it 'rejects a two byte start with single byte continuation character' do assert_raises(JSON::Stream::ParserError) { subject << "\xC3\u0000" } end it 'rejects a three byte start with single byte continuation 
character' do assert_raises(JSON::Stream::ParserError) { subject << "\xE2\u0010" } end it 'rejects a four byte start with single byte continuation character' do assert_raises(JSON::Stream::ParserError) { subject << "\xF0a" } end it 'rejects an invalid continuation character' do assert_raises(JSON::Stream::ParserError) { subject << "\xA9" } end it 'rejects an overlong form' do assert_raises(JSON::Stream::ParserError) { subject << "\xC0\x80" } end describe 'checking for empty buffers' do it 'is initially empty' do assert subject.empty? end it 'is empty after processing complete characters' do subject << 'test' assert subject.empty? end it 'is not empty after processing partial multi-byte characters' do subject << "\xC3" refute subject.empty? subject << "\xA9" assert subject.empty? end end end ================================================ FILE: spec/builder_spec.rb ================================================ require 'json/stream' require 'minitest/autorun' describe JSON::Stream::Builder do let(:parser) { JSON::Stream::Parser.new } subject { JSON::Stream::Builder.new(parser) } it 'builds a false value' do assert_nil subject.result subject.start_document subject.value(false) assert_nil subject.result subject.end_document assert_equal false, subject.result end it 'builds a string value' do assert_nil subject.result subject.start_document subject.value("test") assert_nil subject.result subject.end_document assert_equal "test", subject.result end it 'builds an empty array' do assert_nil subject.result subject.start_document subject.start_array subject.end_array assert_nil subject.result subject.end_document assert_equal [], subject.result end it 'builds an array of numbers' do subject.start_document subject.start_array subject.value(1) subject.value(2) subject.value(3) subject.end_array subject.end_document assert_equal [1, 2, 3], subject.result end it 'builds nested empty arrays' do subject.start_document subject.start_array subject.start_array subject.end_array 
subject.end_array subject.end_document assert_equal [[]], subject.result end it 'builds nested arrays of numbers' do subject.start_document subject.start_array subject.value(1) subject.start_array subject.value(2) subject.end_array subject.value(3) subject.end_array subject.end_document assert_equal [1, [2], 3], subject.result end it 'builds an empty object' do subject.start_document subject.start_object subject.end_object subject.end_document assert_equal({}, subject.result) end it 'builds a complex object' do subject.start_document subject.start_object subject.key("k1") subject.value(1) subject.key("k2") subject.value(nil) subject.key("k3") subject.value(true) subject.key("k4") subject.value(false) subject.key("k5") subject.value("string value") subject.end_object subject.end_document expected = { "k1" => 1, "k2" => nil, "k3" => true, "k4" => false, "k5" => "string value" } assert_equal expected, subject.result end it 'builds a nested object' do subject.start_document subject.start_object subject.key("k1") subject.value(1) subject.key("k2") subject.start_object subject.end_object subject.key("k3") subject.start_object subject.key("sub1") subject.start_array subject.value(12) subject.end_array subject.end_object subject.key("k4") subject.start_array subject.value(1) subject.start_object subject.key("sub2") subject.start_array subject.value(nil) subject.end_array subject.end_object subject.end_array subject.key("k5") subject.value("string value") subject.end_object subject.end_document expected = { "k1" => 1, "k2" => {}, "k3" => {"sub1" => [12]}, "k4" => [1, {"sub2" => [nil]}], "k5" => "string value" } assert_equal expected, subject.result end it 'builds a real document' do refute_nil subject parser << File.read('spec/fixtures/repository.json') refute_nil subject.result assert_equal 'rails', subject.result['name'] assert_equal 4223, subject.result['owner']['id'] assert_equal false, subject.result['fork'] assert_nil subject.result['mirror_url'] end end 
================================================ FILE: spec/fixtures/repository.json ================================================ { "id": 8514, "name": "rails", "full_name": "rails/rails", "owner": { "login": "rails", "id": 4223, "avatar_url": "https://avatars.githubusercontent.com/u/4223?", "gravatar_id": "30f39a09e233e8369dddf6feb4be0308", "url": "https://api.github.com/users/rails", "html_url": "https://github.com/rails", "followers_url": "https://api.github.com/users/rails/followers", "following_url": "https://api.github.com/users/rails/following{/other_user}", "gists_url": "https://api.github.com/users/rails/gists{/gist_id}", "starred_url": "https://api.github.com/users/rails/starred{/owner}{/repo}", "subscriptions_url": "https://api.github.com/users/rails/subscriptions", "organizations_url": "https://api.github.com/users/rails/orgs", "repos_url": "https://api.github.com/users/rails/repos", "events_url": "https://api.github.com/users/rails/events{/privacy}", "received_events_url": "https://api.github.com/users/rails/received_events", "type": "Organization", "site_admin": false }, "private": false, "html_url": "https://github.com/rails/rails", "description": "Ruby on Rails", "fork": false, "url": "https://api.github.com/repos/rails/rails", "forks_url": "https://api.github.com/repos/rails/rails/forks", "keys_url": "https://api.github.com/repos/rails/rails/keys{/key_id}", "collaborators_url": "https://api.github.com/repos/rails/rails/collaborators{/collaborator}", "teams_url": "https://api.github.com/repos/rails/rails/teams", "hooks_url": "https://api.github.com/repos/rails/rails/hooks", "issue_events_url": "https://api.github.com/repos/rails/rails/issues/events{/number}", "events_url": "https://api.github.com/repos/rails/rails/events", "assignees_url": "https://api.github.com/repos/rails/rails/assignees{/user}", "branches_url": "https://api.github.com/repos/rails/rails/branches{/branch}", "tags_url": "https://api.github.com/repos/rails/rails/tags", 
"blobs_url": "https://api.github.com/repos/rails/rails/git/blobs{/sha}", "git_tags_url": "https://api.github.com/repos/rails/rails/git/tags{/sha}", "git_refs_url": "https://api.github.com/repos/rails/rails/git/refs{/sha}", "trees_url": "https://api.github.com/repos/rails/rails/git/trees{/sha}", "statuses_url": "https://api.github.com/repos/rails/rails/statuses/{sha}", "languages_url": "https://api.github.com/repos/rails/rails/languages", "stargazers_url": "https://api.github.com/repos/rails/rails/stargazers", "contributors_url": "https://api.github.com/repos/rails/rails/contributors", "subscribers_url": "https://api.github.com/repos/rails/rails/subscribers", "subscription_url": "https://api.github.com/repos/rails/rails/subscription", "commits_url": "https://api.github.com/repos/rails/rails/commits{/sha}", "git_commits_url": "https://api.github.com/repos/rails/rails/git/commits{/sha}", "comments_url": "https://api.github.com/repos/rails/rails/comments{/number}", "issue_comment_url": "https://api.github.com/repos/rails/rails/issues/comments/{number}", "contents_url": "https://api.github.com/repos/rails/rails/contents/{+path}", "compare_url": "https://api.github.com/repos/rails/rails/compare/{base}...{head}", "merges_url": "https://api.github.com/repos/rails/rails/merges", "archive_url": "https://api.github.com/repos/rails/rails/{archive_format}{/ref}", "downloads_url": "https://api.github.com/repos/rails/rails/downloads", "issues_url": "https://api.github.com/repos/rails/rails/issues{/number}", "pulls_url": "https://api.github.com/repos/rails/rails/pulls{/number}", "milestones_url": "https://api.github.com/repos/rails/rails/milestones{/number}", "notifications_url": "https://api.github.com/repos/rails/rails/notifications{?since,all,participating}", "labels_url": "https://api.github.com/repos/rails/rails/labels{/name}", "releases_url": "https://api.github.com/repos/rails/rails/releases{/id}", "created_at": "2008-04-11T02:19:47Z", "updated_at": "2014-06-25T21:08:45Z", 
"pushed_at": "2014-06-25T17:47:52Z", "git_url": "git://github.com/rails/rails.git", "ssh_url": "git@github.com:rails/rails.git", "clone_url": "https://github.com/rails/rails.git", "svn_url": "https://github.com/rails/rails", "homepage": "http://rubyonrails.org", "size": 331047, "stargazers_count": 22248, "watchers_count": 22248, "language": "Ruby", "has_issues": true, "has_downloads": true, "has_wiki": false, "forks_count": 8278, "mirror_url": null, "open_issues_count": 625, "forks": 8278, "open_issues": 625, "watchers": 22248, "default_branch": "master", "organization": { "login": "rails", "id": 4223, "avatar_url": "https://avatars.githubusercontent.com/u/4223?", "gravatar_id": "30f39a09e233e8369dddf6feb4be0308", "url": "https://api.github.com/users/rails", "html_url": "https://github.com/rails", "followers_url": "https://api.github.com/users/rails/followers", "following_url": "https://api.github.com/users/rails/following{/other_user}", "gists_url": "https://api.github.com/users/rails/gists{/gist_id}", "starred_url": "https://api.github.com/users/rails/starred{/owner}{/repo}", "subscriptions_url": "https://api.github.com/users/rails/subscriptions", "organizations_url": "https://api.github.com/users/rails/orgs", "repos_url": "https://api.github.com/users/rails/repos", "events_url": "https://api.github.com/users/rails/events{/privacy}", "received_events_url": "https://api.github.com/users/rails/received_events", "type": "Organization", "site_admin": false }, "network_count": 8278, "subscribers_count": 1521 } ================================================ FILE: spec/parser_spec.rb ================================================ require 'json/stream' require 'minitest/autorun' describe JSON::Stream::Parser do subject { JSON::Stream::Parser.new } describe 'parsing a document' do it 'rejects documents containing bad start character' do expected = [:error] assert_equal expected, events('a') end it 'rejects documents starting with period' do expected = [:error] 
assert_equal expected, events('.') end it 'parses a null value document' do expected = [:start_document, [:value, nil], :end_document] assert_equal expected, events('null') end it 'parses a false value document' do expected = [:start_document, [:value, false], :end_document] assert_equal expected, events('false') end it 'parses a true value document' do expected = [:start_document, [:value, true], :end_document] assert_equal expected, events('true') end it 'parses a string document' do expected = [:start_document, [:value, "test"], :end_document] assert_equal expected, events('"test"') end it 'parses a single digit integer value document' do expected = [:start_document, [:value, 2], :end_document] events = events('2', subject) assert events.empty? subject.finish assert_equal expected, events end it 'parses a multiple digit integer value document' do expected = [:start_document, [:value, 12], :end_document] events = events('12', subject) assert events.empty? subject.finish assert_equal expected, events end it 'parses a zero literal document' do expected = [:start_document, [:value, 0], :end_document] events = events('0', subject) assert events.empty? subject.finish assert_equal expected, events end it 'parses a negative integer document' do expected = [:start_document, [:value, -1], :end_document] events = events('-1', subject) assert events.empty? subject.finish assert_equal expected, events end it 'parses an exponent literal document' do expected = [:start_document, [:value, 200.0], :end_document] events = events('2e2', subject) assert events.empty? subject.finish assert_equal expected, events end it 'parses a float value document' do expected = [:start_document, [:value, 12.1], :end_document] events = events('12.1', subject) assert events.empty? 
subject.finish assert_equal expected, events end it 'parses a value document with leading whitespace' do expected = [:start_document, [:value, false], :end_document] assert_equal expected, events(' false ') end it 'parses array documents' do expected = [:start_document, :start_array, :end_array, :end_document] assert_equal expected, events('[]') assert_equal expected, events('[ ]') assert_equal expected, events(' [] ') assert_equal expected, events(' [ ] ') end it 'parses object documents' do expected = [:start_document, :start_object, :end_object, :end_document] assert_equal expected, events('{}') assert_equal expected, events('{ }') assert_equal expected, events(' {} ') assert_equal expected, events(' { } ') end it 'rejects documents with trailing characters' do expected = [:start_document, :start_object, :end_object, :end_document, :error] assert_equal expected, events('{}a') assert_equal expected, events('{ } 12') assert_equal expected, events(' {} false') assert_equal expected, events(' { }, {}') end it 'ignores whitespace around tokens, preserves it within strings' do json = %Q{ { " key 1 " : \t [ 1, 2, " my string ",\r false, true, null ] } } expected = [ :start_document, :start_object, [:key, " key 1 "], :start_array, [:value, 1], [:value, 2], [:value, " my string "], [:value, false], [:value, true], [:value, nil], :end_array, :end_object, :end_document ] assert_equal expected, events(json) end it 'rejects form feed whitespace' do json = "[1,\f 2]" expected = [:start_document, :start_array, [:value, 1], :error] assert_equal expected, events(json) end it 'rejects vertical tab whitespace' do json = "[1,\v 2]" expected = [:start_document, :start_array, [:value, 1], :error] assert_equal expected, events(json) end it 'rejects partial keyword tokens' do expected = [:start_document, :start_array, :error] assert_equal expected, events('[tru]') assert_equal expected, events('[fal]') assert_equal expected, events('[nul,true]') assert_equal expected, events('[fals1]') 
end it 'rejects scrambled keyword tokens' do expected = [:start_document, :start_array, :error] assert_equal expected, events('[ture]') assert_equal expected, events('[fales]') assert_equal expected, events('[nlul]') end it 'parses single keyword tokens' do expected = [:start_document, :start_array, [:value, true], :end_array, :end_document] assert_equal expected, events('[true]') end it 'parses keywords in series' do expected = [:start_document, :start_array, [:value, true], [:value, nil], :end_array, :end_document] assert_equal expected, events('[true, null]') end end describe 'finishing the parse' do it 'rejects finish with no json data provided' do assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial null keyword' do subject << 'nul' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial true keyword' do subject << 'tru' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial false keyword' do subject << 'fals' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial float literal' do subject << '42.' 
assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial exponent' do subject << '42e' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects malformed exponent' do subject << '42e+' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial negative number' do subject << '-' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial string literal' do subject << '"test' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial object ending in literal value' do subject << '{"test": 42' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'rejects partial array ending in literal value' do subject << '[42' assert_raises(JSON::Stream::ParserError) { subject.finish } end it 'does nothing on subsequent finish' do begin subject << 'false' subject.finish subject.finish rescue fail 'raised unexpected error' end end end describe 'parsing number tokens' do it 'rejects invalid negative numbers' do expected = [:start_document, :start_array, :error] assert_equal expected, events('[-]') expected = [:start_document, :start_array, [:value, 1], :error] assert_equal expected, events('[1-0]') end it 'parses integer zero' do expected = [:start_document, :start_array, [:value, 0], :end_array, :end_document] assert_equal expected, events('[0]') assert_equal expected, events('[-0]') end it 'parses float zero' do expected = [:start_document, :start_array, [:value, 0.0], :end_array, :end_document] assert_equal expected, events('[0.0]') assert_equal expected, events('[-0.0]') end it 'rejects multi zero' do expected = [:start_document, :start_array, [:value, 0], :error] assert_equal expected, events('[00]') assert_equal expected, events('[-00]') end it 'rejects integers that start with zero' do expected = [:start_document, :start_array, [:value, 0], :error] assert_equal expected, events('[01]') assert_equal expected, events('[-01]') end it 'parses integer 
tokens' do expected = [:start_document, :start_array, [:value, 1], :end_array, :end_document] assert_equal expected, events('[1]') expected = [:start_document, :start_array, [:value, -1], :end_array, :end_document] assert_equal expected, events('[-1]') expected = [:start_document, :start_array, [:value, 123], :end_array, :end_document] assert_equal expected, events('[123]') expected = [:start_document, :start_array, [:value, -123], :end_array, :end_document] assert_equal expected, events('[-123]') end it 'parses float tokens' do expected = [:start_document, :start_array, [:value, 1.0], :end_array, :end_document] assert_equal expected, events('[1.0]') assert_equal expected, events('[1.00]') end it 'parses negative floats' do expected = [:start_document, :start_array, [:value, -1.0], :end_array, :end_document] assert_equal expected, events('[-1.0]') assert_equal expected, events('[-1.00]') end it 'parses multi-digit floats' do expected = [:start_document, :start_array, [:value, 123.012], :end_array, :end_document] assert_equal expected, events('[123.012]') assert_equal expected, events('[123.0120]') end it 'parses negative multi-digit floats' do expected = [:start_document, :start_array, [:value, -123.012], :end_array, :end_document] assert_equal expected, events('[-123.012]') assert_equal expected, events('[-123.0120]') end it 'rejects floats missing leading zero' do expected = [:start_document, :start_array, :error] assert_equal expected, events('[.1]') assert_equal expected, events('[-.1]') assert_equal expected, events('[.01]') assert_equal expected, events('[-.01]') end it 'rejects float missing fraction' do expected = [:start_document, :start_array, :error] assert_equal expected, events('[.]') assert_equal expected, events('[..]') assert_equal expected, events('[0.]') assert_equal expected, events('[12.]') end it 'parses zero with implicit positive exponent as float' do expected = [:start_document, :start_array, [:value, 0.0], :end_array, :end_document] events 
= events('[0e2]') assert_equal expected, events assert_kind_of Float, events[2][1] end it 'parses zero with explicit positive exponent as float' do expected = [:start_document, :start_array, [:value, 0.0], :end_array, :end_document] events = events('[0e+2]') assert_equal expected, events assert_kind_of Float, events[2][1] end it 'parses zero with negative exponent as float' do expected = [:start_document, :start_array, [:value, 0.0], :end_array, :end_document] events = events('[0e-2]') assert_equal expected, events assert_kind_of Float, events[2][1] end it 'parses positive exponent integers as floats' do expected = [:start_document, :start_array, [:value, 212.0], :end_array, :end_document] events = events('[2.12e2]') assert_equal expected, events('[2.12e2]') assert_kind_of Float, events[2][1] assert_equal expected, events('[2.12e02]') assert_equal expected, events('[2.12e+2]') assert_equal expected, events('[2.12e+02]') end it 'parses positive exponent floats' do expected = [:start_document, :start_array, [:value, 21.2], :end_array, :end_document] assert_equal expected, events('[2.12e1]') assert_equal expected, events('[2.12e01]') assert_equal expected, events('[2.12e+1]') assert_equal expected, events('[2.12e+01]') end it 'parses negative exponent' do expected = [:start_document, :start_array, [:value, 0.0212], :end_array, :end_document] assert_equal expected, events('[2.12e-2]') assert_equal expected, events('[2.12e-02]') assert_equal expected, events('[2.12e-2]') assert_equal expected, events('[2.12e-02]') end it 'parses zero exponent floats' do expected = [:start_document, :start_array, [:value, 2.12], :end_array, :end_document] assert_equal expected, events('[2.12e0]') assert_equal expected, events('[2.12e00]') assert_equal expected, events('[2.12e-0]') assert_equal expected, events('[2.12e-00]') end it 'parses zero exponent integers' do expected = [:start_document, :start_array, [:value, 2.0], :end_array, :end_document] assert_equal expected, events('[2e0]') 
assert_equal expected, events('[2e00]') assert_equal expected, events('[2e-0]') assert_equal expected, events('[2e-00]') end it 'rejects missing exponent' do expected = [:start_document, :start_array, :error] assert_equal expected, events('[e]') assert_equal expected, events('[1e]') assert_equal expected, events('[1e-]') assert_equal expected, events('[1e--]') assert_equal expected, events('[1e+]') assert_equal expected, events('[1e++]') assert_equal expected, events('[0.e]') assert_equal expected, events('[10.e]') end it 'rejects float with trailing character' do expected = [:start_document, :start_array, [:value, 0.0], :error] assert_equal expected, events('[0.0q]') end it 'rejects integer with trailing character' do expected = [:start_document, :start_array, [:value, 1], :error] assert_equal expected, events('[1q]') end end describe 'parsing string tokens' do describe 'parsing two-character escapes' do it 'rejects invalid escape characters' do expected = [:start_document, :start_array, :error] assert_equal expected, events('["\\a"]') end it 'parses quotation mark' do expected = [:start_document, :start_array, [:value, "\""], :end_array, :end_document] assert_equal expected, events('["\""]') end it 'parses reverse solidus' do expected = [:start_document, :start_array, [:value, "\\"], :end_array, :end_document] assert_equal expected, events('["\\\"]') end it 'parses solidus' do expected = [:start_document, :start_array, [:value, "/"], :end_array, :end_document] assert_equal expected, events('["\/"]') end it 'parses backspace' do expected = [:start_document, :start_array, [:value, "\b"], :end_array, :end_document] assert_equal expected, events('["\b"]') end it 'parses form feed' do expected = [:start_document, :start_array, [:value, "\f"], :end_array, :end_document] assert_equal expected, events('["\f"]') end it 'parses line feed' do expected = [:start_document, :start_array, [:value, "\n"], :end_array, :end_document] assert_equal expected, events('["\n"]') end it 
'parses carriage return' do expected = [:start_document, :start_array, [:value, "\r"], :end_array, :end_document] assert_equal expected, events('["\r"]') end it 'parses tab' do expected = [:start_document, :start_array, [:value, "\t"], :end_array, :end_document] assert_equal expected, events('["\t"]') end it 'parses a series of escapes with whitespace' do expected = [:start_document, :start_array, [:value, "\" \\ / \b \f \n \r \t"], :end_array, :end_document] assert_equal expected, events('["\" \\\ \/ \b \f \n \r \t"]') end it 'parses a series of escapes without whitespace' do expected = [:start_document, :start_array, [:value, "\"\\/\b\f\n\r\t"], :end_array, :end_document] assert_equal expected, events('["\"\\\\/\b\f\n\r\t"]') end it 'parses a series of escapes with duplicate characters between them' do expected = [:start_document, :start_array, [:value, "\"t\\b/f\bn\f/\nn\rr\t"], :end_array, :end_document] assert_equal expected, events('["\"t\\\b\/f\bn\f/\nn\rr\t"]') end end describe 'parsing control characters' do it 'rejects control character in array' do expected = [:start_document, :start_array, :error] assert_equal expected, events("[\" \u0000 \"]") end it 'rejects control character in object' do expected = [:start_document, :start_object, :error] assert_equal expected, events("{\" \u0000 \":12}") end it 'parses escaped control character' do expected = [:start_document, :start_array, [:value, "\u0000"], :end_array, :end_document] assert_equal expected, events('["\\u0000"]') end it 'parses escaped control character in object key' do expected = [:start_document, :start_object, [:key, "\u0000"], [:value, 12], :end_object, :end_document] assert_equal expected, events('{"\\u0000": 12}') end it 'parses non-control character' do # del ascii 127 is allowed unescaped in json expected = [:start_document, :start_array, [:value, " \u007F "], :end_array, :end_document] assert_equal expected, events("[\" \u007f \"]") end end describe 'parsing unicode escape sequences' do 
it 'parses escaped ascii character' do a = "\x61" escaped = '\u0061' expected = [:start_document, :start_array, [:value, a], :end_array, :end_document] assert_equal expected, events('["' + escaped + '"]') end it 'parses un-escaped raw unicode' do # U+1F602 face with tears of joy face = "\xf0\x9f\x98\x82" expected = [:start_document, :start_array, [:value, face], :end_array, :end_document] assert_equal expected, events('["' + face + '"]') end it 'parses escaped unicode surrogate pairs' do # U+1F602 face with tears of joy face = "\xf0\x9f\x98\x82" escaped = '\uD83D\uDE02' expected = [:start_document, :start_array, [:value, face], :end_array, :end_document] assert_equal expected, events('["' + escaped + '"]') end it 'rejects partial unicode escapes' do expected = [:start_document, :start_array, :error] assert_equal expected, events('[" \\u "]') assert_equal expected, events('[" \\u2 "]') assert_equal expected, events('[" \\u26 "]') assert_equal expected, events('[" \\u260 "]') end it 'parses unicode escapes' do # U+2603 snowman snowman = "\xe2\x98\x83" escaped = '\u2603' expected = [:start_document, :start_array, [:value, snowman], :end_array, :end_document] assert_equal expected, events('["' + escaped + '"]') expected = [:start_document, :start_array, [:value, 'snow' + snowman + ' man'], :end_array, :end_document] assert_equal expected, events('["snow' + escaped + ' man"]') expected = [:start_document, :start_array, [:value, 'snow' + snowman + '3 man'], :end_array, :end_document] assert_equal expected, events('["snow' + escaped + '3 man"]') expected = [:start_document, :start_object, [:key, 'snow' + snowman + '3 man'], [:value, 1], :end_object, :end_document] assert_equal expected, events('{"snow\\u26033 man": 1}') end end describe 'parsing unicode escapes with surrogate pairs' do it 'rejects missing second pair' do expected = [:start_document, :start_array, :error] assert_equal expected, events('["\uD834"]') end it 'rejects missing first pair' do expected = 
[:start_document, :start_array, :error] assert_equal expected, events('["\uDD1E"]') end it 'rejects double first pair' do expected = [:start_document, :start_array, :error] assert_equal expected, events('["\uD834\uD834"]') end it 'rejects double second pair' do expected = [:start_document, :start_array, :error] assert_equal expected, events('["\uDD1E\uDD1E"]') end it 'rejects reversed pair' do expected = [:start_document, :start_array, :error] assert_equal expected, events('["\uDD1E\uD834"]') end it 'parses correct pairs in object keys and values' do # U+1D11E G-Clef clef = "\xf0\x9d\x84\x9e" expected = [ :start_document, :start_object, [:key, clef], [:value, "g\u{1D11E}clef"], :end_object, :end_document ] assert_equal expected, events(%q{ {"\uD834\uDD1E": "g\uD834\uDD1Eclef"} }) end end end describe 'parsing arrays' do it 'rejects trailing comma' do expected = [:start_document, :start_array, [:value, 12], :error] assert_equal expected, events('[12, ]') end it 'parses nested empty array' do expected = [:start_document, :start_array, :start_array, :end_array, :end_array, :end_document] assert_equal expected, events('[[]]') end it 'parses nested array with value' do expected = [:start_document, :start_array, :start_array, [:value, 2.1], :end_array, :end_array, :end_document] assert_equal expected, events('[[ 2.10 ]]') end it 'rejects malformed arrays' do expected = [:start_document, :start_array, :error] assert_equal expected, events('[}') assert_equal expected, events('[,]') assert_equal expected, events('[, 12]') end it 'rejects malformed nested arrays' do expected = [:start_document, :start_array, :start_array, :error] assert_equal(expected, events('[[}]')) assert_equal expected, events('[[}]') assert_equal expected, events('[[,]]') end it 'rejects malformed array value lists' do expected = [:start_document, :start_array, [:value, "test"], :error] assert_equal expected, events('["test"}') assert_equal expected, events('["test",]') assert_equal expected, 
events('["test" "test"]') assert_equal expected, events('["test" 12]') end it 'parses array with value' do expected = [:start_document, :start_array, [:value, "test"], :end_array, :end_document] assert_equal expected, events('["test"]') end it 'parses array with value list' do expected = [ :start_document, :start_array, [:value, 1], [:value, 2], [:value, nil], [:value, 12.1], [:value, "test"], :end_array, :end_document ] assert_equal expected, events('[1,2, null, 12.1,"test"]') end end describe 'parsing objects' do it 'rejects malformed objects' do expected = [:start_document, :start_object, :error] assert_equal expected, events('{]') assert_equal expected, events('{:}') end it 'parses single key object' do expected = [:start_document, :start_object, [:key, "key 1"], [:value, 12], :end_object, :end_document] assert_equal expected, events('{"key 1" : 12}') end it 'parses object key value list' do expected = [ :start_document, :start_object, [:key, "key 1"], [:value, 12], [:key, "key 2"], [:value, "two"], :end_object, :end_document ] assert_equal expected, events('{"key 1" : 12, "key 2":"two"}') end it 'rejects object key with no value' do expected = [ :start_document, :start_object, [:key, "key"], :start_array, [:value, nil], [:value, false], [:value, true], :end_array, [:key, "key 2"], :error ] assert_equal expected, events('{"key": [ null , false , true ] ,"key 2"}') end it 'rejects object with trailing comma' do expected = [:start_document, :start_object, [:key, "key 1"], [:value, 12], :error] assert_equal expected, events('{"key 1" : 12,}') end end describe 'parsing unicode bytes' do it 'parses single byte utf-8' do expected = [:start_document, :start_array, [:value, "test"], :end_array, :end_document] assert_equal expected, events('["test"]') end it 'parses full two byte utf-8' do expected = [ :start_document, :start_array, [:value, "résumé"], [:value, "éé"], :end_array, :end_document ] assert_equal expected, events("[\"résumé\", \"é\xC3\xA9\"]") end # Parser 
should throw an error when only one byte of a two byte character # is available. The \xC3 byte is the first byte of the é character. it 'rejects a partial two byte utf-8 string' do expected = [:start_document, :start_array, :error] assert_equal expected, events("[\"\xC3\"]") end it 'parses valid two byte utf-8 string' do expected = [:start_document, :start_array, [:value, 'é'], :end_array, :end_document] assert_equal expected, events("[\"\xC3\xA9\"]") end it 'parses full three byte utf-8 string' do expected = [ :start_document, :start_array, [:value, "snow\u2603man"], [:value, "\u2603\u2603"], :end_array, :end_document ] assert_equal expected, events("[\"snow\u2603man\", \"\u2603\u2603\"]") end it 'rejects one byte of three byte utf-8 string' do expected = [:start_document, :start_array, :error] assert_equal expected, events("[\"\xE2\"]") end it 'rejects two bytes of three byte utf-8 string' do expected = [:start_document, :start_array, :error] assert_equal expected, events("[\"\xE2\x98\"]") end it 'parses full three byte utf-8 string' do expected = [:start_document, :start_array, [:value, "\u2603"], :end_array, :end_document] assert_equal expected, events("[\"\xE2\x98\x83\"]") end it 'parses full four byte utf-8 string' do expected = [ :start_document, :start_array, [:value, "\u{10102} check mark"], :end_array, :end_document ] assert_equal expected, events("[\"\u{10102} check mark\"]") end it 'rejects one byte of four byte utf-8 string' do expected = [:start_document, :start_array, :error] assert_equal expected, events("[\"\xF0\"]") end it 'rejects two bytes of four byte utf-8 string' do expected = [:start_document, :start_array, :error] assert_equal expected, events("[\"\xF0\x90\"]") end it 'rejects three bytes of four byte utf-8 string' do expected = [:start_document, :start_array, :error] assert_equal expected, events("[\"\xF0\x90\x84\"]") end it 'parses full four byte utf-8 string' do expected = [:start_document, :start_array, [:value, "\u{10102}"], 
:end_array, :end_document] assert_equal expected, events("[\"\xF0\x90\x84\x82\"]") end end describe 'parsing json text from the module' do it 'parses an array document' do result = JSON::Stream::Parser.parse('[1,2,3]') assert_equal [1, 2, 3], result end it 'parses a true keyword literal document' do result = JSON::Stream::Parser.parse('true') assert_equal true, result end it 'parses a false keyword literal document' do result = JSON::Stream::Parser.parse('false') assert_equal false, result end it 'parses a null keyword literal document' do result = JSON::Stream::Parser.parse('null') assert_nil result end it 'parses a string literal document' do result = JSON::Stream::Parser.parse('"hello"') assert_equal 'hello', result end it 'parses an integer literal document' do result = JSON::Stream::Parser.parse('42') assert_equal 42, result end it 'parses a float literal document' do result = JSON::Stream::Parser.parse('42.12') assert_equal 42.12, result end it 'rejects a partial float literal document' do assert_raises(JSON::Stream::ParserError) do JSON::Stream::Parser.parse('42.') end end it 'rejects a partial document' do assert_raises(JSON::Stream::ParserError) do JSON::Stream::Parser.parse('{') end end it 'rejects an empty document' do assert_raises(JSON::Stream::ParserError) do JSON::Stream::Parser.parse('') end end end it 'registers observers in initializer block' do events = [] parser = JSON::Stream::Parser.new do start_document { events << :start_document } end_document { events << :end_document } start_object { events << :start_object } end_object { events << :end_object } key { |k| events << [:key, k] } value { |v| events << [:value, v] } end parser << '{"key":12}' expected = [:start_document, :start_object, [:key, "key"], [:value, 12], :end_object, :end_document] assert_equal expected, events end private # Run a worst case, one byte at a time, parse against the JSON string and # return a list of events generated by the parser. 
A special :error event is # included if the parser raised a ParserError. # # json - The String to parse. # parser - The optional Parser instance to use. # # Returns the Array of events collected during the parse. def events(json, parser = nil) parser ||= JSON::Stream::Parser.new collector = Events.new(parser) begin json.each_byte { |byte| parser << [byte].pack('C') } rescue JSON::Stream::ParserError collector.error end collector.events end # Dynamically map methods in this class to parser callback methods # so we can collect parser events for inspection by test cases. class Events METHODS = %w[start_document end_document start_object end_object start_array end_array key value] attr_reader :events def initialize(parser) @events = [] METHODS.each do |name| parser.send(name, &method(name)) end end METHODS.each do |name| define_method(name) do |*args| @events << (args.empty? ? name.to_sym : [name.to_sym, *args]) end end def error @events << :error end end end